
Bibliography on: Formants: Modulators of Communication


Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology.

RJR: Recommended Bibliography. Created: 28 Feb 2020 at 01:41

Formants: Modulators of Communication

Wikipedia: A formant, as defined by James Jeans, is a harmonic of a note that is augmented by a resonance. In speech science and phonetics, however, the term is also used to mean an acoustic resonance of the human vocal tract. Thus, in phonetics, formant can mean either a resonance or the spectral maximum that the resonance produces. Formants are often measured as amplitude peaks in the frequency spectrum of the sound, using a spectrogram or a spectrum analyzer; in the case of the voice, this gives an estimate of the vocal tract resonances. In vowels spoken with a high fundamental frequency, as in a female or child voice, however, the frequency of the resonance may lie between the widely spaced harmonics, and hence no corresponding peak is visible. Because formants are a product of resonance, because resonance is affected by the shape and material of the resonating structure, and because all animals (humans included) have unique morphologies, formants can add both generic (sounds big) and specific (that's Towser barking) information to animal vocalizations.
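In practice, formant frequencies are usually estimated from recordings by linear predictive coding (LPC) rather than read off a spectrogram by eye. The sketch below is an illustrative toy, not the method of any paper cited here: it synthesizes a crude two-formant vowel with a pair of resonators, then recovers the resonance frequencies from the roots of the LPC polynomial.

```python
import numpy as np

def resonator(x, freq, fs, r=0.97):
    """Apply one two-pole resonance (a crude stand-in for a formant)."""
    theta = 2 * np.pi * freq / fs
    b1, b2 = 2 * r * np.cos(theta), -r * r
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += b1 * y[n - 1]
        if n >= 2:
            y[n] += b2 * y[n - 2]
    return y

def estimate_formants(signal, fs, order=8):
    """Estimate resonance frequencies via autocorrelation LPC."""
    # Pre-emphasis flattens the spectral tilt of voiced speech.
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    x = x * np.hamming(len(x))
    # Yule-Walker equations R a = r, built from the autocorrelation.
    full = np.correlate(x, x, mode="full")
    r = full[len(x) - 1 : len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])
    # Complex roots of A(z) = 1 - sum(a_k z^-k) give the resonances.
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0.01]
    freqs = np.sort(np.angle(roots) * fs / (2 * np.pi))
    return [f for f in freqs if 90 < f < fs / 2 - 90]
```

Running the estimator on a 100 Hz impulse train passed through resonators at, say, 700 and 1200 Hz returns frequencies close to those targets. The caveat from the paragraph above applies here too: a high fundamental frequency spaces the harmonics widely and degrades the estimate.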

Created with PubMed® Query: formant NOT pmcbook NOT ispreviousversion

Citations: The Papers (from PubMed®)

RevDate: 2020-02-27

Kim HT (2020)

Vocal Feminization for Transgender Women: Current Strategies and Patient Perspectives.

International journal of general medicine, 13:43-52 pii:205102.

Voice feminization for transgender women is a highly complicated, comprehensive transition process. Voice feminization has often been equated with pitch elevation, so many surgical procedures have focused only on pitch raising. However, voice feminization should consider not only voice pitch but also gender differences in the physical, neurophysiological, and acoustical characteristics of the voice. That is why voice therapy has been the preferred choice for feminization of the voice. Considering the gender differences of the phonatory system, methods for voice feminization aim to change four critical elements: fundamental frequency, resonance frequency related to vocal tract volume and length, formant tuning, and phonatory pattern. The voice feminization process can generally be divided into non-surgical and surgical feminization. As a non-surgical procedure, feminization voice therapy consists of increasing fundamental frequency, improving oral and pharyngeal resonance, and behavioral therapy. Surgical feminization is usually achieved by an external or endoscopic approach. Based on the three factors (length, tension, and mass) of the vocal folds that modulate pitch, surgical procedures can be classified as one-factor, two-factor, and three-factor modifications of the vocal folds. Recent systematic reviews and meta-analyses have reported positive outcomes for both voice therapy and voice feminization surgery. Voice therapy is beneficial because it is highly satisfactory, usually increases vocal pitch, and is noninvasive. However, surgical voice feminization with three-factor modification of the vocal folds is also highly effective and provides the greatest absolute increase in vocal pitch. Voice feminization is a long transition journey of physical, neurophysiological, and psychosomatic changes that convert a male phonatory system to a female phonatory system. Therefore, strategies for voice feminization should be individualized according to the individual's physical condition, the desired change in voice pitch, economic conditions, and social roles.

RevDate: 2020-02-20

Levy ES, Moya-Galé G, Chang YM, et al (2020)

Effects of speech cues in French-speaking children with dysarthria.

International journal of language & communication disorders [Epub ahead of print].

BACKGROUND: Articulatory excursion and vocal intensity are reduced in many children with dysarthria due to cerebral palsy (CP), contributing to the children's intelligibility deficits and negatively affecting their social participation. However, the effects of speech-treatment strategies for improving intelligibility in this population are understudied, especially for children who speak languages other than English. In a cueing study on English-speaking children with dysarthria, acoustic variables and intelligibility improved when the children were provided with cues aimed to increase articulatory excursion and vocal intensity. While French is among the top 20 most spoken languages in the world, dysarthria and its management in French-speaking children are virtually unexplored areas of research. Information gleaned from such research is critical for providing an evidence base on which to provide treatment.

AIMS: To examine acoustic and perceptual changes in the speech of French-speaking children with dysarthria, who are provided with speech cues targeting greater articulatory excursion (French translation of 'speak with your big mouth') and vocal intensity (French translation of 'speak with your strong voice'). This study investigated whether, in response to the cues, the children would make acoustic changes and listeners would perceive the children's speech as more intelligible.

METHODS & PROCEDURES: Eleven children with dysarthria due to CP (six girls, five boys; ages 4;11-17;0 years; eight with spastic CP, three with dyskinetic CP) repeated pre-recorded speech stimuli across three speaking conditions (habitual, 'big mouth' and 'strong voice'). Stimuli were sentences and contrastive words in phrases. Acoustic analyses were conducted. A total of 66 Belgian-French listeners transcribed the children's utterances orthographically and rated their ease of understanding on a visual analogue scale at sentence and word levels.

OUTCOMES & RESULTS: Acoustic analyses revealed significantly longer duration in response to the big mouth cue at sentence level and in response to both the big mouth and strong voice cues at word level. Significantly higher vocal sound-pressure levels were found following both cues at sentence and word levels. Both cues elicited significantly higher first-formant vowel frequencies and listeners' greater ease-of-understanding ratings at word level. Increases in the percentage of words transcribed correctly and in sentence ease-of-understanding ratings, however, did not reach statistical significance. Considerable variability between children was observed.

Speech cues targeting greater articulatory excursion and vocal intensity yield significant acoustic changes in French-speaking children with dysarthria. However, the changes may only aid listeners' ease of understanding at word level. The significant findings and great inter-speaker variability are generally consistent with studies on English-speaking children with dysarthria, although changes appear more constrained in these French-speaking children.

What is already known on the subject: According to the only study comparing effects of speech-cueing strategies in English-speaking children with dysarthria, intelligibility increases when the children are provided with cues aimed at increasing articulatory excursion and vocal intensity. Little is known about speech characteristics of French-speaking children with dysarthria, and no published research has explored effects of cueing strategies in this population.

What this paper adds to existing knowledge: This is the first study to examine the effects of speech cues on the acoustics and intelligibility of French-speaking children with CP. It provides evidence that the children can make use of cues to modify their speech, although the changes may only aid listeners' ease of understanding at word level.

What are the potential or actual clinical implications of this work? For clinicians, the findings suggest that speech cues emphasizing increased articulatory excursion and vocal intensity show promise for improving the ease of understanding of words produced by francophone children with dysarthria, although improvements may be modest. The variability in the responses also suggests that this population may benefit from a combination of such cues to produce words that are easier to understand.

RevDate: 2020-02-20

Boë LJ, Sawallis TR, Fagot J, et al (2019)

Which way to the dawn of speech?: Reanalyzing half a century of debates and data in light of speech science.

Science advances, 5(12):eaaw3916 pii:aaw3916.

Recent articles on primate articulatory abilities are revolutionary regarding speech emergence, a crucial aspect of language evolution, by revealing a human-like system of proto-vowels in nonhuman primates and implicitly throughout our hominid ancestry. This article presents both a schematic history and the state of the art in primate vocalization research and its importance for speech emergence. Recent speech research advances allow more incisive comparison of phylogeny and ontogeny and also an illuminating reinterpretation of vintage primate vocalization data. This review produces three major findings. First, even among primates, laryngeal descent is not uniquely human. Second, laryngeal descent is not required to produce contrasting formant patterns in vocalizations. Third, living nonhuman primates produce vocalizations with contrasting formant patterns. Thus, evidence now overwhelmingly refutes the long-standing laryngeal descent theory, which pushes back "the dawn of speech" beyond ~200 ka ago to over ~20 Ma ago, a difference of two orders of magnitude.

RevDate: 2020-02-10

Kearney E, Nieto-Castañón A, Weerathunge HR, et al (2019)

A Simple 3-Parameter Model for Examining Adaptation in Speech and Voice Production.

Frontiers in psychology, 10:2995.

Sensorimotor adaptation experiments are commonly used to examine motor learning behavior and to uncover information about the underlying control mechanisms of many motor behaviors, including speech production. In the speech and voice domains, aspects of the acoustic signal are shifted/perturbed over time via auditory feedback manipulations. In response, speakers alter their production in the opposite direction of the shift so that their perceived production is closer to what they intended. This process relies on a combination of feedback and feedforward control mechanisms that are difficult to disentangle. The current study describes and tests a simple 3-parameter mathematical model, SimpleDIVA, that quantifies the relative contribution of feedback and feedforward control mechanisms to sensorimotor adaptation. The model is a simplified version of the DIVA model, an adaptive neural network model of speech motor control. The three fitting parameters of SimpleDIVA are associated with the three key subsystems involved in speech motor control, namely auditory feedback control, somatosensory feedback control, and feedforward control. The model is tested through computer simulations that identify optimal model fits to six existing sensorimotor adaptation datasets. We show its utility in (1) interpreting the results of adaptation experiments involving the first and second formant frequencies as well as fundamental frequency; (2) assessing the effects of masking noise in adaptation paradigms; (3) fitting more than one perturbation dimension simultaneously; (4) examining sensorimotor adaptation at different timepoints in the production signal; and (5) quantitatively predicting responses in one experiment using parameters derived from another. The model simulations produce excellent fits to real data across different types of perturbations and experimental paradigms (mean correlation between data and model fits across all six studies = 0.95 ± 0.02). The model parameters provide a mechanistic explanation for behavioral responses to the adaptation paradigm that is not readily available from the behavioral responses alone. Overall, SimpleDIVA offers new insights into speech and voice motor control and has the potential to inform future directions of speech rehabilitation research in disordered populations. Simulation software, including an easy-to-use graphical user interface, is publicly available to facilitate the use of the model in future studies.
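The adaptation logic described in this abstract can be illustrated with a deliberately minimal toy loop. This is a generic sketch of feedback-driven adaptation only, not the actual SimpleDIVA equations, which separate auditory, somatosensory, and feedforward contributions into three fitted parameters; the numbers below are arbitrary.

```python
# Toy sensorimotor adaptation: a speaker gradually updates a feedforward
# formant command to cancel an auditory perturbation.
target = 500.0    # intended F1 (Hz)
perturb = 100.0   # auditory feedback is shifted up by 100 Hz
command = target  # initial feedforward command
alpha = 0.3       # hypothetical feedback-driven learning rate
history = []
for trial in range(60):
    heard = command + perturb   # speaker hears the perturbed formant
    error = heard - target      # perceived auditory error
    command -= alpha * error    # adapt opposite to the shift
    history.append(command)
# The command settles near target - perturb = 400 Hz: production moves
# in the opposite direction of the shift, as described above.
```

With any learning rate between 0 and 1 the loop converges to the same fixed point; what models like SimpleDIVA add is a principled decomposition of that correction across feedback and feedforward subsystems.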

RevDate: 2020-02-09

Viegas F, Viegas D, Serra Guimarães G, et al (2020)

Acoustic Analysis of Voice and Speech in Men with Skeletal Class III Malocclusion: A Pilot Study.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000505186 [Epub ahead of print].

OBJECTIVES: To assess the fundamental frequency (f0) and the first three formant frequencies (F1, F2, F3) of the 7 oral vowels of Brazilian Portuguese in men with skeletal class III malocclusion and to compare these measures with a control group of individuals with Angle's class I.

METHODS: Sixty men aged 18-40 years, 20 with Angle's class III skeletal malocclusion and 40 with Angle's class I malocclusion were selected by speech therapists and dentists. The speech signals were obtained from sustained vowels, and the values of f0 and frequencies of F1, F2 and F3 were estimated. The differences were verified through Student's t test, and the effect size calculation was performed.

RESULTS: In the class III group, higher f0 values were observed for all vowels, along with higher F1 values for the vowels [a] and [ε], higher F2 values for the vowels [a], [e] and [i], and lower F1 and F3 values for the vowel [u].

CONCLUSION: Higher f0 values were found for all vowels investigated in the class III group, suggesting a higher laryngeal position in the production of these sounds. The frequencies of the first 3 formants showed isolated differences, with higher values of F1 in the vowels [a] and [ε] and of F2 in [a], [e] and [i], and lower values of F1 and F3 in the vowel [u] in the experimental group. Thus, the fundamental frequency of the voice was the main parameter that differentiated the studied group from the control.

RevDate: 2020-02-02

Kelley MC, BV Tucker (2020)

A comparison of four vowel overlap measures.

The Journal of the Acoustical Society of America, 147(1):137.

Multiple measures of vowel overlap have been proposed that use F1, F2, and duration to calculate the degree of overlap between vowel categories. The present study assesses four of these measures: the spectral overlap assessment metric [SOAM; Wassink (2006). J. Acoust. Soc. Am. 119(4), 2334-2350], the a posteriori probability (APP)-based metric [Morrison (2008). J. Acoust. Soc. Am. 123(1), 37-40], the vowel overlap analysis with convex hulls method [VOACH; Haynes and Taylor, (2014). J. Acoust. Soc. Am. 136(2), 883-891], and the Pillai score as first used for vowel overlap by Hay, Warren, and Drager [(2006). J. Phonetics 34(4), 458-484]. Summaries of the measures are presented, and theoretical critiques of them are performed, concluding that the APP-based metric and Pillai score are theoretically preferable to SOAM and VOACH. The measures are empirically assessed using accuracy and precision criteria with Monte Carlo simulations. The Pillai score demonstrates the best overall performance in these tests. The potential applications of vowel overlap measures to research scenarios are discussed, including comparisons of vowel productions between different social groups, as well as acoustic investigations into vowel formant trajectories.
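Of the four measures compared above, the Pillai score is the simplest to reproduce: it is the Pillai-Bartlett trace from a MANOVA of the acoustic measures on vowel category. A minimal two-category version (an illustrative sketch, not the authors' code) is:

```python
import numpy as np

def pillai_score(group_a, group_b):
    """Pillai-Bartlett trace for two vowel clouds (rows = tokens,
    columns = acoustic measures such as F1, F2, duration).
    0 means complete overlap; values near 1 mean full separation."""
    X = np.vstack([group_a, group_b])
    grand = X.mean(axis=0)
    # Between-group (H) and within-group (E) cross-product matrices.
    H = sum(len(g) * np.outer(g.mean(0) - grand, g.mean(0) - grand)
            for g in (group_a, group_b))
    E = sum(np.einsum('ij,ik->jk', g - g.mean(0), g - g.mean(0))
            for g in (group_a, group_b))
    return np.trace(H @ np.linalg.inv(H + E))
```

Two clearly separated vowel clouds score near 1, while two samples drawn from the same distribution score near 0. In published work the score is usually read off a fitted MANOVA model (e.g. Pillai's trace in R or statsmodels) rather than computed by hand.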

RevDate: 2020-02-02

Renwick MEL, JA Stanley (2020)

Modeling dynamic trajectories of front vowels in the American South.

The Journal of the Acoustical Society of America, 147(1):579.

Regional variation in American English speech is often described in terms of shifts, indicating which vowel sounds are converging or diverging. In the U.S. South, the Southern vowel shift (SVS) and African American vowel shift (AAVS) affect not only vowels' relative positions but also their formant dynamics. Static characterizations of shifting, with a single pair of first and second formant values taken near vowels' midpoint, fail to capture this vowel-inherent spectral change, which can indicate dialect-specific diphthongization or monophthongization. Vowel-inherent spectral change is directly modeled to investigate how trajectories of front vowels /i eɪ ɪ ɛ/ differ across social groups in the 64-speaker Digital Archive of Southern Speech. Generalized additive mixed models are used to test the effects of two social factors, sex and ethnicity, on trajectory shape. All vowels studied show significant differences between men, women, African American and European American speakers. Results show strong overlap between the trajectories of /eɪ, ɛ/ particularly among European American women, consistent with the SVS, and greater vowel-inherent raising of /ɪ/ among African American speakers, indicating how that lax vowel is affected by the AAVS. Model predictions of duration additionally indicate that across groups, trajectories become more peripheral as vowel duration increases.

RevDate: 2020-02-02

Chung H (2020)

Vowel acoustic characteristics of Southern American English variation in Louisiana.

The Journal of the Acoustical Society of America, 147(1):541.

This study examined acoustic characteristics of vowels produced by speakers from Louisiana, one of the states in the Southern English dialect region. First, the study examined how Louisiana vowels differ from or resemble the reported patterns of the Southern dialect; then, within-dialect differences across regions in Louisiana were examined. Thirty-four female adult monolingual speakers of American English from Louisiana, ranging in age from 18 to 23, produced English monosyllabic words containing 11 vowels /i, ɪ, e, ɛ, æ, ʌ, u, ʊ, o, ɔ, ɑ/. The first two formant frequencies at the midpoint of the vowel nucleus, the direction and amount of formant change across three time points (20, 50, and 80%), and vowel duration were compared to previously reported data on Southern vowels. Overall, Louisiana vowels showed patterns consistent with previously reported characteristics of Southern vowels that reflect ongoing changes in the Southern dialect (no evidence of acoustic reversal of tense-lax pairs; more specifically, no peripheralization of front vowels). Some dialect-specific patterns were also observed (a relatively lesser degree of formant change and slightly shorter vowel duration). These patterns were consistent across different regions within Louisiana.

RevDate: 2020-01-20

Maebayashi H, Takiguchi T, S Takada (2019)

Study on the Language Formation Process of Very-Low-Birth-Weight Infants in Infancy Using a Formant Analysis.

The Kobe journal of medical sciences, 65(2):E59-E70.

Expressive language development depends on anatomical factors, such as motor control of the tongue and oral cavity needed for vocalization, as well as cognitive aspects for comprehension and speech. The purpose of this study was to examine the differences in expressive language development between normal-birth-weight (NBW) infants and very-low-birth-weight (VLBW) infants in infancy using a formant analysis. We also examined the presence of differences between infants with a normal development and those with a high risk of autism spectrum disorder who were expected to exist among VLBW infants. The participants were 10 NBW infants and 10 VLBW infants 12-15 months of age whose speech had been recorded at intervals of approximately once every 3 months. The recorded speech signal was analyzed using a formant analysis, and changes with age were observed. One NBW and three VLBW infants failed to pass the screening tests (CBCL and M-CHAT) at 24 months of age. The formant frequencies (F1 and F2) of the three groups of infants (NBW, VLBW and CBCL·M-CHAT non-passing infants) were scatter-plotted by age. For the NBW and VLBW infants, the area of the plot increased with age, but there was no significant expansion of the plot area for the CBCL·M-CHAT non-passing infants. The results showed no significant differences in expressive language development between NBW infants at 24 months old and VLBW infants at the corrected age. However, different language developmental patterns were observed in CBCL·M-CHAT non-passing infants, regardless of birth weight, suggesting the importance of screening by acoustic analyses.

RevDate: 2020-01-16

Hosbach-Cannon CJ, Lowell SY, Colton RH, et al (2020)

Assessment of Tongue Position and Laryngeal Height in Two Professional Voice Populations.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose To advance our current knowledge of singer physiology by using ultrasonography in combination with acoustic measures to compare physiological differences between musical theater (MT) and opera (OP) singers under controlled phonation conditions. Primary objectives addressed in this study were (a) to determine if differences in hyolaryngeal and vocal fold contact dynamics occur between two professional voice populations (MT and OP) during singing tasks and (b) to determine if differences occur between MT and OP singers in oral configuration and associated acoustic resonance during singing tasks. Method Twenty-one singers (10 MT and 11 OP) were included. All participants were currently enrolled in a music program. Experimental procedures consisted of sustained phonation on the vowels /i/ and /ɑ/ during both a low-pitch task and a high-pitch task. Measures of hyolaryngeal elevation, tongue height, and tongue advancement were assessed using ultrasonography. Vocal fold contact dynamics were measured using electroglottography. Simultaneous acoustic recordings were obtained during all ultrasonography procedures for analysis of the first two formant frequencies. Results Significant oral configuration differences, reflected by measures of tongue height and tongue advancement, were seen between groups. Measures of acoustic resonance also showed significant differences between groups during specific tasks. Both singer groups significantly raised their hyoid position when singing high-pitched vowels, but hyoid elevation was not statistically different between groups. Likewise, vocal fold contact dynamics did not significantly differentiate the two singer groups. Conclusions These findings suggest that, under controlled phonation conditions, MT singers alter their oral configuration and achieve differing resultant formants as compared with OP singers. Because singers are at a high risk of developing a voice disorder, understanding how these two groups of singers adjust their vocal tract configuration during their specific singing genre may help to identify risky vocal behavior and provide a basis for prevention of voice disorders.

RevDate: 2020-01-15

Seyfarth RM, Cheney DL, Harcourt AH, et al (1994)

The acoustic features of gorilla double grunts and their relation to behavior.

American journal of primatology, 33(1):31-50.

Mountain gorillas (Gorilla gorilla beringei) give double-grunts to one another in a variety of situations, when feeding, resting, moving, or engaged in other kinds of social behavior. Some double-grunts elicit double-grunts in reply whereas others do not. Double-grunts are individually distinctive, and high-ranking animals give double-grunts at higher rates than others. There was no evidence, however, that the probability of eliciting a reply depended upon either the animals' behavior at the time a call was given or the social relationship between caller and respondent. The probability of eliciting a reply could be predicted from a double-grunt's acoustic features. Gorillas apparently produce at least two acoustically different subtypes of double-grunts, each of which conveys different information. Double-grunts with a low second formant (typically < 1600 Hz) are given by animals after a period of silence and frequently elicit vocal replies. Double-grunts with a high second formant (typically > 1600 Hz) are given by animals within 5 s of a call from another individual and rarely elicit replies.

RevDate: 2020-01-15

Souza P, Gallun F, R Wright (2020)

Contributions to Speech-Cue Weighting in Older Adults With Impaired Hearing.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose In a previous paper (Souza, Wright, Blackburn, Tatman, & Gallun, 2015), we explored the extent to which individuals with sensorineural hearing loss used different cues for speech identification when multiple cues were available. Specifically, some listeners placed the greatest weight on spectral cues (spectral shape and/or formant transition), whereas others relied on the temporal envelope. In the current study, we aimed to determine whether listeners who relied on temporal envelope did so because they were unable to discriminate the formant information at a level sufficient to use it for identification and the extent to which a brief discrimination test could predict cue weighting patterns. Method Participants were 30 older adults with bilateral sensorineural hearing loss. The first task was to label synthetic speech tokens based on the combined percept of temporal envelope rise time and formant transitions. An individual profile was derived from linear discriminant analysis of the identification responses. The second task was to discriminate differences in either temporal envelope rise time or formant transitions. The third task was to discriminate spectrotemporal modulation in a nonspeech stimulus. Results All listeners were able to discriminate temporal envelope rise time at levels sufficient for the identification task. There was wide variability in the ability to discriminate formant transitions, and that ability predicted approximately one third of the variance in the identification task. There was no relationship between performance in the identification task and either amount of hearing loss or ability to discriminate nonspeech spectrotemporal modulation. Conclusions The data suggest that listeners who rely to a greater extent on temporal cues lack the ability to discriminate fine-grained spectral information. The fact that the amount of hearing loss was not associated with the cue profile underscores the need to characterize individual abilities in a more nuanced way than can be captured by the pure-tone audiogram.

RevDate: 2020-01-17

Kamiloğlu RG, Fischer AH, DA Sauter (2020)

Good vibrations: A review of vocal expressions of positive emotions.

Psychonomic bulletin & review pii:10.3758/s13423-019-01701-x [Epub ahead of print].

Researchers examining nonverbal communication of emotions are becoming increasingly interested in differentiations between different positive emotional states like interest, relief, and pride. But despite the importance of the voice in communicating emotion in general and positive emotion in particular, there is to date no systematic review of what characterizes vocal expressions of different positive emotions. Furthermore, integration and synthesis of current findings are lacking. In this review, we comprehensively review studies (N = 108) investigating acoustic features relating to specific positive emotions in speech prosody and nonverbal vocalizations. We find that happy voices are generally loud with considerable variability in loudness, have high and variable pitch, and are high in the first two formant frequencies. When specific positive emotions are directly compared with each other, pitch mean, loudness mean, and speech rate differ across positive emotions, with patterns mapping onto clusters of emotions, so-called emotion families. For instance, pitch is higher for epistemological emotions (amusement, interest, relief), moderate for savouring emotions (contentment and pleasure), and lower for a prosocial emotion (admiration). Some, but not all, of the differences in acoustic patterns also map on to differences in arousal levels. We end by pointing to limitations in extant work and making concrete proposals for future research on positive emotions in the voice.

RevDate: 2020-01-08

Dubey AK, Prasanna SRM, S Dandapat (2019)

Detection and assessment of hypernasality in repaired cleft palate speech using vocal tract and residual features.

The Journal of the Acoustical Society of America, 146(6):4211.

The presence of hypernasality in repaired cleft palate (CP) speech is a consequence of velopharyngeal insufficiency. The coupling of the nasal tract with the oral tract adds nasal formant and antiformant pairs to the hypernasal speech spectrum. This addition deviates the spectral and linear prediction (LP) residual characteristics of hypernasal speech from those of normal speech. In this work, the vocal tract constriction feature, the peak-to-side-lobe ratio feature, and spectral moment features augmented by low-order cepstral coefficients are used to capture the spectral and residual deviations for hypernasality detection. The first feature captures the low-frequency prominence in speech due to the presence of nasal formants, the second captures the undesirable signal components in the residual signal due to the nasal antiformants, and the third captures information about formants and antiformants in the spectrum along with the spectral envelope. The combination of the three features gives normal-versus-hypernasal speech detection accuracies of 87.76%, 91.13%, and 93.70% for the /a/, /i/, and /u/ vowels, respectively, and hypernasality severity detection accuracies of 80.13% and 81.25% for the /i/ and /u/ vowels, respectively. The speech data were collected from 30 normal control children and 30 children with repaired CP between the ages of 7 and 12.
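As a rough illustration of the third feature family, spectral moments summarize the shape of a power spectrum treated as a probability distribution over frequency. The sketch below is generic, not the authors' exact implementation, which further augments these moments with low-order cepstral coefficients.

```python
import numpy as np

def spectral_moments(frame, fs):
    """First four spectral moments of a windowed frame:
    centroid (Hz), spread (Hz), skewness, kurtosis."""
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    p = power / power.sum()                  # normalize to a distribution
    centroid = (freqs * p).sum()             # first moment: center of mass
    spread = np.sqrt((((freqs - centroid) ** 2) * p).sum())
    skew = ((((freqs - centroid) / spread) ** 3) * p).sum()
    kurt = ((((freqs - centroid) / spread) ** 4) * p).sum()
    return centroid, spread, skew, kurt
```

For hypernasal vowels, added nasal formants concentrate energy at low frequencies, which shows up as a lower centroid and altered higher moments relative to the speaker's normal production.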

RevDate: 2019-12-31

Shiraishi M, Mishima K, H Umeda (2019)

Development of an Acoustic Simulation Method during Phonation of the Japanese Vowel /a/ by the Boundary Element Method.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30445-X [Epub ahead of print].

OBJECTIVES: The purpose of the present study was to establish the method for an acoustic simulation of a vocal tract created from CT data during phonation of the Japanese vowel /a/ and to verify the validity of the simulation.

MATERIAL AND METHODS: The subjects were 15 healthy adults (8 males, 7 females). The vocal tract model was created from CT data acquired during sustained phonation of the Japanese vowel /a/. After conversion to a mesh model for analysis, a wave acoustic analysis was performed with the boundary element method. The wall and the bottom of the vocal tract model were regarded as a rigid wall and a nonrigid wall, respectively. The acoustic medium was set to 37°C, and a point sound source was placed at the position corresponding to the vocal cords. The first and second formant frequencies (F1 and F2) were calculated. For 1 of the 15 subjects, the range from the upper end of the frontal sinus to the tracheal bifurcation was scanned, and 2 models were created: model 1 included the range from the frontal sinus to the tracheal bifurcation, and model 2 included the range from the frontal sinus to the glottis plus a trachea virtually extended by 12 cm cylindrically. F1 and F2 calculated from models 1 and 2 were compared. To evaluate the validity of the present simulation, F1 and F2 calculated from the simulation were compared with those of the actual voice and of the sound generated using a solid model and a whistle-type artificial larynx. To judge validity, the vowel formant frequency discrimination threshold reported in the past was used as a criterion. Namely, the relative discrimination threshold (%) was obtained as ΔF/F, where F was the formant frequency calculated from the simulation and ΔF was the difference between F and the formant frequency of the actual voice or of the sound generated using the solid model and artificial larynx.

RESULTS: F1 and F2 calculated from models 1 and 2 were similar. Therefore, to reduce the exposure dose, the remaining 14 subjects were scanned only from the upper end of the frontal sinus to the glottis, and model 2, with the trachea virtually extended by 12 cm, was used for the simulation. The average relative discrimination thresholds for F1 and F2 against the actual voice were 5.9% and 4.6%, respectively. The average relative discrimination thresholds for F1 and F2 against the sound generated using the solid model and the artificial larynx were 4.1% and 3.7%, respectively.

CONCLUSIONS: The Japanese vowel /a/ could be simulated with high validity for the vocal tract models created from the CT data during phonation of /a/ using the boundary element method.
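The relative-threshold arithmetic described in the Methods (ΔF divided by F, expressed as a percentage) can be sketched as follows; the formant values are hypothetical, for illustration only:

```python
def relative_threshold(f_sim, f_ref):
    """Relative discrimination threshold (%): |ΔF| / F x 100, where F is the
    formant frequency from the simulation and ΔF is its difference from a
    reference measurement (actual voice or solid-model sound)."""
    return abs(f_sim - f_ref) / f_sim * 100.0

# Hypothetical F1 values (Hz), for illustration only.
print(round(relative_threshold(800.0, 760.0), 1))  # → 5.0
```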

RevDate: 2019-12-31

Huang MY, Duan RY, Q Zhao (2019)

The influence of long-term cadmium exposure on the male advertisement call of Xenopus laevis.

Environmental science and pollution research international pii:10.1007/s11356-019-07525-5 [Epub ahead of print].

Cadmium (Cd) is a non-essential environmental endocrine-disrupting compound found in water and a potential threat to aquatic habitats. Cd has been shown to have various short-term effects on aquatic animals; however, evidence for long-term effects of Cd on vocal communication in amphibians is lacking. To better understand the long-term effects of low-dose Cd on acoustic communication in amphibians, male Xenopus laevis individuals were treated with low Cd concentrations (0.1, 1, and 10 μg/L) via aqueous exposure for 24 months. At the end of the exposure, the acoustic spectrum characteristics of male advertisement calls and male movement behaviors in response to female calls were recorded. Gene and protein expression of the androgen receptor (AR) was determined using RT-PCR and Western blot, respectively. The results showed that long-term Cd treatment affected the spectrogram and formants of the advertisement call. Compared with the control group, 10 μg/L Cd significantly decreased the first and second formant frequencies and the fundamental and main frequencies, and increased the third formant frequency. The 1- and 10-μg/L Cd treatments significantly reduced the proportion of individuals responding to female calls and prolonged the time to the male's first movement. Long-term Cd treatment induced a downregulation of the AR protein. Treatments of 0.1, 1, and 10 μg/L Cd significantly decreased the expression of AR mRNA in the brain. These findings indicate that long-term exposure to Cd has negative effects on advertisement calls in male X. laevis.

RevDate: 2020-01-13

Park EJ, Yoo SD, Kim HS, et al (2019)

Correlations between swallowing function and acoustic vowel space in stroke patients with dysarthria.

NeuroRehabilitation, 45(4):463-469.

BACKGROUND: Dysphagia and dysarthria tend to coexist in stroke patients. Dysphagia can reduce patients' quality of life, cause aspiration pneumonia and increased mortality.

OBJECTIVE: To evaluate correlations among swallowing function parameters and acoustic vowel space values in patients with stroke.

METHODS: Data from stroke patients with dysarthria and dysphagia were collected. The formant parameter representing the resonance frequency of the vocal tract as a two-dimensional coordinate point was measured for the /a/, /ae/, /i/, and /u/ vowels, and the quadrilateral vowel space area (VSA) and formant centralization ratio (FCR) were calculated. Swallowing function was evaluated by a videofluoroscopic swallowing study (VFSS) using the videofluoroscopic dysphagia scale (VDS) and penetration-aspiration scale (PAS). Pearson's correlation and linear regression analyses were used to assess the correlation of VSA and FCR to VDS and PAS scores.

RESULTS: Thirty-one stroke patients with dysphagia and dysarthria were analyzed. VSA showed a negative correlation to VDS and PAS scores, while FCR showed a positive correlation to VDS score, but not to PAS score. VSA and FCR were significant factors for assessing dysphagia severity.

CONCLUSIONS: VSA and FCR values were correlated with swallowing function and may be helpful in predicting dysphagia severity associated with stroke.
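The abstract does not spell out the formulas for VSA and FCR; the sketch below assumes the definitions common in the dysarthria literature, with VSA as the shoelace area of the corner-vowel quadrilateral in F1 × F2 space and FCR as (F2u + F2a + F1i + F1u) / (F2i + F1a) (Sapir et al., 2010). All formant values are hypothetical:

```python
def vsa(corners):
    """Quadrilateral vowel space area via the shoelace formula.
    corners: [(F1, F2), ...] for /i/, /ae/, /a/, /u/ in perimeter order (Hz)."""
    area = 0.0
    for i in range(len(corners)):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % len(corners)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def fcr(f1_i, f2_i, f1_u, f2_u, f1_a, f2_a):
    """Formant centralization ratio: rises toward and above 1 as the
    corner vowels centralize."""
    return (f2_u + f2_a + f1_i + f1_u) / (f2_i + f1_a)

# Hypothetical corner-vowel formants (Hz): /i/, /ae/, /a/, /u/.
corners = [(300, 2300), (700, 1900), (750, 1300), (350, 800)]
print(vsa(corners))            # area in Hz^2
print(fcr(f1_i=300, f2_i=2300, f1_u=350, f2_u=800, f1_a=750, f2_a=1300))
```

A smaller VSA and a larger FCR both indicate a more centralized (less distinct) vowel space, which is why the two measures correlate with dysarthria severity in opposite directions.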

RevDate: 2020-01-08

McCarthy KM, Skoruppa K, P Iverson (2019)

Development of neural perceptual vowel spaces during the first year of life.

Scientific reports, 9(1):19592.

This study measured infants' neural responses for spectral changes between all pairs of a set of English vowels. In contrast to previous methods that only allow for the assessment of a few phonetic contrasts, we present a new method that allows us to assess changes in spectral sensitivity across the entire vowel space and create two-dimensional perceptual maps of the infants' vowel development. Infants aged four to eleven months were played long series of concatenated vowels, and the neural response to each vowel change was assessed using the Acoustic Change Complex (ACC) from EEG recordings. The results demonstrated that the youngest infants' responses more closely reflected the acoustic differences between the vowel pairs and gave greater weight to first-formant variation. Older infants had less acoustically driven responses that appeared to result from selective increases in sensitivity for phonetically similar vowels. The results suggest that phonetic development may involve a perceptual warping for confusable vowels rather than uniform learning, as well as an overall increasing sensitivity to higher-frequency acoustic information.

RevDate: 2019-12-18

Houle N, SV Levi (2019)

Effect of Phonation on Perception of Femininity/Masculinity in Transgender and Cisgender Speakers.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30302-9 [Epub ahead of print].

Many transwomen seek voice and communication therapy to support their transition from their gender assigned at birth to their gender identity. This has led to an increased need to examine the perception of gender and femininity/masculinity to develop evidence-based intervention practices. In this study, we explore the auditory perception of femininity/masculinity in normally phonated and whispered speech. Transwomen, ciswomen, and cismen were recorded producing /hVd/ words. Naïve listeners rated femininity/masculinity of a speaker's voice using a visual analog scale, rather than completing a binary gender identification task. The results revealed that listeners rated speakers more ambiguously in whispered speech than normally phonated speech. An analysis of speaker and token characteristics revealed that in the normally phonated condition listeners consistently use f0 to rate femininity/masculinity. In addition, some evidence was found for possible contributions of formant frequencies, particularly F2, and duration. Taken together, this provides additional evidence for the salience of f0 and F2 for voice and communication intervention among transwomen.

RevDate: 2019-12-18

Xu Y, S Prom-On (2019)

Economy of Effort or Maximum Rate of Information? Exploring Basic Principles of Articulatory Dynamics.

Frontiers in psychology, 10:2469.

Economy of effort, a popular notion in contemporary speech research, predicts that dynamic extremes such as the maximum speed of articulatory movement are avoided as much as possible and that approaching the dynamic extremes is necessary only when there is a need to enhance linguistic contrast, as in the case of stress or clear speech. Empirical data, however, do not always support these predictions. In the present study, we considered an alternative principle: maximum rate of information, which assumes that speech dynamics are ultimately driven by the pressure to transmit information as quickly and accurately as possible. For empirical data, we asked speakers of American English to produce repetitive syllable sequences such as wawawawawa as fast as possible by imitating recordings of the same sequences that had been artificially accelerated and to produce meaningful sentences containing the same syllables at normal and fast speaking rates. Analysis of formant trajectories shows that dynamic extremes in meaningful speech sometimes even exceeded those in the nonsense syllable sequences but that this happened more often in unstressed syllables than in stressed syllables. We then used a target approximation model based on a mass-spring system of varying orders to simulate the formant kinematics. The results show that the kind of formant kinematics found in the present study and in previous studies can only be generated by a dynamical system operating with maximal muscular force under strong time pressure and that the dynamics of this operation may hold the solution to the long-standing enigma of greater stiffness in unstressed than in stressed syllables. We conclude, therefore, that maximum rate of information can coherently explain both current and previous empirical data and could therefore be a fundamental principle of motor control in speech production.

RevDate: 2020-01-01
CmpDate: 2019-12-12

Root-Gutteridge H, Ratcliffe VF, Korzeniowska AT, et al (2019)

Dogs perceive and spontaneously normalize formant-related speaker and vowel differences in human speech sounds.

Biology letters, 15(12):20190555.

Domesticated animals have been shown to recognize basic phonemic information from human speech sounds and to recognize familiar speakers from their voices. However, whether animals can spontaneously identify words across unfamiliar speakers (speaker normalization) or spontaneously discriminate between unfamiliar speakers across words remains to be investigated. Here, we assessed these abilities in domestic dogs using the habituation-dishabituation paradigm. We found that while dogs habituated to the presentation of a series of different short words from the same unfamiliar speaker, they significantly dishabituated to the presentation of a novel word from a new speaker of the same gender. This suggests that dogs spontaneously categorized the initial speaker across different words. Conversely, dogs who habituated to the same short word produced by different speakers of the same gender significantly dishabituated to a novel word, suggesting that they had spontaneously categorized the word across different speakers. Our results indicate that the ability to spontaneously recognize both the same phonemes across different speakers, and cues to identity across speech utterances from unfamiliar speakers, is present in domestic dogs and thus not a uniquely human trait.

RevDate: 2020-01-08

Vorperian HK, Kent RD, Lee Y, et al (2019)

Corner vowels in males and females ages 4 to 20 years: Fundamental and F1-F4 formant frequencies.

The Journal of the Acoustical Society of America, 146(5):3255.

The purpose of this study was to determine the developmental trajectory of the four corner vowels' fundamental frequency (fo) and the first four formant frequencies (F1-F4), and to assess when speaker-sex differences emerge. Five words per vowel, two of which were produced twice, were analyzed for fo and estimates of the first four formants frequencies from 190 (97 female, 93 male) typically developing speakers ages 4-20 years old. Findings revealed developmental trajectories with decreasing values of fo and formant frequencies. Sex differences in fo emerged at age 7. The decrease of fo was larger in males than females with a marked drop during puberty. Sex differences in formant frequencies appeared at the earliest age under study and varied with vowel and formant. Generally, the higher formants (F3-F4) were sensitive to sex differences. Inter- and intra-speaker variability declined with age but had somewhat different patterns, likely reflective of maturing motor control that interacts with the changing anatomy. This study reports a source of developmental normative data on fo and the first four formants in both sexes. The different developmental patterns in the first four formants and vowel-formant interactions in sex differences likely point to anatomic factors, although speech-learning phenomena cannot be discounted.

RevDate: 2020-01-12

Gianakas SP, MB Winn (2019)

Lexical bias in word recognition by cochlear implant listeners.

The Journal of the Acoustical Society of America, 146(5):3373.

When hearing an ambiguous speech sound, listeners show a tendency to perceive it as a phoneme that would complete a real word rather than a nonsense word. For example, a sound that could be heard as either /b/ or /ɡ/ is perceived as /b/ when followed by "_ack" but perceived as /ɡ/ when followed by "_ap." Because the target sound is acoustically identical across both environments, this effect demonstrates the influence of top-down lexical processing in speech perception. Degradations in the auditory signal were hypothesized to render speech stimuli more ambiguous and therefore promote increased lexical bias. Stimuli included three speech continua that varied by spectral cues of varying speeds: stop formant transitions (fast), fricative spectra (medium), and vowel formants (slow). Stimuli were presented to listeners with cochlear implants (CIs) and to listeners with normal hearing, either with clear spectral quality or with varying amounts of spectral degradation imposed by a noise vocoder. Results indicated an increased lexical bias effect with degraded speech and for CI listeners, for whom the effect size was related to segment duration. This method can probe an individual's reliance on top-down processing even at the level of simple lexical/phonetic perception.

RevDate: 2020-01-08

Perrachione TK, Furbeck KT, EJ Thurston (2019)

Acoustic and linguistic factors affecting perceptual dissimilarity judgments of voices.

The Journal of the Acoustical Society of America, 146(5):3384.

The human voice is a complex acoustic signal that conveys talker identity via individual differences in numerous features, including vocal source acoustics, vocal tract resonances, and dynamic articulations during speech. It remains poorly understood how differences in these features contribute to perceptual dissimilarity of voices and, moreover, whether linguistic differences between listeners and talkers interact during perceptual judgments of voices. Here, native English- and Mandarin-speaking listeners rated the perceptual dissimilarity of voices speaking English or Mandarin from either forward or time-reversed speech. The language spoken by talkers, but not listeners, principally influenced perceptual judgments of voices. Perceptual dissimilarity judgments of voices were always highly correlated between listener groups and forward/time-reversed speech. Representational similarity analyses that explored how acoustic features (fundamental frequency mean and variation, jitter, harmonics-to-noise ratio, speech rate, and formant dispersion) contributed to listeners' perceptual dissimilarity judgments, including how talker- and listener-language affected these relationships, found the largest effects relating to voice pitch. Overall, these data suggest that, while linguistic factors may influence perceptual judgments of voices, the magnitude of such effects tends to be very small. Perceptual judgments of voices by listeners of different native language backgrounds tend to be more alike than different.

RevDate: 2019-12-02

Lo JJH (2019)

Between Äh(m) and Euh(m): The Distribution and Realization of Filled Pauses in the Speech of German-French Simultaneous Bilinguals.

Language and speech [Epub ahead of print].

Filled pauses are well known for their speaker specificity, yet cross-linguistic research has also shown language-specific trends in their distribution and phonetic quality. To examine the extent to which speakers acquire filled pauses as language- or speaker-specific phenomena, this study investigates the use of filled pauses in the context of adult simultaneous bilinguals. Making use of both distributional and acoustic data, this study analyzed UH, consisting of only a vowel component, and UM, with a vowel followed by [m], in the speech of 15 female speakers who were simultaneously bilingual in French and German. Speakers were found to use UM more frequently in German than in French, but only German-dominant speakers had a preference for UM in German. Formant and durational analyses showed that while speakers maintained distinct vowel qualities in their filled pauses in different languages, filled pauses in their weaker language exhibited a shift towards those in their dominant language. These results suggest that, despite high levels of variability between speakers, there is a significant role for language in the acquisition of filled pauses in simultaneous bilingual speakers, which is further shaped by the linguistic environment they grow up in.

RevDate: 2020-01-08

Frey R, Volodin IA, Volodina EV, et al (2019)

Savannah roars: The vocal anatomy and the impressive rutting calls of male impala (Aepyceros melampus) - highlighting the acoustic correlates of a mobile larynx.

Journal of anatomy [Epub ahead of print].

A retractable larynx and adaptations of the vocal folds in the males of several polygynous ruminants serve for the production of rutting calls that acoustically announce larger than actual body size to both rival males and potential female mates. Here, such features of the vocal tract and of the sound source are documented in another species. We investigated the vocal anatomy and laryngeal mobility including its acoustical effects during the rutting vocal display of free-ranging male impala (Aepyceros melampus melampus) in Namibia. Male impala produced bouts of rutting calls (consisting of oral roars and interspersed explosive nasal snorts) in a low-stretch posture while guarding a rutting territory or harem. For the duration of the roars, male impala retracted the larynx from its high resting position to a low mid-neck position involving an extensible pharynx and a resilient connection between the hyoid apparatus and the larynx. Maximal larynx retraction was 108 mm based on estimates in video single frames. This was in good concordance with 91-mm vocal tract elongation calculated on the basis of differences in formant dispersion between roar portions produced with the larynx still ascended and those produced with maximally retracted larynx. Judged by their morphological traits, the larynx-retracting muscles of male impala are homologous to those of other larynx-retracting ruminants. In contrast, the large and massive vocal keels are evolutionary novelties arising by fusion and linear arrangement of the arytenoid cartilage and the canonical vocal fold. These bulky and histologically complex vocal keels produced a low fundamental frequency of 50 Hz. Impala is another ruminant species in which the males are capable of larynx retraction. In addition, male impala vocal folds are spectacularly specialized compared with domestic bovids, allowing the production of impressive, low-frequency roaring vocalizations as a significant part of their rutting behaviour. 
Our study expands knowledge on the evolutionary variation of vocal fold morphology in mammals, suggesting that the structure of the mammalian sound source is not always human-like and should be considered in acoustic analysis and modelling.
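The 91-mm elongation figure rests on the standard relation between formant dispersion and the length of a uniform closed-open tube, Df ≈ c / (2L). A minimal sketch of that arithmetic, assuming c = 350 m/s and hypothetical formant values (not the impala measurements):

```python
C = 350.0  # assumed speed of sound in warm, humid air (m/s)

def apparent_vtl_mm(formants_hz):
    """Apparent vocal-tract length (mm) from the mean spacing of adjacent
    formants, using the uniform closed-open tube relation Df ≈ c / (2·L)."""
    f = sorted(formants_hz)
    df = (f[-1] - f[0]) / (len(f) - 1)  # mean adjacent-formant spacing (Hz)
    return C / (2.0 * df) * 1000.0

# Hypothetical formant sets (Hz), for illustration only.
resting = apparent_vtl_mm([600, 1700, 2800, 3900])    # Df = 1100 Hz
retracted = apparent_vtl_mm([450, 1250, 2050, 2850])  # Df = 800 Hz
print(round(retracted - resting, 1))  # estimated elongation (mm) → 59.7
```

Lower formant dispersion implies a longer apparent tract, which is how the acoustic estimate of elongation can be checked against the retraction seen on video.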

RevDate: 2019-11-23

Hu G, Determan SC, Dong Y, et al (2019)

Spectral and Temporal Envelope Cues for Human and Automatic Speech Recognition in Noise.

Journal of the Association for Research in Otolaryngology : JARO pii:10.1007/s10162-019-00737-z [Epub ahead of print].

Acoustic features of speech include various spectral and temporal cues. It is known that temporal envelope plays a critical role for speech recognition by human listeners, while automated speech recognition (ASR) heavily relies on spectral analysis. This study compared sentence-recognition scores of humans and an ASR software, Dragon, when spectral and temporal-envelope cues were manipulated in background noise. Temporal fine structure of meaningful sentences was reduced by noise or tone vocoders. Three types of background noise were introduced: a white noise, a time-reversed multi-talker noise, and a fake-formant noise. Spectral information was manipulated by changing the number of frequency channels. With a 20-dB signal-to-noise ratio (SNR) and four vocoding channels, white noise had a stronger disruptive effect than the fake-formant noise. The same observation with 22 channels was made when SNR was lowered to 0 dB. In contrast, ASR was unable to function with four vocoding channels even with a 20-dB SNR. Its performance was least affected by white noise and most affected by the fake-formant noise. Increasing the number of channels, which improved the spectral resolution, generated non-monotonic behaviors for the ASR with white noise but not with colored noise. The ASR also showed highly improved performance with tone vocoders. It is possible that fake-formant noise affected the software's performance by disrupting spectral cues, whereas white noise affected performance by compromising speech segmentation. Overall, these results suggest that human listeners and ASR utilize different listening strategies in noise.
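Noise vocoding of the kind used in this study replaces each frequency band's temporal fine structure with band-limited noise modulated by that band's envelope. A minimal numpy sketch, not the authors' implementation: it uses FFT-based bandpass filtering and a Hilbert-transform envelope, and the channel count and band edges are illustrative assumptions.

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Zero all spectral bins outside [lo, hi] Hz and invert the FFT."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    X[(freqs < lo) | (freqs > hi)] = 0
    return np.fft.irfft(X, n=len(x))

def envelope(x):
    """Magnitude of the analytic signal (Hilbert transform via FFT)."""
    X = np.fft.fft(x)
    h = np.zeros(len(x))
    h[0] = 1
    h[1:(len(x) + 1) // 2] = 2
    if len(x) % 2 == 0:
        h[len(x) // 2] = 1
    return np.abs(np.fft.ifft(X * h))

def noise_vocode(x, fs, n_channels=4, f_lo=100, f_hi=8000):
    """Replace each band's fine structure with envelope-modulated noise."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced band edges
    rng = np.random.default_rng(0)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass_fft(x, fs, lo, hi))
        carrier = bandpass_fft(rng.standard_normal(len(x)), fs, lo, hi)
        out += env * carrier
    return out
```

With only four channels the output preserves the broad temporal envelope but almost no spectral detail, which is the regime where the human/ASR contrast reported above is starkest.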

RevDate: 2019-12-18

Hu W, Tao S, Li M, et al (2019)

Distinctiveness and Assimilation in Vowel Perception in a Second Language.

Journal of speech, language, and hearing research : JSLHR, 62(12):4534-4543.

Purpose The purpose of this study was to investigate how the distinctive establishment of 2nd language (L2) vowel categories (e.g., how distinctively an L2 vowel is established from nearby L2 vowels and from the native language counterpart in the 1st formant [F1] × 2nd formant [F2] vowel space) affected L2 vowel perception. Method Identification of 12 natural English monophthongs, and categorization and rating of synthetic English vowels /i/ and /ɪ/ in the F1 × F2 space were measured for Chinese-native (CN) and English-native (EN) listeners. CN listeners were also examined with categorization and rating of Chinese vowels in the F1 × F2 space. Results As expected, EN listeners significantly outperformed CN listeners in English vowel identification. Whereas EN listeners showed distinctive establishment of 2 English vowels, CN listeners had multiple patterns of L2 vowel establishment: both, 1, or neither established. Moreover, CN listeners' English vowel perception was significantly related to the perceptual distance between the English vowel and its Chinese counterpart, and the perceptual distance between the adjacent English vowels. Conclusions L2 vowel perception relied on listeners' capacity to distinctively establish L2 vowel categories that were distant from the nearby L2 vowels.

RevDate: 2019-12-18

Mollaei F, Shiller DM, Baum SR, et al (2019)

The Relationship Between Speech Perceptual Discrimination and Speech Production in Parkinson's Disease.

Journal of speech, language, and hearing research : JSLHR, 62(12):4256-4268.

Purpose We recently demonstrated that individuals with Parkinson's disease (PD) respond differentially to specific altered auditory feedback parameters during speech production. Participants with PD respond more robustly to pitch and less robustly to formant manipulations compared to control participants. In this study, we investigated whether differences in perceptual processing may in part underlie these compensatory differences in speech production. Methods Pitch and formant feedback manipulations were presented under 2 conditions: production and listening. In the production condition, 15 participants with PD and 15 age- and gender-matched healthy control participants judged whether their own speech output was manipulated in real time. During the listening task, participants judged whether paired tokens of their previously recorded speech samples were the same or different. Results Under listening, 1st formant manipulation discrimination was significantly reduced for the PD group compared to the control group. There was a trend toward better discrimination of pitch in the PD group, but the group difference was not significant. Under the production condition, the ability of participants with PD to identify pitch manipulations was greater than that of the controls. Conclusion The findings suggest perceptual processing differences associated with acoustic parameters of fundamental frequency and 1st formant perturbations in PD. These findings extend our previous results, indicating that different patterns of compensation to pitch and 1st formant shifts may reflect a combination of sensory and motor mechanisms that are differentially influenced by basal ganglia dysfunction.

RevDate: 2019-11-27

Escudero P, M Kalashnikova (2020)

Infants use phonetic detail in speech perception and word learning when detail is easy to perceive.

Journal of experimental child psychology, 190:104714.

Infants successfully discriminate speech sound contrasts that belong to their native language's phonemic inventory in auditory-only paradigms, but they encounter difficulties in distinguishing the same contrasts in the context of word learning. These difficulties are usually attributed to the fact that infants' attention to the phonetic detail in novel words is attenuated when they must allocate additional cognitive resources demanded by word-learning tasks. The current study investigated 15-month-old infants' ability to distinguish novel words that differ by a single vowel in an auditory discrimination paradigm (Experiment 1) and a word-learning paradigm (Experiment 2). These experiments aimed to tease apart whether infants' performance is dependent solely on the specific acoustic properties of the target vowels or on the context of the task. Experiment 1 showed that infants were able to discriminate only a contrast marked by a large difference along a static dimension (the vowels' second formant), whereas they were not able to discriminate a contrast with a small phonetic distance between its vowels, due to the dynamic nature of the vowels. In Experiment 2, infants did not succeed at learning words containing the same contrast they were able to discriminate in Experiment 1. The current findings demonstrate that both the specific acoustic properties of vowels in infants' native language and the task presented continue to play a significant role in early speech perception well into the second year of life.

RevDate: 2019-12-30

Rosenthal MA (2020)

A systematic review of the voice-tagging hypothesis of speech-in-noise perception.

Neuropsychologia, 136:107256.

The voice-tagging hypothesis claims that individuals who better represent pitch information in a speaker's voice, as measured with the frequency following response (FFR), will be better at speech-in-noise perception. The hypothesis has been provided to explain how music training might improve speech-in-noise perception. This paper reviews studies that are relevant to the voice-tagging hypothesis, including studies on musicians and nonmusicians. Most studies on musicians show greater f0 amplitude compared to controls. Most studies on nonmusicians do not show group differences in f0 amplitude. Across all studies reviewed, f0 amplitude does not consistently predict accuracy in speech-in-noise perception. The evidence suggests that music training does not improve speech-in-noise perception via enhanced subcortical representation of the f0.

RevDate: 2019-11-11

Hakanpää T, Waaramaa T, AM Laukkanen (2019)

Comparing Contemporary Commercial and Classical Styles: Emotion Expression in Singing.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30209-7 [Epub ahead of print].

OBJECTIVE: This study examines the acoustic correlates of the vocal expression of emotions in contemporary commercial music (CCM) and classical styles of singing. This information may be useful in improving the training of interpretation in singing.

STUDY DESIGN: This is an experimental comparative study.

METHODS: Eleven female singers with a minimum of 3 years of professional-level singing training in CCM, classical, or both styles participated. They sang the vowel [ɑ:] at three pitches (A3 220 Hz, E4 330 Hz, and A4 440 Hz) expressing anger, sadness, joy, tenderness, and a neutral voice. Vowel samples were analyzed for fundamental frequency (fo), formant frequencies (F1-F5), sound pressure level (SPL), spectral structure (alpha ratio = SPL at 1500-5000 Hz minus SPL at 50-1500 Hz), harmonics-to-noise ratio (HNR), perturbation (jitter, shimmer), onset and offset duration, sustain time, rate and extent of fo variation in vibrato, and rate and extent of amplitude vibrato.

RESULTS: The parameters that were statistically significantly (RM-ANOVA, P ≤ 0.05) related to emotion expression in both genres were SPL, alpha ratio, F1, and HNR. Additionally, for CCM, significance was found in sustain time, jitter, shimmer, F2, and F4. When fo and SPL were set as covariates in the variance analysis, jitter, HNR, and F4 did not show pure dependence on expression. The alpha ratio, F1, F2, shimmer apq5, amplitude vibrato rate, and sustain time of vocalizations had emotion-related variation also independent of fo and SPL in the CCM style, while these parameters were related to fo and SPL in the classical style.

CONCLUSIONS: The results differed somewhat for the CCM and classical styles. The alpha ratio showed less variation in the classical style, most likely reflecting the demand for a more stable voice source quality. The alpha ratio, F1, F2, shimmer, amplitude vibrato rate, and the sustain time of the vocalizations were related to fo and SPL control in the classical style. The only common independent sound parameter indicating emotional expression for both styles was SPL. The CCM style offers more freedom for expression-related changes in voice quality.
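The alpha ratio defined in the Methods (SPL at 1500-5000 Hz minus SPL at 50-1500 Hz) can be sketched from a magnitude spectrum. The band edges follow the abstract; everything else in this numpy sketch is an illustrative assumption:

```python
import numpy as np

def alpha_ratio(x, fs):
    """Alpha ratio (dB): band level at 1500-5000 Hz minus band level at
    50-1500 Hz, computed from the power spectrum of signal x at rate fs."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    hi = spec[(freqs >= 1500) & (freqs <= 5000)].sum() + 1e-12
    lo = spec[(freqs >= 50) & (freqs < 1500)].sum() + 1e-12
    return 10 * np.log10(hi / lo)

# A 2000 Hz tone concentrates energy in the upper band (positive alpha ratio);
# a 440 Hz tone concentrates it in the lower band (negative alpha ratio).
fs = 16000
t = np.arange(fs) / fs
print(alpha_ratio(np.sin(2 * np.pi * 2000 * t), fs) > 0)  # → True
```

A higher alpha ratio corresponds to relatively more high-frequency energy, i.e. a brighter or more pressed voice quality, which is why it tracks expressive intensity in both genres.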

RevDate: 2019-11-22

Weirich M, A Simpson (2019)

Effects of Gender, Parental Role, and Time on Infant- and Adult-Directed Read and Spontaneous Speech.

Journal of speech, language, and hearing research : JSLHR, 62(11):4001-4014.

Purpose The study sets out to investigate inter- and intraspeaker variation in German infant-directed speech (IDS) and considers the potential impact that the factors gender, parental involvement, and speech material (read vs. spontaneous speech) may have. In addition, we analyze data from 3 time points prior to and after the birth of the child to examine potential changes in the features of IDS and, particularly also, of adult-directed speech (ADS). Here, the gender identity of a speaker is considered as an additional factor. Method IDS and ADS data from 34 participants (15 mothers, 19 fathers) is gathered by means of a reading and a picture description task. For IDS, 2 recordings were made when the baby was approximately 6 and 9 months old, respectively. For ADS, an additional recording was made before the baby was born. Phonetic analyses comprise mean fundamental frequency (f0), variation in f0, the 1st 2 formants measured in /i: ɛ a u:/, and the vowel space size. Moreover, social and behavioral data were gathered regarding parental involvement and gender identity. Results German IDS is characterized by an increase in mean f0, a larger variation in f0, vowel- and formant-specific differences, and a larger acoustic vowel space. No effect of gender or parental involvement was found. Also, the phonetic features of IDS were found in both spontaneous and read speech. Regarding ADS, changes in vowel space size in some of the fathers and in mean f0 in mothers were found. Conclusion Phonetic features of German IDS are robust with respect to the factors gender, parental involvement, speech material (read vs. spontaneous speech), and time. Some phonetic features of ADS changed within the child's first year depending on gender and parental involvement/gender identity. Thus, further research on IDS needs to address also potential changes in ADS.

RevDate: 2020-01-08

Howson PJ, MA Redford (2019)

Liquid coarticulation in child and adult speech.

Proceedings of the ... International Congress of Phonetic Sciences. International Congress of Phonetic Sciences, 2019:3100-3104.

Although liquids are mastered late, English-speaking children are said to have fully acquired these segments by age 8. The aim of this study was to test whether liquid coarticulation was also adult-like by this age. Eight-year-olds' productions of /əLa/ and /əLu/ sequences were compared to 5-year-olds' and adults' productions of the same sequences. SSANOVA analyses of formant frequency trajectories indicated that, while adults contrasted rhotics and laterals from the onset of the vocalic sequence, F2 trajectories for rhotics and laterals were overlapped at the onset of the /əLa/ sequence in 8-year-olds' productions and across the entire /əLu/ sequence. The F2 trajectories for rhotics and laterals were even more overlapped in 5-year-olds' productions. Overall, the study suggests that whereas younger children have difficulty coordinating the tongue body/root gesture with the tongue tip gesture, older children still struggle with the intergestural timing associated with liquid production.

RevDate: 2019-10-29

Kim D, S Kim (2019)

Coarticulatory vowel nasalization in American English: Data of individual differences in acoustic realization of vowel nasalization as a function of prosodic prominence and boundary.

Data in brief, 27:104593 pii:104593.

This article provides acoustic measurement data for vowel nasalization which are based on speech recorded from fifteen (8 female and 7 male) native speakers of American English in a laboratory setting. Each individual speaker's production patterns for the vowel nasalization in tautosyllabic CVN and NVC words are documented in terms of three acoustic parameters: the duration of nasal consonant (N-Duration), the duration of vowel (V-Duration) and the difference between the amplitude of the first formant (A1) and the first nasal peak (P0) obtained from the vowel (A1-P0) as an indication of the degree of vowel nasalization. The A1-P0 is measured at three different time points within the vowel, i.e., the near point (25%), midpoint (50%), and distant point (75%), either from the onset (CVN) or the offset (NVC) of the nasal consonant. These measures are taken from the target words in various prosodic prominence and boundary contexts: phonologically focused (PhonFOC) vs. lexically focused (LexFOC) vs. unfocused (NoFOC) conditions; phrase-edge (i.e., phrase-final for CVN and phrase-initial for NVC) vs. phrase-medial conditions. The data also contain a CSV file with each speaker's mean values of the N-Duration, V-Duration, and A1-P0 (z-scored) for each prosodic context along with the information about the speakers' gender. For further discussion of the data, please refer to the full-length article entitled "Prosodically-conditioned fine-tuning of coarticulatory vowel nasalization in English" (Cho et al., 2017).
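The per-speaker z-scoring of the A1-P0 means mentioned above is a standard normalization step. A minimal sketch, assuming nothing about the dataset's actual columns (the values below are invented, not taken from the CSV):

```python
import numpy as np

def zscore(values):
    """Standardize a 1-D array to zero mean and unit variance."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

# Hypothetical A1-P0 means (dB) for one speaker across prosodic contexts
a1_p0 = [12.0, 9.5, 10.5, 8.0]
z = zscore(a1_p0)
```

Standardizing within each speaker removes between-speaker offsets so that prosodic-context effects can be compared across speakers.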

RevDate: 2019-10-29

Goswami U, Nirmala SR, Vikram CM, et al (2019)

Analysis of Articulation Errors in Dysarthric Speech.

Journal of psycholinguistic research pii:10.1007/s10936-019-09676-5 [Epub ahead of print].

Imprecise articulation is the major issue reported in various types of dysarthria. Detection of articulation errors can help in diagnosis. The cues derived from both the burst and the formant transitions contribute to the discrimination of place of articulation of stops. It is believed that any acoustic deviations in stops due to articulation error can be analyzed by deriving features around the burst and the voicing onsets. The derived features can be used to discriminate normal and dysarthric speech. In this work, a method is proposed to differentiate the voiceless stops produced by normal speakers from those produced by dysarthric speakers by deriving spectral moments, two-dimensional discrete cosine transform of the linear prediction spectrum, and Mel frequency cepstral coefficient features. These features and a cosine distance-based classifier are used for the classification of normal and dysarthric speech.
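The cosine-distance classification step can be illustrated generically. This is a toy nearest-centroid sketch with invented three-dimensional feature vectors, not the paper's spectral-moment/DCT/MFCC features:

```python
import numpy as np

def cosine_distance(a, b):
    """1 minus the cosine similarity between two feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def classify(features, centroids):
    """Return the class label whose centroid is nearest in cosine distance."""
    return min(centroids, key=lambda label: cosine_distance(features, centroids[label]))

# Toy centroids standing in for class-mean feature vectors (hypothetical values)
centroids = {"normal": np.array([1.0, 0.2, 0.1]),
             "dysarthric": np.array([0.2, 1.0, 0.8])}
label = classify([0.9, 0.3, 0.2], centroids)  # nearer the "normal" centroid
```

Cosine distance ignores overall vector magnitude, which makes it robust to level differences between recordings.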

RevDate: 2020-01-02

Cartei V, Banerjee R, Garnham A, et al (2019)

Physiological and perceptual correlates of masculinity in children's voices.

Hormones and behavior, 117:104616 pii:S0018-506X(19)30277-6 [Epub ahead of print].

Low frequency components (i.e. a low pitch (F0) and low formant spacing (ΔF)) signal high salivary testosterone and height in adult male voices and are associated with high masculinity attributions by unfamiliar listeners (in both men and women). However, the relation between the physiological, acoustic and perceptual dimensions of speakers' masculinity prior to puberty remains unknown. In this study, 110 pre-pubertal children (58 girls), aged 3 to 10, were recorded as they described a cartoon picture. A total of 315 adults (182 women) rated children's perceived masculinity from the voice only after listening to the speakers' audio recordings. On the basis of their voices alone, boys who had higher salivary testosterone levels were rated as more masculine and the relation between testosterone and perceived masculinity was partially mediated by F0. The voices of taller boys were also rated as more masculine, but the relation between height and perceived masculinity was not mediated by the considered acoustic parameters, indicating that acoustic cues other than F0 and ΔF may signal stature. Both boys and girls who had lower F0 were also rated as more masculine, while ΔF did not affect ratings. These findings highlight the interdependence of physiological, acoustic and perceptual dimensions, and suggest that inter-individual variation in male voices, particularly F0, may advertise hormonal masculinity from a very early age.

RevDate: 2019-10-17

Scheerer NE, Jacobson DS, JA Jones (2019)

Sensorimotor control of vocal production in early childhood.

Journal of experimental psychology. General pii:2019-62257-001 [Epub ahead of print].

Children maintain fluent speech despite dramatic changes to their articulators during development. Auditory feedback aids in the acquisition and maintenance of the sensorimotor mechanisms that underlie vocal motor control. MacDonald, Johnson, Forsythe, Plante, and Munhall (2012) reported that toddlers' speech motor control systems may "suppress" the influence of auditory feedback, since exposure to altered auditory feedback regarding their formant frequencies did not lead to modifications of their speech. This finding is difficult to reconcile with most theories of motor control. Here, we exposed toddlers to perturbations to the pitch of their auditory feedback as they vocalized. Toddlers compensated for the manipulations, producing significantly different responses to upward and downward perturbations. These data represent the first empirical demonstration that toddlers use auditory feedback for vocal motor control. Furthermore, our findings suggest toddlers are more sensitive to changes to the postural properties of their auditory feedback, such as fundamental frequency, relative to the phonemic properties, such as formant frequencies. (PsycINFO Database Record (c) 2019 APA, all rights reserved).

RevDate: 2019-10-08

Conklin JT, O Dmitrieva (2019)

Vowel-to-Vowel Coarticulation in Spanish Nonwords.

Phonetica pii:000502890 [Epub ahead of print].

The present study examined vowel-to-vowel (VV) coarticulation in backness affecting mid vowels /e/ and /o/ in 36 Spanish nonwords produced by 20 native speakers of Spanish, aged 19-50 years (mean = 30.7; SD = 8.2). Examination of second formant frequency showed substantial carryover coarticulation throughout the data set, while anticipatory coarticulation was minimal and of shorter duration. Furthermore, the effect of stress on vowel-to-vowel coarticulation was investigated and found to vary by direction. In the anticipatory direction, small coarticulatory changes were relatively stable regardless of stress, particularly for target /e/, while in the carryover direction, a hierarchy of stress emerged wherein the greatest coarticulation occurred between stressed triggers and unstressed targets, less coarticulation was observed between unstressed triggers and unstressed targets, and the least coarticulation occurred between unstressed triggers with stressed targets. The results of the study augment and refine previously available knowledge about vowel-to-vowel coarticulation in Spanish and expand cross-linguistic understanding of the effect of stress on the magnitude and direction of vowel-to-vowel coarticulation.

RevDate: 2019-12-20

Lee Y, Keating P, J Kreiman (2019)

Acoustic voice variation within and between speakers.

The Journal of the Acoustical Society of America, 146(3):1568.

Little is known about the nature or extent of everyday variability in voice quality. This paper describes a series of principal component analyses to explore within- and between-talker acoustic variation and the extent to which they conform to expectations derived from current models of voice perception. Based on studies of faces and cognitive models of speaker recognition, the authors hypothesized that a few measures would be important across speakers, but that much of within-speaker variability would be idiosyncratic. Analyses used multiple sentence productions from 50 female and 50 male speakers of English, recorded over three days. Twenty-six acoustic variables from a psychoacoustic model of voice quality were measured every 5 ms on vowels and approximants. Across speakers the balance between higher harmonic amplitudes and inharmonic energy in the voice accounted for the most variance (females = 20%, males = 22%). Formant frequencies and their variability accounted for an additional 12% of variance across speakers. Remaining variance appeared largely idiosyncratic, suggesting that the speaker-specific voice space is different for different people. Results further showed that voice spaces for individuals and for the population of talkers have very similar acoustic structures. Implications for prototype models of voice perception and recognition are discussed.
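The variance-accounting reported above comes from principal component analysis of the acoustic measures. A generic numpy sketch of PCA variance fractions via SVD on centered data (the toy matrix below merely stands in for the study's 26 acoustic variables):

```python
import numpy as np

def explained_variance_ratio(data):
    """PCA via SVD on mean-centered data; returns per-component variance fractions."""
    centered = data - data.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)  # singular values, descending
    var = s ** 2
    return var / var.sum()

rng = np.random.default_rng(0)
# Toy stand-in: 100 tokens x 4 acoustic measures, with one strongly correlated pair
data = rng.normal(size=(100, 4))
data[:, 1] = 2.0 * data[:, 0] + 0.1 * data[:, 1]  # correlated pair -> dominant PC
ratios = explained_variance_ratio(data)
```

The correlated pair concentrates variance in the first component, mirroring how a shared acoustic dimension (e.g., harmonic-to-inharmonic balance) can dominate a voice space.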

RevDate: 2020-01-08

Balaguer M, Pommée T, Farinas J, et al (2020)

Effects of oral and oropharyngeal cancer on speech intelligibility using acoustic analysis: Systematic review.

Head & neck, 42(1):111-130.

BACKGROUND: The development of automatic tools based on acoustic analysis makes it possible to overcome the limitations of perceptual assessment for patients with head and neck cancer. The aim of this study is to provide a systematic review of the literature describing the effects of oral and oropharyngeal cancer on speech intelligibility using acoustic analysis.

METHODS: Two databases (PubMed and Embase) were surveyed. The selection process, according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) statement, led to a final set of 22 articles.

RESULTS: Nasalance is studied mainly in oropharyngeal patients. The vowels are mostly studied using formant analysis and vowel space area, the consonants by means of spectral moments with specific parameters according to their phonetic characteristics. Machine learning methods allow speech to be classified as "intelligible" or "unintelligible" for T3 or T4 tumors.

CONCLUSIONS: The development of comprehensive models combining different acoustic measures would allow better assessment of the functional impact of the speech disorder.

RevDate: 2019-09-23

Zeng Q, Jiao Y, Huang X, et al (2019)

Effects of Angle of Epiglottis on Aerodynamic and Acoustic Parameters in Excised Canine Larynges.

Journal of voice : official journal of the Voice Foundation, 33(5):627-633.

OBJECTIVES: The aim of this study is to explore the effects of the angle of epiglottis (Aepi) on phonation and resonance in excised canine larynges.

METHODS: The anatomic Aepi was measured for 14 excised canine larynges as a control. Then, the Aepis were manually adjusted to 60° and 90° in each larynx. Aerodynamic and acoustic parameters, including mean flow rate, sound pressure level, jitter, shimmer, fundamental frequency (F0), and formants (F1'-F4'), were measured with a subglottal pressure of 1.5 kPa. Simple linear regression analysis between acoustic and aerodynamic parameters and the Aepi of the control was performed, and an analysis of variance comparing the acoustic and aerodynamic parameters of the three treatments was carried out.

RESULTS: The results of the study are as follows: (1) the larynges with larger anatomic Aepi had significantly lower jitter, shimmer, formant 1, and formant 2; (2) phonation threshold flow was significantly different for the three treatments; and (3) mean flow rate and sound pressure level were significantly different between the 60° and the 90° treatments of the 14 larynges.

CONCLUSIONS: The Aepi was proposed for the first time in this study. The Aepi plays an important role in phonation and resonance of excised canine larynges.

RevDate: 2019-09-18

Dmitrieva O, I Dutta (2019)

Acoustic Correlates of the Four-Way Laryngeal Contrast in Marathi.

Phonetica pii:000501673 [Epub ahead of print].

The study examines acoustic correlates of the four-way laryngeal contrast in Marathi, focusing on temporal parameters, voice quality, and onset f0. Acoustic correlates of the laryngeal contrast were investigated in the speech of 33 native speakers of Marathi, recorded in Mumbai, India, producing a word list containing six sets of words minimally contrastive in terms of laryngeal specification of word-initial velar stops. Measurements were made for the duration of prevoicing, release, and voicing during release. Fundamental frequency was measured at the onset of voicing following the stop and at 10 additional time points. As measures of voice quality, amplitude differences between the first and second harmonic (H1-H2) and between the first harmonic and the third formant (H1-A3) were calculated. The results demonstrated that laryngeal categories in Marathi are differentiated based on temporal measures, voice quality, and onset f0, although differences in each dimension were unequal in magnitude across different pairs of stop categories. We conclude that a single acoustic correlate, such as voice onset time, is insufficient to differentiate among all the laryngeal categories in languages such as Marathi, characterized by complex four-way laryngeal contrasts. Instead, a joint contribution of several acoustic correlates creates a robust multidimensional contrast.

RevDate: 2019-09-20

Guan J, C Liu (2019)

Speech Perception in Noise With Formant Enhancement for Older Listeners.

Journal of speech, language, and hearing research : JSLHR, 62(9):3290-3301.

Purpose Degraded speech intelligibility in background noise is a common complaint of listeners with hearing loss. The purpose of the current study is to explore whether 2nd formant (F2) enhancement improves speech perception in noise for older listeners with hearing impairment (HI) and normal hearing (NH). Method Target words (e.g., color and digit) were selected and presented based on the paradigm of the coordinate response measure corpus. Speech recognition thresholds with original and F2-enhanced speech in 2- and 6-talker babble were examined for older listeners with NH and HI. Results The thresholds for both the NH and HI groups improved for enhanced speech signals primarily in 2-talker babble, but not in 6-talker babble. The F2 enhancement benefits did not correlate significantly with listeners' age and their average hearing thresholds in most listening conditions. However, speech intelligibility index values increased significantly with F2 enhancement in babble for listeners with HI, but not for NH listeners. Conclusions Speech sounds with F2 enhancement may improve listeners' speech perception in 2-talker babble, possibly due to a greater amount of speech information available in temporally modulated noise or a better capacity to separate speech signals from background babble.

RevDate: 2019-09-01

Klein E, Brunner J, P Hoole (2019)

The influence of coarticulatory and phonemic relations on individual compensatory formant production.

The Journal of the Acoustical Society of America, 146(2):1265.

Previous auditory perturbation studies have shown that speakers are able to simultaneously use multiple compensatory strategies to produce a certain acoustic target. In the case of formant perturbation, these findings were obtained examining the compensatory production for low vowels /ɛ/ and /æ/. This raises some controversy as more recent research suggests that the contribution of the somatosensory feedback to the production of vowels might differ across phonemes. In particular, the compensatory magnitude to auditory perturbations is expected to be weaker for high vowels compared to low vowels since the former are characterized by larger linguopalatal contact. To investigate this hypothesis, this paper conducted a bidirectional auditory perturbation study in which F2 of the high central vowel /ɨ/ was perturbed in opposing directions depending on the preceding consonant (alveolar vs velar). The consonants were chosen such that speakers' usual coarticulatory patterns were either compatible or incompatible with the required compensatory strategy. The results demonstrate that speakers were able to compensate for applied perturbations even if speakers' compensatory movements resulted in unusual coarticulatory configurations. However, the results also suggest that individual compensatory patterns were influenced by additional perceptual factors attributable to the phonemic space surrounding the target vowel /ɨ/.

RevDate: 2019-09-01

Migimatsu K, IT Tokuda (2019)

Experimental study on nonlinear source-filter interaction using synthetic vocal fold models.

The Journal of the Acoustical Society of America, 146(2):983.

Under certain conditions, e.g., singing voice, the fundamental frequency of the vocal folds can go up and interfere with the formant frequencies. Acoustic feedback from the vocal tract filter to the vocal fold source then becomes strong and non-negligible. An experimental study was presented on such source-filter interaction using three types of synthetic vocal fold models. Asymmetry was also created between the left and right vocal folds. The experiment reproduced various nonlinear phenomena, such as frequency jump and quenching, as reported in humans. Increase in phonation threshold pressure was also observed when resonant frequency of the vocal tract and fundamental frequency of the vocal folds crossed each other. As a combined effect, the phonation threshold pressure was further increased by the left-right asymmetry. Simulation of the asymmetric two-mass model reproduced the experiments to some extent. One of the intriguing findings of this study is the variable strength of the source-filter interaction over different model types. Among the three models, two models were strongly influenced by the vocal tract, while no clear effect of the vocal tract was observed in the other model. This implies that the level of source-filter interaction may vary considerably from one subject to another in humans.

RevDate: 2019-11-19

Max L, A Daliri (2019)

Limited Pre-Speech Auditory Modulation in Individuals Who Stutter: Data and Hypotheses.

Journal of speech, language, and hearing research : JSLHR, 62(8S):3071-3084.

Purpose We review and interpret our recent series of studies investigating motor-to-auditory influences during speech movement planning in fluent speakers and speakers who stutter. In those studies, we recorded auditory evoked potentials in response to probe tones presented immediately prior to speaking or at the equivalent time in no-speaking control conditions. As a measure of pre-speech auditory modulation (PSAM), we calculated changes in auditory evoked potential amplitude in the speaking conditions relative to the no-speaking conditions. Whereas adults who do not stutter consistently showed PSAM, this phenomenon was greatly reduced or absent in adults who stutter. The same between-group difference was observed in conditions where participants expected to hear their prerecorded speech played back without actively producing it, suggesting that the speakers who stutter use inefficient forward modeling processes rather than inefficient motor command generation processes. Compared with fluent participants, adults who stutter showed both less PSAM and less auditory-motor adaptation when producing speech while exposed to formant-shifted auditory feedback. Across individual participants, however, PSAM and auditory-motor adaptation did not correlate in the typically fluent group, and they were negatively correlated in the stuttering group. Interestingly, speaking with a consistent 100-ms delay added to the auditory feedback signal normalized PSAM in speakers who stutter, and there no longer was a between-group difference in this condition. Conclusions Combining our own data with human and animal neurophysiological evidence from other laboratories, we interpret the overall findings as suggesting that (a) speech movement planning modulates auditory processing in a manner that may optimize its tuning characteristics for monitoring feedback during speech production, and (b) in conditions with typical auditory feedback, adults who stutter do not appropriately modulate the auditory system prior to speech onset. Lack of modulation in speakers who stutter may lead to maladaptive feedback-driven movement corrections that manifest themselves as repetitive movements or postural fixations.

RevDate: 2019-11-01

Plummer AR, PF Reidy (2018)

Computing low-dimensional representations of speech from socio-auditory structures for phonetic analyses.

Journal of phonetics, 71:355-375.

Low-dimensional representations of speech data, such as formant values extracted by linear predictive coding analysis or spectral moments computed from whole spectra viewed as probability distributions, have been instrumental in both phonetic and phonological analyses over the last few decades. In this paper, we present a framework for computing low-dimensional representations of speech data based on two assumptions: that speech data represented in high-dimensional data spaces lie on shapes called manifolds that can be used to map speech data to low-dimensional coordinate spaces, and that manifolds underlying speech data are generated from a combination of language-specific lexical, phonological, and phonetic information as well as culture-specific socio-indexical information that is expressed by talkers of a given speech community. We demonstrate the basic mechanics of the framework by carrying out an analysis of children's productions of sibilant fricatives relative to those of adults in their speech community using the phoneigen package, a publicly available implementation of the framework. We focus the demonstration on enumerating the steps for constructing manifolds from data and then using them to map the data to a low-dimensional space, explicating how manifold structure affects the learned low-dimensional representations, and comparing the use of these representations against standard acoustic features in a phonetic analysis. We conclude with a discussion of the framework's underlying assumptions, its broader modeling potential, and its position relative to recent advances in the field of representation learning.
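The manifold-to-coordinates mapping described above can be illustrated with a standard Laplacian eigenmap. This is a generic sketch of that family of methods, not the phoneigen implementation; the data, kernel width, and dimensionality are invented:

```python
import numpy as np

def laplacian_eigenmap(points, n_components=1, sigma=0.3):
    """Embed points via eigenvectors of a heat-kernel graph Laplacian."""
    diff = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diff ** 2).sum(-1))
    W = np.exp(-dists ** 2 / (2 * sigma ** 2))  # Gaussian affinity matrix
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W  # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)  # eigenvalues ascending
    return vecs[:, 1:1 + n_components]  # skip the constant eigenvector

rng = np.random.default_rng(1)
# Toy "high-dimensional" data lying near a 1-D curve, parameterized by t
t = np.linspace(0, 1, 40)
points = np.stack([t, t ** 2, np.sin(t)], axis=1) + 0.01 * rng.normal(size=(40, 3))
embedding = laplacian_eigenmap(points)
```

Because the points lie near a one-dimensional curve, the first nontrivial eigenvector recovers (up to sign) the ordering of points along that curve.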

RevDate: 2019-09-20

Jain S, NP Nataraja (2019)

The Relationship between Temporal Integration and Temporal Envelope Perception in Noise by Males with Mild Sensorineural Hearing Loss.

The journal of international advanced otology, 15(2):257-262.

OBJECTIVES: A substantial body of literature indicates that temporal integration and temporal envelope perception contribute largely to the perception of speech. A review of the literature showed that the perception of speech with temporal integration and temporal envelope perception in noise may be affected by sensorineural hearing loss, but to a varying degree. Because temporal integration and the temporal envelope share similar physiological processing at the cochlear level, the present study aimed to identify the relationship between temporal integration and temporal envelope perception in noise by individuals with mild sensorineural hearing loss.

MATERIALS AND METHODS: Thirty adult males with mild sensorineural hearing loss and thirty age- and gender-matched normal-hearing individuals volunteered as participants in the study. Temporal integration was measured using synthetic consonant-vowel-consonant syllables, varied for onset, offset, and onset-offset of the second and third formant frequencies of the vowel following and preceding consonants in six equal steps, thus forming six-step onset, offset, and onset-offset continua. The duration of the transition was kept short (40 ms) in one set of continua and long (80 ms) in another. Temporal integration scores were calculated as the differences in the identification of the categorical boundary between short- and long-transition continua. Temporal envelope perception was measured using sentences processed in quiet, 0 dB, and -5 dB signal-to-noise ratios at 4, 8, 16, and 32 contemporary frequency channels, and the temporal envelope was extracted for each sentence using the Hilbert transformation.
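Hilbert-transform envelope extraction, as used above, takes the magnitude of the analytic signal. A minimal scipy sketch on a toy amplitude-modulated tone (a stand-in for one frequency channel, not the study's sentence materials):

```python
import numpy as np
from scipy.signal import hilbert

def temporal_envelope(signal):
    """Amplitude envelope as the magnitude of the analytic signal."""
    return np.abs(hilbert(signal))

fs = 1000
t = np.arange(0, 1, 1 / fs)
# 100 Hz carrier, amplitude-modulated at 4 Hz
modulator = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
carrier = np.sin(2 * np.pi * 100 * t)
env = temporal_envelope(modulator * carrier)
```

For a narrowband carrier, the recovered envelope closely tracks the slow modulator, which is the cue vocoder-style envelope processing preserves.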

RESULTS: A significant effect of hearing loss was observed on temporal integration, but not on temporal envelope perception. However, when the temporal integration abilities were controlled, the variable effect of hearing loss on temporal envelope perception was noted.

CONCLUSION: It is important to measure temporal integration in order to accurately account for envelope perception by individuals with normal hearing and those with hearing loss.

RevDate: 2019-08-18

Cartei V, Garnham A, Oakhill J, et al (2019)

Children can control the expression of masculinity and femininity through the voice.

Royal Society open science, 6(7):190656 pii:rsos190656.

Pre-pubertal boys and girls speak with acoustically different voices despite the absence of a clear anatomical dimorphism in the vocal apparatus, suggesting that a strong component of the expression of gender through the voice is behavioural. Initial evidence for this hypothesis was found in a previous study showing that children can alter their voice to sound like a boy or like a girl. However, whether they can spontaneously modulate these voice components within their own gender in order to vary the expression of their masculinity and femininity remained to be investigated. Here, seventy-two English-speaking children aged 6-10 were asked to give voice to child characters varying in masculine and feminine stereotypicality to investigate whether primary school children spontaneously adjust the sex-related cues in their voice, fundamental frequency (F0) and formant spacing (ΔF), along gender-stereotypical lines. Boys and girls masculinized their voice, by lowering F0 and ΔF, when impersonating stereotypically masculine child characters of the same sex. Girls and older boys also feminized their voice, by raising their F0 and ΔF, when impersonating stereotypically feminine same-sex child characters. These findings reveal that children have some knowledge of the sexually dimorphic acoustic cues underlying the expression of gender, and are capable of controlling them to modulate gender-related attributes, paving the way for the use of the voice as an implicit, objective measure of the development of gender stereotypes and behaviour.

RevDate: 2019-11-06

Dorman MF, Natale SC, Zeitler DM, et al (2019)

Looking for Mickey Mouse™ But Finding a Munchkin: The Perceptual Effects of Frequency Upshifts for Single-Sided Deaf, Cochlear Implant Patients.

Journal of speech, language, and hearing research : JSLHR, 62(9):3493-3499.

Purpose Our aim was to make audible for normal-hearing listeners the Mickey Mouse™ sound quality of cochlear implants (CIs) often found following device activation. Method The listeners were 3 single-sided deaf patients fit with a CI and who had 6 months or less of CI experience. Computed tomography imaging established the location of each electrode contact in the cochlea and allowed an estimate of the place frequency of the tissue nearest each electrode. For the most apical electrodes, this estimate ranged from 650 to 780 Hz. To determine CI sound quality, a clean signal (a sentence) was presented to the CI ear via a direct connect cable, and candidate CI-like signals were presented to the ear with normal hearing via an insert receiver. The listeners rated the similarity of the candidate signals to the sound of the CI on a 1- to 10-point scale, with 10 being a complete match. Results To make the match to CI sound quality, all 3 patients needed an upshift in formant frequencies (300-800 Hz) and a metallic sound quality. Two of the 3 patients also needed an upshift in voice pitch (10-80 Hz) and a muffling of sound quality. Similarity scores ranged from 8 to 9.7. Conclusion The formant frequency upshifts, fundamental frequency upshifts, and metallic sound quality experienced by the listeners can be linked to the relatively basal locations of the electrode contacts and short duration experience with their devices. The perceptual consequence was not the voice quality of Mickey Mouse™ but rather that of Munchkins in The Wizard of Oz for whom both formant frequencies and voice pitch were upshifted. Supplemental Material https://doi.org/10.23641/asha.9341651.

RevDate: 2019-08-10

Knight EJ, SF Austin (2019)

The Effect of Head Flexion/Extension on Acoustic Measures of Singing Voice Quality.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30117-1 [Epub ahead of print].

A study was undertaken to identify the effect of head flexion/extension on singing voice quality. The amplitude of the fundamental frequency (F0) and the singing power ratio (SPR), an indirect measure of Singer's Formant activity, were measured. F0 and SPR scores at four experimental head positions were compared with the subjects' scores at their habitual positions. Three vowels and three pitch levels were tested. F0 amplitudes and low-frequency partials in general were greater with neck extension, while SPR increased with neck flexion. No effect of pitch or vowel was found. Gains in SPR appear to be the result of damping low-frequency partials rather than amplifying those in the Singer's Formant region. Raising the amplitude of F0 is an important resonance tool for female voices in the high range, and may be of benefit to other voice types in resonance, loudness, and laryngeal function.
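The singing power ratio above compares spectral energy in the Singer's Formant region against the low-frequency partials. A numpy sketch of one common formulation, the strongest peak in 2-4 kHz minus the strongest peak in 0-2 kHz in dB (band edges vary across studies; the two-partial test tone is invented):

```python
import numpy as np

def singing_power_ratio(signal, fs):
    """SPR in dB: strongest spectral peak in 2-4 kHz minus strongest in 0-2 kHz."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    low = spectrum[(freqs >= 0) & (freqs < 2000)].max()
    high = spectrum[(freqs >= 2000) & (freqs < 4000)].max()
    return 20 * np.log10(high / low)

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
# Toy tone pair: strong 500 Hz partial, weaker 3000 Hz partial (amplitude ratio 0.1)
signal = np.sin(2 * np.pi * 500 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)
spr = singing_power_ratio(signal, fs)
```

On this toy signal the ratio is simply the 0.1 amplitude ratio expressed in dB, so a less negative SPR indicates relatively stronger Singer's Formant energy.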

RevDate: 2019-08-08

Alho K, Żarnowiec K, Gorina-Careta N, et al (2019)

Phonological Task Enhances the Frequency-Following Response to Deviant Task-Irrelevant Speech Sounds.

Frontiers in human neuroscience, 13:245.

In electroencephalography (EEG) measurements, processing of periodic sounds in the ascending auditory pathway generates the frequency-following response (FFR) phase-locked to the fundamental frequency (F0) and its harmonics of a sound. We measured FFRs to the steady-state (vowel) part of syllables /ba/ and /aw/ occurring in binaural rapid streams of speech sounds as frequently repeating standard syllables or as infrequent (p = 0.2) deviant syllables among standard /wa/ syllables. Our aim was to study whether concurrent active phonological processing affects early processing of irrelevant speech sounds reflected by FFRs to these sounds. To this end, during syllable delivery, our healthy adult participants performed tasks involving written letters delivered on a computer screen in a rapid stream. The stream consisted of vowel letters written in red, infrequently occurring consonant letters written in the same color, and infrequently occurring vowel letters written in blue. In the phonological task, the participants were instructed to press a response key to the consonant letters differing phonologically but not in color from the frequently occurring red vowels, whereas in the non-phonological task, they were instructed to respond to the vowel letters written in blue differing only in color from the frequently occurring red vowels. We observed that the phonological task enhanced responses to deviant /ba/ syllables but not responses to deviant /aw/ syllables. This suggests that active phonological task performance may enhance processing of such small changes in irrelevant speech sounds as the 30-ms difference in the initial formant-transition time between the otherwise identical syllables /ba/ and /wa/ used in the present study.

RevDate: 2019-08-02

Birkholz P, Gabriel F, Kürbis S, et al (2019)

How the peak glottal area affects linear predictive coding-based formant estimates of vowels.

The Journal of the Acoustical Society of America, 146(1):223.

The estimation of formant frequencies from acoustic speech signals is mostly based on Linear Predictive Coding (LPC) algorithms. Since LPC is based on the source-filter model of speech production, the formant frequencies obtained are often implicitly regarded as those for an infinite glottal impedance, i.e., a closed glottis. However, previous studies have indicated that LPC-based formant estimates of vowels generated with a realistically varying glottal area may substantially differ from the resonances of the vocal tract with a closed glottis. In the present study, the deviation between closed-glottis resonances and LPC-estimated formants during phonation with different peak glottal areas has been systematically examined both using physical vocal tract models excited with a self-oscillating rubber model of the vocal folds, and by computer simulations of interacting source and filter models. Ten vocal tract resonators representing different vowels have been analyzed. The results showed that F1 increased with the peak area of the time-varying glottis, while F2 and F3 were not systematically affected. The effect of the peak glottal area on F1 was strongest for close-mid to close vowels, and more moderate for mid to open vowels.
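The LPC-based formant estimation this abstract examines can be sketched as follows: fit an all-pole model by the autocorrelation method, then take formant candidates from the angles of the complex roots of the prediction polynomial. The pre-emphasis coefficient, window, and model order here are illustrative defaults, not the authors' settings:

```python
import numpy as np

def lpc_formants(signal, fs, order=12):
    """Estimate formant frequencies (Hz) via LPC root-finding,
    the standard source-filter approach discussed above."""
    x = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies before the LPC fit.
    x = np.append(x[0], x[1:] - 0.97 * x[:-1]) * np.hamming(len(x))
    # Autocorrelation method: solve the normal equations R a = -r.
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    # Complex roots of A(z) = 1 + a1 z^-1 + ... give resonance candidates.
    roots = np.roots(np.concatenate(([1.0], a)))
    roots = roots[np.imag(roots) > 0]
    freqs = sorted(np.angle(roots) * fs / (2.0 * np.pi))
    # Discard near-DC roots; the remainder are F1, F2, ... candidates.
    return [f for f in freqs if f > 90.0]
```

Because this model assumes an all-pole (closed-glottis) vocal tract, estimates obtained during phonation with a large open glottal area can deviate from the true closed-glottis resonances, which is exactly the bias the study quantifies.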

RevDate: 2019-08-02

Patel RR, Lulich SM, A Verdi (2019)

Vocal tract shape and acoustic adjustments of children during phonation into narrow flow-resistant tubes.

The Journal of the Acoustical Society of America, 146(1):352.

The goal of the study is to quantify the salient vocal tract acoustic, subglottal acoustic, and vocal tract physiological characteristics during phonation into a narrow flow-resistant tube with 2.53 mm inner diameter and 124 mm length in typically developing vocally healthy children using simultaneous microphone, accelerometer, and 3D/4D ultrasound recordings. Acoustic measurements included fundamental frequency (fo), first formant frequency (F1), second formant frequency (F2), first subglottal resonance (FSg1), and peak-to-peak amplitude ratio (Pvt:Psg). Physiological measurements included posterior tongue height (D1), tongue dorsum height (D2), tongue tip height (D3), tongue length (D4), oral cavity width (D5), hyoid elevation (D6), and pharynx width (D7). All measurements were made on eight boys and ten girls (6-9 years) during sustained /o:/ production at typical pitch and loudness, with and without the flow-resistant tube. Phonation with the flow-resistant tube resulted in a significant decrease in F1, F2, and Pvt:Psg and a significant increase in D2, D3, and FSg1. A statistically significant gender effect was observed for D1, with D1 higher in boys. These findings agree well with reported findings from adults, suggesting common acoustic and articulatory mechanisms for narrow flow-resistant tube phonation. Theoretical implications of the findings are discussed.

RevDate: 2019-08-05

Wadamori N (2019)

Evaluation of a photoacoustic bone-conduction vibration system.

The Review of scientific instruments, 90(7):074905.

This article proposes a bone conduction vibrator based on a phenomenon by which audible sound can be perceived when vibrations are produced by a laser beam synchronized to the sound and are then transmitted to an auricular cartilage. To study this phenomenon, we measured the vibrations using a rubber sheet with properties similar to those of soft tissue in combination with an acceleration sensor. We also calculated the force level of the sound based on the mechanical impedance and the acceleration in the proposed system. We estimated the formant frequencies of specific vibrations that were synchronized to five Japanese vowels using this phenomenon. We found that the vibrations produced in the rubber sheet caused audible sound generation when the photoacoustic bone conduction vibration system was used. It is expected that a force level equal to the reference equivalent threshold force level can be achieved at light intensities below the safety limit for human skin exposure by selecting an irradiation wavelength at which a high degree of optical absorption occurs. It is demonstrated that clear sounds can be transmitted to the cochlea using the proposed system, while the effects of acoustic and electric noise in the environment are excluded. Improvements in the vibratory force levels realized using this system will enable the development of a novel hearing aid that will provide an alternative to conventional bone conduction hearing aids.

RevDate: 2019-07-26

Kaneko M, Sugiyama Y, Mukudai S, et al (2019)

Effect of Voice Therapy Using Semioccluded Vocal Tract Exercises in Singers and Nonsingers With Dysphonia.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30210-3 [Epub ahead of print].

OBJECTIVES: Voice therapy with semioccluded vocal tract exercises (SOVTE) has a long history of use in singers and nonsingers with dysphonia. SOVTE with increased vocal tract impedance leads to increased vocal efficiency and economy. Although there is a growing body of research on the physiological impact of SOVTE, and growing clinical sentiment about its therapeutic benefits, empirical data describing its potential efficacy in singers and nonsingers are lacking. The objective of the current study is to evaluate vocal tract function and voice quality in singers and nonsingers with dysphonia after undergoing SOVTE.

METHODS: Patients who were diagnosed with functional dysphonia, vocal fold nodules and age-related atrophy were assessed (n = 8 singers, n = 8 nonsingers). Stroboscopic examination, aerodynamic assessment, acoustic analysis, formant frequency, and self-assessments were evaluated before and after performing SOVTE.

RESULTS: In the singer group, expiratory lung pressure, jitter, shimmer, and self-assessment significantly improved after SOVTE. In addition, formant frequency (first, second, third, and fourth), and the standard deviation (SD) of the first, second, and third formant frequency significantly improved. In the nonsinger group, expiratory lung pressure, jitter, shimmer, and Voice Handicap Index-10 significantly improved after SOVTE. However, no significant changes were observed in formant frequency.

CONCLUSIONS: These results suggest that SOVTE may improve voice quality in singers and nonsingers with dysphonia, and SOVTE may be more effective at adjusting the vocal tract function in singers with dysphonia compared to nonsingers.

RevDate: 2019-07-23

Myers S (2019)

An Acoustic Study of Sandhi Vowel Hiatus in Luganda.

Language and speech [Epub ahead of print].

In Luganda (Bantu, Uganda), a sequence of vowels in successive syllables (V.V) is not allowed. If the first vowel is high, the two vowels are joined together in a diphthong (e.g., i + a → i͜a). If the first vowel is non-high, it is deleted with compensatory lengthening of the second vowel in the sequence (e.g., e + a → aː). This paper presents an acoustic investigation of inter-word V#V sequences in Luganda. It was found that the vowel interval in V#V sequences is longer than that in V#C sequences. When the first vowel in V#V is non-high, the formant frequency of the outcome is determined by the second vowel in the sequence. When the first vowel is high, on the other hand, the sequence is realized as a diphthong, with the transition between the two formant patterns taking up most of the duration. The durational patterns within these diphthongs provide evidence against the transcription-based claim that these sequences are reorganized so that the length lies in the second vowel (/i#V/ → [jVː]). The findings bring into question a canonical case of compensatory lengthening conditioned by glide formation.

RevDate: 2019-07-15

Longo L, Di Stadio A, Ralli M, et al (2019)

Voice Parameter Changes in Professional Musician-Singers Singing with and without an Instrument: The Effect of Body Posture.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000501202 [Epub ahead of print].

BACKGROUND AND AIM: The impact of body posture on vocal emission is well known. Postural changes may increase muscular resistance in tracts of the phono-articulatory apparatus and lead to voice disorders. This work aimed to assess whether and to what extent body posture during singing and playing a musical instrument impacts voice performance in professional musicians.

SUBJECTS AND METHODS: Voice signals were recorded from 17 professional musicians (pianists and guitarists) while they were singing and while they were singing and playing a musical instrument simultaneously. Metrics were extracted from their voice spectrogram using the Multi-Dimensional Voice Program (MDVP) and included jitter, shift in fundamental voice frequency (sF0), shimmer, change in peak amplitude, noise to harmonic ratio, Voice Turbulence Index, Soft Phonation Index (SPI), Frequency Tremor Intensity Index, Amplitude Tremor Intensity Index, and maximum phonatory time (MPT). Statistical analysis was performed using two-tailed t tests, one-way ANOVA, and χ2 tests. Subjects' body posture was visually assessed following the recommendations of the Italian Society of Audiology and Phoniatrics. Thirty-seven voice signals were collected, 17 during singing and 20 during singing and playing a musical instrument.

RESULTS: Data showed that playing an instrument while singing led to an impairment of the "singer formant" and to a decrease in jitter, sF0, shimmer, SPI, and MPT. However, statistical analysis showed that none of the MDVP metrics changed significantly when subjects played an instrument compared to when they did not. Shoulder and back position affected voice features as measured by the MDVP metrics, while head and neck position did not. In particular, playing the guitar decreased the amplitude of the "singer formant" and increased noise, causing a typical "raucous rock voice."

CONCLUSIONS: Voice features may be affected by the use of the instrument the musicians play while they sing. Body posture selected by the musician while playing the instrument may affect expiration and phonation.

RevDate: 2020-01-03

Whitfield JA, DD Mehta (2019)

Examination of Clear Speech in Parkinson Disease Using Measures of Working Vowel Space.

Journal of speech, language, and hearing research : JSLHR, 62(7):2082-2098.

Purpose The purpose of the current study was to characterize clear speech production for speakers with and without Parkinson disease (PD) using several measures of working vowel space computed from frequently sampled formant trajectories. Method The 1st 2 formant frequencies were tracked for a reading passage that was produced using habitual and clear speaking styles by 15 speakers with PD and 15 healthy control speakers. Vowel space metrics were calculated from the distribution of frequently sampled formant frequency tracks, including vowel space hull area, articulatory-acoustic vowel space, and multiple vowel space density (VSD) measures based on different percentile contours of the formant density distribution. Results Both speaker groups exhibited significant increases in the articulatory-acoustic vowel space and VSD10, the area of the outermost (10th percentile) contour of the formant density distribution, from habitual to clear styles. These clarity-related vowel space increases were significantly smaller for speakers with PD than controls. Both groups also exhibited a significant increase in vowel space hull area; however, this metric was not sensitive to differences in the clear speech response between groups. Relative to healthy controls, speakers with PD exhibited a significantly smaller VSD90, the area of the most central (90th percentile), densely populated region of the formant space. Conclusions Using vowel space metrics calculated from formant traces of the reading passage, the current work suggests that speakers with PD do indeed reach the more peripheral regions of the vowel space during connected speech but spend a larger percentage of the time in more central regions of formant space than healthy speakers. 
Additionally, working vowel space metrics based on the distribution of formant data suggested that speakers with PD exhibited less of a clarity-related increase in formant space than controls, a trend that was not observed for perimeter-based measures of vowel space area.
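Of the working vowel space metrics above, the convex hull area is the most direct to compute: the area of the hull enclosing all frequently sampled (F1, F2) points. A sketch of that idea (not the authors' implementation; the density-based VSD measures additionally require estimating a 2-D formant density and its percentile contours):

```python
import numpy as np
from scipy.spatial import ConvexHull

def vowel_space_hull_area(f1_hz, f2_hz):
    """Area (Hz^2) of the convex hull around frequently sampled
    (F1, F2) formant points from connected speech."""
    pts = np.column_stack([f1_hz, f2_hz])
    # For a 2-D point set, ConvexHull.volume is the enclosed area
    # (ConvexHull.area would be the hull's perimeter).
    return ConvexHull(pts).volume
```

As the abstract notes, a hull captures only the outermost reached points, which is why it was insensitive to the group difference that the density-weighted VSD measures detected.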

RevDate: 2019-12-18

Chiu YF, Forrest K, T Loux (2019)

Relationship Between F2 Slope and Intelligibility in Parkinson's Disease: Lexical Effects and Listening Environment.

American journal of speech-language pathology, 28(2S):887-894.

Purpose There is a complex relationship between speech production and intelligibility of speech. The current study sought to evaluate the interaction of the factors of lexical characteristics, listening environment, and the 2nd formant transition (F2 slope) on intelligibility of speakers with Parkinson's disease (PD). Method Twelve speakers with PD and 12 healthy controls read sentences that included words with the diphthongs /aɪ/, /ɔɪ/, and /aʊ/. The F2 slope of the diphthong transition was measured and averaged across the 3 diphthongs for each speaker. Young adult listeners transcribed the sentences to assess intelligibility of words with high and low word frequency and high and low neighborhood density in quiet and noisy listening conditions. The average F2 slope and intelligibility scores were entered into regression models to examine their relationship. Results F2 slope was positively related to intelligibility in speakers with PD in both listening conditions with a stronger relationship in noise than in quiet. There was no significant relationship between F2 slope and intelligibility of healthy speakers. In the quiet condition, F2 slope was only correlated with intelligibility in less-frequent words produced by the PD group. In the noise condition, F2 slope was related to intelligibility in high- and low-frequency words and high-density words in PD. Conclusions The relationship between F2 slope and intelligibility in PD was affected by lexical factors and listening conditions. F2 slope was more strongly related to intelligibility in noise than in quiet for speakers with PD. This relationship was absent in highly frequent words presented in quiet and those with fewer lexical neighbors.
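The F2 slope measure is commonly operationalized as the slope of a line fit to the F2 track across the diphthong transition. A minimal sketch (studies differ in which transition landmarks bound the fit; this version fits the whole supplied track):

```python
import numpy as np

def f2_slope(times_ms, f2_hz):
    """Least-squares F2 slope (Hz/ms) over a diphthong transition.
    Steeper slopes reflect faster articulatory movement and, per the
    study above, relate to higher intelligibility in PD speech."""
    slope, _intercept = np.polyfit(times_ms, f2_hz, 1)
    return slope
```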

RevDate: 2020-01-03

Bauerly KR, Jones RM, C Miller (2019)

Effects of Social Stress on Autonomic, Behavioral, and Acoustic Parameters in Adults Who Stutter.

Journal of speech, language, and hearing research : JSLHR, 62(7):2185-2202.

Purpose The purpose of this study was to assess changes in autonomic, behavioral, and acoustic measures in response to social stress in adults who stutter (AWS) compared to adults who do not stutter (ANS). Method Participants completed the State-Trait Anxiety Inventory (Spielberger, Gorsuch, Lushene, Vagg, & Jacobs, 1983). In order to provoke social stress, participants were required to complete a modified version of the Trier Social Stress Test (TSST-M; Kirschbaum, Pirke, & Hellhammer, 1993), which included completing a nonword reading task and then preparing and delivering a speech to what was perceived as a group of professionals trained in public speaking. Autonomic nervous system changes were assessed by measuring skin conductance levels, heart rate, and respiratory sinus arrhythmia (RSA). Behavioral changes during speech production were measured in errors, percentage of syllables stuttered, percentage of other disfluencies, and speaking rate. Acoustic changes were measured using 2nd formant frequency fluctuations. In order to make comparisons of speech with and without social-cognitive stress, measurements were collected while participants completed a speaking task before and during TSST-M conditions. Results AWS showed significantly higher levels of self-reported state and trait anxiety compared to ANS. Autonomic nervous system changes revealed similar skin conductance level and heart rate across pre-TSST-M and TSST-M conditions; however, RSA levels were significantly higher in AWS compared to ANS across conditions. There were no differences found between groups for speaking rate, fundamental frequency, and percentage of other disfluencies when speaking with or without social stress. However, acoustic analysis revealed higher levels of 2nd formant frequency fluctuations in the AWS compared to the controls under pre-TSST-M conditions, followed by a decline to a level that resembled controls when speaking under the TSST-M condition. 
Discussion Results suggest that AWS, compared to ANS, engage higher levels of parasympathetic control (i.e., RSA) during speaking, regardless of stress level. Higher levels of self-reported state and trait anxiety support this viewpoint and suggest that anxiety may have an indirect role in articulatory variability in AWS.

RevDate: 2019-06-30

Charles S, SM Lulich (2019)

Articulatory-acoustic relations in the production of alveolar and palatal lateral sounds in Brazilian Portuguese.

The Journal of the Acoustical Society of America, 145(6):3269.

Lateral approximant speech sounds are notoriously difficult to measure and describe due to their complex articulation and acoustics. This has prevented researchers from reaching a unifying description of the articulatory and acoustic characteristics of laterals. This paper examines articulatory and acoustic properties of Brazilian Portuguese alveolar and palatal lateral approximants (/l/ and /ʎ/) produced by six native speakers. The methodology for obtaining vocal tract area functions was based on three-dimensional/four-dimensional (3D/4D) ultrasound recordings and 3D digitized palatal impressions with simultaneously recorded audio signals. Area functions were used to calculate transfer function spectra, and predicted formant and anti-resonance frequencies were compared with the acoustic recordings. Mean absolute error in formant frequency prediction was 4% with a Pearson correlation of r = 0.987. Findings suggest anti-resonances from the interdental channels are less important than a prominent anti-resonance from the supralingual cavity but can become important in asymmetrical articulations. The use of 3D/4D ultrasound to study articulatory-acoustic relations is promising, but significant limitations remain and future work is needed to make better use of 3D/4D ultrasound data, e.g., by combining it with magnetic resonance imaging.

RevDate: 2020-01-03

Horáček J, Radolf V, AM Laukkanen (2019)

Experimental and Computational Modeling of the Effects of Voice Therapy Using Tubes.

Journal of speech, language, and hearing research : JSLHR, 62(7):2227-2244.

Purpose Phonations into a tube with the distal end either in the air or submerged in water are used for voice therapy. This study explores the effective mechanisms of these therapy methods. Method The study applied a physical model complemented by calculations from a computational model, and the results were compared to those that have been reported for humans. The effects of tube phonation on vocal tract resonances and oral pressure variation were studied. The relationships of transglottic pressure variation in time Ptrans(t) versus glottal area variation in time GA(t) were constructed. Results The physical model revealed that, for phonation on the [u:] vowel through a glass resonance tube ending in the air, the 1st formant frequency (F1) decreased by 67%, from 315 Hz to 105 Hz, thus slightly above the fundamental frequency (F0) that was set to 90-94 Hz. For phonation through the tube into water, F1 decreased by 91%-92%, reaching 26-28 Hz, and the water bubbling frequency Fb ≅ 19-24 Hz was just below F1. The relationships of Ptrans(t) versus GA(t) clearly differentiate vowel phonation from both therapy methods, and show a physical background for voice therapy with tubes. It is shown that comparable results have been measured in humans during tube therapy. For the tube in air, F1 descends closer to F0, whereas for the tube in water, the frequency Fb occurs close to the acoustic-mechanical resonance of the human vocal tract. Conclusion In both therapy methods, part of the airflow energy required for phonation is substituted by the acoustic energy utilizing the 1st acoustic resonance. Thus, less flow energy is needed for vocal fold vibration, which results in improved vocal efficiency. The effect can be stronger in water resistance therapy if the frequency Fb approaches the acoustic-mechanical resonance of the vocal tract, while simultaneously F0 is voluntarily changed close to F1.

RevDate: 2019-06-27

Suresh CH, Krishnan A, X Luo (2019)

Human Frequency Following Responses to Vocoded Speech: Amplitude Modulation Versus Amplitude Plus Frequency Modulation.

Ear and hearing [Epub ahead of print].

OBJECTIVES: The most commonly employed speech processing strategies in cochlear implants (CIs) only extract and encode amplitude modulation (AM) in a limited number of frequency channels. Zeng et al. (2005) proposed a novel speech processing strategy that encodes both frequency modulation (FM) and AM to improve CI performance. Using behavioral tests, they reported better speech, speaker, and tone recognition with this novel strategy than with the AM-alone strategy. Here, we used the scalp-recorded human frequency following responses (FFRs) to examine the differences in the neural representation of vocoded speech sounds with AM alone and AM + FM as the spectral and temporal cues were varied. Specifically, we were interested in determining whether the addition of FM to AM improved the neural representation of envelope periodicity (FFRENV) and temporal fine structure (FFRTFS), as reflected in the temporal pattern of the phase-locked neural activity generating the FFR.

DESIGN: FFRs were recorded from 13 normal-hearing, adult listeners in response to the original unprocessed stimulus (a synthetic diphthong /au/ with a 110-Hz fundamental frequency or F0 and a 250-msec duration) and the 2-, 4-, 8- and 16-channel sine vocoded versions of /au/ with AM alone and AM + FM. Temporal waveforms, autocorrelation analyses, fast Fourier Transform, and stimulus-response spectral correlations were used to analyze both the strength and fidelity of the neural representation of envelope periodicity (F0) and TFS (formant structure).

RESULTS: The periodicity strength in the FFRENV decreased more for the AM stimuli than for the relatively resilient AM + FM stimuli as the number of channels was increased. Regardless of the number of channels, a clear spectral peak of FFRENV was consistently observed at the stimulus F0 for all the AM + FM stimuli but not for the AM stimuli. Neural representation as revealed by the spectral correlation of FFRTFS was better for the AM + FM stimuli when compared to the AM stimuli. Neural representation of the time-varying formant-related harmonics as revealed by the spectral correlation was also better for the AM + FM stimuli as compared to the AM stimuli.

CONCLUSIONS: These results are consistent with previously reported behavioral results and suggest that the AM + FM processing strategy elicited brainstem neural activity that better preserved periodicity, temporal fine structure, and time-varying spectral information than the AM processing strategy. The relatively more robust neural representation of AM + FM stimuli observed here likely contributes to the superior performance on speech, speaker, and tone recognition with the AM + FM processing strategy. Taken together, these results suggest that neural information preserved in the FFR may be used to evaluate signal processing strategies considered for CIs.

RevDate: 2019-07-09

Stansbury AL, VM Janik (2019)

Formant Modification through Vocal Production Learning in Gray Seals.

Current biology : CB, 29(13):2244-2249.e4.

Vocal production learning is a rare communication skill and has only been found in selected avian and mammalian species [1-4]. Although humans use learned formants and voiceless sounds to encode most lexical information [5], evidence for vocal learning in other animals tends to focus on the modulation pattern of the fundamental frequency [3, 4]. Attempts to teach mammals to produce human speech sounds have largely been unsuccessful, most notably in extensive studies on great apes [5]. The limited evidence for formant copying in mammals raises the question whether advanced learned control over formant production is uniquely human. We show that gray seals (Halichoerus grypus) have the ability to match modulations in peak frequency patterns of call sequences or melodies by modifying the formants in their own calls, moving outside of their normal repertoire's distribution of frequencies and even copying human vowel sounds. Seals also demonstrated enhanced auditory memory for call sequences by accurately copying sequential changes in peak frequency and the number of calls played to them. Our results demonstrate that formants can be influenced by vocal production learning in non-human vocal learners, providing a mammalian substrate for the evolution of flexible information coding in formants as found in human language.

RevDate: 2019-06-16

Dahl KL, LA Mahler (2019)

Acoustic Features of Transfeminine Voices and Perceptions of Voice Femininity.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30075-X [Epub ahead of print].

The purpose of this study was to evaluate the relationships between acoustic measures of transfeminine voices and both self- and listener ratings of voice femininity. Connected speech samples were collected from 12 transfeminine individuals (M = 36.3 years, SD = 10.6 years) and a control group of five cisgender (cis) women and five cis men (M = 35.3 years, SD = 13.3 years). The acoustic measures of fundamental frequency (fo), fo variation, formant frequencies, and vocal intensity were calculated from these samples. Transfeminine speakers rated their own voices on a five-point scale of voice femininity. Twenty inexperienced listeners heard an excerpt of each speech sample and rated the voices on the same five-point scale of voice femininity. Spearman's rank-order correlation coefficients were calculated to measure the relationships between the acoustic variables and ratings of voice femininity. Significant positive correlations were found between fo and both self-ratings (r = 0.712, P = 0.009) and listener ratings of voice femininity (r = 0.513, P < 0.001). Significant positive correlations were found between intensity and both self-ratings (r = 0.584, P = 0.046) and listener ratings of voice femininity (r = 0.584, P = 0.046). No significant correlations were found between fo variation or formant frequencies and perceptual ratings of voice femininity. A Pearson's chi-square test of independence showed that the distribution of self- and listener ratings differed significantly (χ2 = 9.668, P = 0.046). Self- and listener ratings were also shown to be strongly correlated (r = 0.912, P < 0.001). This study provides further evidence to support the selection of training targets in voice feminization programs for transfeminine individuals and promotes the use of self-ratings of voice as an important outcome measure.

RevDate: 2019-06-13

Sankar MSA, PS Sathidevi (2019)

A scalable speech coding scheme using compressive sensing and orthogonal mapping based vector quantization.

Heliyon, 5(5):e01820 pii:e01820.

A novel scalable speech coding scheme based on Compressive Sensing (CS), which can operate at bit rates from 3.275 to 7.275 kbps, is designed and implemented in this paper. CS-based speech coding offers the benefit of combined compression and encryption with inherent de-noising and bit rate scalability. The non-stationary nature of the speech signal makes the recovery process from CS measurements very complex due to the variation in sparsifying bases. In this work, the complexity of the recovery process is reduced by assigning a suitable basis to each frame of the speech signal based on its statistical properties. As the quality of the reconstructed speech depends on the sensing matrix used at the transmitter, a variant of the Binary Permuted Block Diagonal (BPBD) matrix is also proposed here, which offers better performance than the commonly used Gaussian random matrix. To improve the coding efficiency, formant filter coefficients are quantized using conventional Vector Quantization (VQ), and an orthogonal mapping based VQ is developed for the quantization of CS measurements. The proposed coding scheme offers listening quality for reconstructed speech similar to that of the Adaptive Multi-Rate Narrowband (AMR-NB) codec at 6.7 kbps and Enhanced Voice Services (EVS) at 7.2 kbps. A separate de-noising block is not required in the proposed coding scheme due to the inherent de-noising property of CS. Scalability in bit rate is achieved in the proposed method by varying the number of random measurements and the number of levels for orthogonal mapping in the VQ stage of measurements.
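The measure-then-recover core of a CS codec like the one described can be illustrated generically. The paper's BPBD sensing matrix, per-frame basis assignment, and VQ stages are specific to that work, so this sketch uses a plain Gaussian sensing matrix and orthogonal matching pursuit (OMP) on an exactly sparse signal, purely to show the y = Φx encode / sparse-recovery decode structure:

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedily recover a k-sparse x
    from compressive measurements y = Phi @ x."""
    residual, support = y.astype(float), []
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        support.append(int(np.argmax(np.abs(Phi.T @ residual))))
        # Re-fit y on all chosen columns and update the residual.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat
```

In such a codec it is the measurement vector y, not the signal itself, that is quantized and transmitted, which is why the choice of sensing matrix and the VQ of the measurements dominate the reconstruction quality, and why varying the number of measurements yields bit rate scalability.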

RevDate: 2019-08-28
CmpDate: 2019-08-28

de Carvalho CC, da Silva DM, de Carvalho Junior AD, et al (2019)

Pre-operative voice evaluation as a hypothetical predictor of difficult laryngoscopy.

Anaesthesia, 74(9):1147-1152.

We examined the potential for voice sounds to predict a difficult airway as compared with prediction by the modified Mallampati test. A total of 453 patients scheduled for elective surgery under general anaesthesia with tracheal intubation were studied. Five phonemes were recorded and their formants analysed. Difficult laryngoscopy was defined as Cormack-Lehane grade 3 or 4. Univariate and multivariate logistic regression were used to examine the association between several variables (mouth opening, sternomental distance, modified Mallampati and formants) and difficult laryngoscopy. Difficult laryngoscopy was reported in 29/453 (6.4%) patients. Among the five regression models evaluated, the model achieving the best performance in predicting difficult laryngoscopy, after variable selection (stepwise, multivariate), included the modified Mallampati classification (OR 2.920; 95%CI 1.992-4.279; p < 0.001), the first formant of /i/ (iF1) (OR 1.003; 95%CI 1.002-1.04; p < 0.001), and the second formant of /i/ (iF2) (OR 0.998; 95%CI 0.997-0.998; p < 0.001). The receiver operating characteristic curve for a regression model that included both formants and the modified Mallampati showed an area under the curve of 0.918, higher than that for the formants alone (0.761) or the modified Mallampati alone (0.874). Voice showed a significant association with difficult laryngoscopy during general anaesthesia, with a 76.1% probability of correctly classifying a randomly selected patient.

RevDate: 2020-01-14

Easwar V, Scollie S, D Purcell (2019)

Investigating potential interactions between envelope following responses elicited simultaneously by different vowel formants.

Hearing research, 380:35-45.

Envelope following responses (EFRs) evoked by the periodicity of voicing in vowels are elicited at the fundamental frequency of voice (f0), irrespective of the harmonics that initiate it. One approach of improving the frequency specificity of vowel stimuli without increasing test-time is by altering the f0 selectively in one or more formants. The harmonics contributing to an EFR can then be differentiated by the unique f0 at which the EFRs are elicited. The advantages of using such an approach would be increased frequency specificity and efficiency, given that multiple EFRs can be evaluated in a certain test-time. However, multiple EFRs elicited simultaneously could interact and lead to altered amplitudes and outcomes. To this end, the present study aimed to evaluate: (i) if simultaneous recording of two EFRs, one elicited by harmonics in the first formant (F1) and one elicited by harmonics in the second and higher formants (F2+), leads to attenuation or enhancement of EFR amplitude, and (ii) if simultaneous measurement of two EFRs affects its accuracy and anticipated efficiency. In a group of 22 young adults with normal hearing, EFRs were elicited by F1 and F2+ bands of /u/, /a/ and /i/ when F1 and F2+ were presented independently (individual), when F1 and F2+ were presented simultaneously (dual), and when F1 or F2+ was presented with spectrally matched Gaussian noise of the other (noise). Repeated-measures analysis of variance indicated no significant group differences in EFR amplitudes between any of the conditions, suggesting minimal between-EFR interactions. Between-participant variability was evident; however, significant changes were observed in only a third of the participants, and only for the stimulus /u/ F1. 
For the majority of stimuli, the change between individual and dual conditions was positively correlated with the change between individual and noise conditions, suggesting that interaction-based changes in EFR amplitude, when present, were likely due to the restriction of cochlear regions of excitation in the presence of a competing stimulus. The amplitude of residual noise was significantly higher in the dual or noise relative to the individual conditions, although the mean differences were very small (<3 nV). F-test-based detection of EFRs, commonly used to determine the presence of an EFR, did not vary across conditions. Further, neither the mean reduction in EFR amplitude nor the mean increase in noise amplitude in dual relative to individual conditions was large enough to alter the anticipated gain in efficiency of simultaneous EFR recordings. Together, results suggest that the approach of simultaneously recording two vowel-evoked EFRs from different formants for improved frequency-specificity does not alter test accuracy and is more time-efficient than evaluating EFRs to each formant individually.

RevDate: 2019-12-01

Buckley DP, Dahl KL, Cler GJ, et al (2019)

Transmasculine Voice Modification: A Case Study.

Journal of voice : official journal of the Voice Foundation [Epub ahead of print].

This case study measured the effects of manual laryngeal therapy on the fundamental frequency (fo), formant frequencies, estimated vocal tract length, and listener perception of masculinity of a 32-year-old transmasculine individual. The participant began testosterone therapy 1.5 years prior to the study. Two therapy approaches were administered sequentially in a single session: (1) passive circumlaryngeal massage and manual laryngeal reposturing, and (2) active laryngeal reposturing with voicing. Acoustic recordings were collected before and after each treatment and 3 days after the session. Speaking fo decreased from 124 Hz to 120 Hz after passive training, and to 108 Hz after active training. Estimated vocal tract length increased from 17.0 cm to 17.3 cm after passive training, and to 19.4 cm after active training. Eight listeners evaluated the masculinity of the participant's speech; his voice was rated as most masculine at the end of the training session. All measures returned to baseline at follow-up. Overall, both acoustic and perceptual changes were observed in one transmasculine individual who participated in manual laryngeal therapy, even after significant testosterone-induced voice changes had already occurred; however, changes were not maintained at follow-up. This study adds to the scant literature on effective approaches to, and proposed outcome measures for, voice masculinization in transmasculine individuals.

RevDate: 2019-12-20

Chen WR, Whalen DH, CH Shadle (2019)

F0-induced formant measurement errors result in biased variabilities.

The Journal of the Acoustical Society of America, 145(5):EL360.

Many developmental studies attribute reduction of acoustic variability to increasing motor control. However, linear prediction-based formant measurements are known to be biased toward the nearest harmonic of F0, especially at high F0s. Thus, the amount of reported formant variability generated by changes in F0 is unknown. Here, 470 000 vowels were synthesized, mimicking statistics reported in four developmental studies, to estimate the proportion of formant variability that can be attributed to F0 bias, as well as to other formant measurement errors. Results showed that the F0-induced formant measurement errors are large and systematic, and cannot be eliminated by a large sample size.
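The harmonic bias described here can be illustrated with a deliberately simplified worst-case model in which the formant estimate lands on the harmonic of F0 nearest to the true formant. The F1 distribution below is invented; the point is only that the maximum possible error scales with F0/2, so high-F0 (child or female) voices admit much larger measurement errors.

```python
import numpy as np

def snap_to_harmonic(f, f0):
    """Worst-case model of the F0 bias: the measured formant is the
    harmonic of f0 nearest to the true formant frequency."""
    return np.round(f / f0) * f0

rng = np.random.default_rng(1)
true_f1 = rng.normal(800, 50, 10_000)  # hypothetical child-like F1 values (Hz)

for f0 in (100.0, 400.0):
    err = snap_to_harmonic(true_f1, f0) - true_f1
    # |error| is bounded by f0/2: ~50 Hz at f0 = 100 Hz, ~200 Hz at 400 Hz.
    print(f"f0 = {f0:3.0f} Hz: max |error| = {np.abs(err).max():5.1f} Hz")
```

Because the error for any given vowel is fixed by where its true formant falls between harmonics, averaging over more tokens does not remove it, matching the abstract's point that a large sample size cannot eliminate the bias.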

RevDate: 2019-11-20

Briefer EF, Vizier E, Gygax L, et al (2019)

Expression of emotional valence in pig closed-mouth grunts: Involvement of both source- and filter-related parameters.

The Journal of the Acoustical Society of America, 145(5):2895.

Emotion expression plays a crucial role in regulating social interactions. One efficient channel for emotion communication is the vocal-auditory channel, which enables fast transmission of information. Filter-related parameters (formants) have been suggested as a key to the vocal differentiation of emotional valence (positive versus negative) across species, but their variation in relation to emotions has rarely been investigated. Here, we investigated whether pig (Sus scrofa domesticus) closed-mouth grunts differ in source- and filter-related features when produced in situations assumed to be positive or negative. Behavioral and physiological parameters were used to validate the animals' emotional state (both in terms of valence and arousal, i.e., bodily activation). Results revealed that grunts produced in a positive situation were characterized by higher formants, a narrower range of the third formant, a shorter duration, a lower fundamental frequency, and a lower harmonicity compared to negative grunts. In particular, formant-related parameters and duration made up most of the difference between positive and negative grunts. Therefore, these parameters have the potential to encode dynamic information and to vary as a function of the emotional valence of the emitter in pigs, and possibly in other mammals as well.

RevDate: 2019-11-20

Houde JF, Gill JS, Agnew Z, et al (2019)

Abnormally increased vocal responses to pitch feedback perturbations in patients with cerebellar degeneration.

The Journal of the Acoustical Society of America, 145(5):EL372.

Cerebellar degeneration (CD) has deleterious effects on speech motor behavior. Recently, a dissociation between feedback and feedforward control of speaking was observed in CD: Whereas CD patients exhibited reduced adaptation across trials to consistent formant feedback alterations, they showed enhanced within-trial compensation for unpredictable formant feedback perturbations. In this study, it was found that CD patients exhibit abnormally increased within-trial vocal compensation responses to unpredictable pitch feedback perturbations. Taken together with recent findings, the results indicate that CD is associated with a general hypersensitivity to auditory feedback during speaking.

RevDate: 2019-11-20

Reinheimer DM, Andrade BMR, Nascimento JKF, et al (2019)

Formant Frequencies, Cephalometric Measures, and Pharyngeal Airway Width in Adults With Congenital, Isolated, and Untreated Growth Hormone Deficiency.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30061-X [Epub ahead of print].

OBJECTIVE: Adult subjects with isolated growth hormone deficiency (IGHD) due to a mutation in the growth hormone releasing hormone receptor gene exhibit higher formant frequency values. In normal subjects, a significant negative association between the formant frequencies and the reduction of linear craniofacial measurements, especially of the maxilla and mandible, has been reported. This suggests a smaller pharyngeal width, despite the low prevalence of obstructive sleep apnea syndrome. Here we evaluate their pharyngeal airway width, its correlation with vowel formant frequencies, and the correlation between them and the craniofacial measures.

SUBJECTS AND METHODS: A two-step protocol was performed. In the first step, a case-control experiment aimed at assessing pharyngeal width, we compared nine adult IGHD subjects and 36 normal-statured controls. Both upper and lower pharyngeal widths were measured. The second step (correlation analyses) was performed only in the IGHD group.

RESULTS: Upper and lower pharyngeal widths were similar in IGHD and controls. In IGHD subjects, the lower pharyngeal width exhibited a negative correlation with F1 [a] and a positive correlation with mandibular length. There were negative correlations between F1 and F2 with linear and positive correlations with the angular measures.

CONCLUSIONS: Pharyngeal airway width is not reduced in adults with congenital, untreated lifetime IGHD, contributing to the low prevalence of obstructive sleep apnea syndrome. The formant frequencies relate more with cephalometric measurements than with the pharyngeal airway width. These findings exemplify the consequences of lifetime IGHD on osseous and nonosseous growth.

RevDate: 2020-01-08

Easwar V, Scollie S, Aiken S, et al (2020)

Test-Retest Variability in the Characteristics of Envelope Following Responses Evoked by Speech Stimuli.

Ear and hearing, 41(1):150-164.

OBJECTIVES: The objective of the present study was to evaluate the between-session test-retest variability in the characteristics of envelope following responses (EFRs) evoked by modified natural speech stimuli in young normal hearing adults.

DESIGN: EFRs from 22 adults were recorded in two sessions, 1 to 12 days apart. EFRs were evoked by the token /susa∫ i/ (2.05 sec) presented at 65 dB SPL and recorded from the vertex referenced to the neck. The token /susa∫ i/, spoken by a male with an average fundamental frequency [f0] of 98.53 Hz, was of interest because of its potential utility as an objective hearing aid outcome measure. Each vowel was modified to elicit two EFRs simultaneously by lowering the f0 in the first formant while maintaining the original f0 in the higher formants. Fricatives were amplitude-modulated at 93.02 Hz and elicited one EFR each. EFRs evoked by vowels and fricatives were estimated using a Fourier analyzer and a discrete Fourier transform, respectively. Detection of EFRs was determined by an F-test. Test-retest variability in EFR amplitude and phase coherence was quantified using correlation, repeated-measures analysis of variance, and the repeatability coefficient. The repeatability coefficient, computed as twice the standard deviation (SD) of test-retest differences, represents the ±95% limits of test-retest variation around the mean difference. Test-retest variability of EFR amplitude and phase coherence was compared using the coefficient of variation, a normalized metric that represents the ratio of the SD of repeat measurements to its mean. Consistency in EFR detection outcomes was assessed using the test of proportions.

RESULTS: EFR amplitude and phase coherence did not vary significantly between sessions, and were significantly correlated across repeat measurements. The repeatability coefficient for EFR amplitude ranged from 38.5 nV to 45.6 nV for all stimuli, except for /∫/ (71.6 nV). For any given stimulus, the test-retest differences in EFR amplitude of individual participants were not correlated with their test-retest differences in noise amplitude. However, across stimuli, higher repeatability coefficients of EFR amplitude tended to occur when the group mean noise amplitude and the repeatability coefficient of noise amplitude were higher. The test-retest variability of phase coherence was comparable to that of EFR amplitude in terms of the coefficient of variation, and the repeatability coefficient varied from 0.1 to 0.2, with the highest value of 0.2 for /∫/. Mismatches in EFR detection outcomes occurred in 11 of 176 measurements. For each stimulus, the tests of proportions revealed a significantly higher proportion of matched detection outcomes compared to mismatches.

CONCLUSIONS: Speech-evoked EFRs demonstrated reasonable repeatability across sessions. Of the eight stimuli, the shortest stimulus /∫/ demonstrated the largest variability in EFR amplitude and phase coherence. The test-retest variability in EFR amplitude could not be explained by test-retest differences in noise amplitude for any of the stimuli. This lack of explanation argues for other sources of variability, one possibility being the modulation of cortical contributions imposed on brainstem-generated EFRs.
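The two variability metrics defined in this abstract's Design (repeatability coefficient as twice the SD of test-retest differences; coefficient of variation as SD of repeat measurements over their mean) reduce to a few lines of NumPy. The amplitudes below are simulated stand-ins, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated EFR amplitudes (nV) for 22 participants in two sessions.
session1 = rng.normal(120, 30, 22)
session2 = session1 + rng.normal(0, 20, 22)  # test-retest noise

# Repeatability coefficient: 2 x SD of the test-retest differences,
# giving the ±95% limits of variation around the mean difference.
diff = session2 - session1
repeatability = 2 * diff.std(ddof=1)

# Coefficient of variation: SD of the repeat measurements relative to
# their mean, computed per participant and averaged.
pairs = np.stack([session1, session2])
cv = float((pairs.std(axis=0, ddof=1) / pairs.mean(axis=0)).mean())
```

Because the coefficient of variation is normalized, it lets variability in amplitude (in nV) be compared against variability in phase coherence (a unitless 0-1 measure), which is how the abstract uses it.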

RevDate: 2019-12-10
CmpDate: 2019-11-29

Zhao TC, Masapollo M, Polka L, et al (2019)

Effects of formant proximity and stimulus prototypicality on the neural discrimination of vowels: Evidence from the auditory frequency-following response.

Brain and language, 194:77-83.

Cross-language speech perception experiments indicate that for many vowel contrasts, discrimination is easier when the same pair of vowels is presented in one direction compared to the reverse direction. According to one account, these directional asymmetries reflect a universal bias favoring "focal" vowels (i.e., vowels with prominent spectral peaks formed by the convergence of adjacent formants). An alternative account is that such effects reflect an experience-dependent bias favoring prototypical exemplars of native-language vowel categories. Here, we tested the predictions of these accounts by recording the auditory frequency-following response in English-speaking listeners to two synthetic variants of the vowel /u/ that differed in the proximity of their first and second formants and prototypicality, with stimuli arranged in oddball and reversed-oddball blocks. Participants showed evidence of neural discrimination when the more-focal/less-prototypic /u/ served as the deviant stimulus, but not when the less-focal/more-prototypic /u/ served as the deviant, consistent with the focalization account.

RevDate: 2019-06-22

König A, Linz N, Zeghari R, et al (2019)

Detecting Apathy in Older Adults with Cognitive Disorders Using Automatic Speech Analysis.

Journal of Alzheimer's disease : JAD, 69(4):1183-1193.

BACKGROUND: Apathy is present in several psychiatric and neurological conditions and has been found to have a severe negative effect on disease progression. In older people, it can be a predictor of increased dementia risk. Current assessment methods lack objectivity and sensitivity, thus new diagnostic tools and broad-scale screening technologies are needed.

OBJECTIVE: This study is the first of its kind aiming to investigate whether automatic speech analysis could be used for characterization and detection of apathy.

METHODS: A group of apathetic and non-apathetic patients (n = 60) with mild to moderate neurocognitive disorder were recorded while performing two short narrative speech tasks. Paralinguistic markers relating to prosodic, formant, source, and temporal qualities of speech were automatically extracted, examined between the groups and compared to baseline assessments. Machine learning experiments were carried out to validate the diagnostic power of extracted markers.

RESULTS: Correlations between apathy sub-scales and features revealed a relation between temporal aspects of speech and the subdomains of reduction in interest and initiative, as well as between prosody features and the affective domain. Group differences were found to vary for males and females, depending on the task. Differences in temporal aspects of speech were found to be the most consistent difference between apathetic and non-apathetic patients. Machine learning models trained on speech features achieved top performances of AUC = 0.88 for males and AUC = 0.77 for females.

CONCLUSIONS: These findings reinforce the usability of speech as a reliable biomarker in the detection and assessment of apathy.

RevDate: 2019-11-20

Cox SR, Raphael LJ, PC Doyle (2019)

Production of Vowels by Electrolaryngeal Speakers Using Clear Speech.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000499928 [Epub ahead of print].

BACKGROUND/AIMS: This study examined the effect of clear speech on vowel productions by electrolaryngeal speakers.

METHOD: Ten electrolaryngeal speakers produced eighteen words containing /i/, /ɪ/, /ɛ/, /æ/, /eɪ/, and /oʊ/ using habitual speech and clear speech. Twelve listeners transcribed 360 words, and a total of 4,320 vowel stimuli across speaking conditions, speakers, and listeners were analyzed. Analyses included listeners' identifications of vowels, vowel duration, and vowel formant relationships.

RESULTS: No significant effect of speaking condition was found on vowel identification. Specifically, 85.4% of the vowels were identified in habitual speech, and 82.7% of the vowels were identified in clear speech. However, clear speech was found to have a significant effect on vowel durations. The mean vowel duration in the 17 consonant-vowel-consonant words was 333 ms in habitual speech and 354 ms in clear speech. The mean vowel duration in the single consonant-vowel words was 551 ms in habitual speech and 629 ms in clear speech.

CONCLUSION: Findings suggest that, although clear speech facilitates longer vowel durations, electrolaryngeal speakers may not gain a clear speech benefit relative to listeners' vowel identifications.

RevDate: 2019-11-12

Alharbi GG, Cannito MP, Buder EH, et al (2019)

Spectral/Cepstral Analyses of Phonation in Parkinson's Disease before and after Voice Treatment: A Preliminary Study.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP), 71(5-6):275-285.

PURPOSE: This article examines cepstral/spectral analyses of sustained /α/ vowels produced by speakers with hypokinetic dysarthria secondary to idiopathic Parkinson's disease (PD) before and after Lee Silverman Voice Treatment (LSVT®LOUD) and the relationship of these measures with overall voice intensity.

METHODOLOGY: Nine speakers with PD were examined in a pre-/post-treatment design, with multiple daily audio recordings before and after treatment. Sustained vowels were analyzed for cepstral peak prominence (CPP), CPP standard deviation (CPP SD), low/high spectral ratio (L/H SR), and Cepstral/Spectral Index of Dysphonia (CSID) using the KAYPENTAX computer software.

RESULTS: CPP and CPP SD increased significantly and CSID decreased significantly from pre- to post-treatment recordings, with strong effect sizes. Increased CPP indicates increased dominance of harmonics in the spectrum following LSVT. After restricting the frequency cutoff to the region just above the first formant and second formant and below the third formant, L/H SR was observed to decrease significantly following treatment. Correlation analyses demonstrated that CPP was more strongly associated with CSID before treatment than after.

CONCLUSION: In addition to increased vocal intensity following LSVT, speakers with PD exhibited improved harmonic structure and voice quality as reflected by cepstral/spectral analyses, indicating reduced dysphonia following treatment.

RevDate: 2020-01-08

Zaltz Y, Goldsworthy RL, Eisenberg LS, et al (2020)

Children With Normal Hearing Are Efficient Users of Fundamental Frequency and Vocal Tract Length Cues for Voice Discrimination.

Ear and hearing, 41(1):182-193.

BACKGROUND: The ability to discriminate between talkers assists listeners in understanding speech in a multitalker environment. This ability has been shown to be influenced by sensory processing of vocal acoustic cues, such as fundamental frequency (F0) and formant frequencies that reflect the listener's vocal tract length (VTL), and by cognitive processes, such as attention and memory. It is, therefore, suggested that children who exhibit immature sensory and/or cognitive processing will demonstrate poor voice discrimination (VD) compared with young adults. Moreover, greater difficulties in VD may be associated with spectral degradation as in children with cochlear implants.

OBJECTIVES: The aim of this study was as follows: (1) to assess the use of F0 cues, VTL cues, and the combination of both cues for VD in normal-hearing (NH) school-age children and to compare their performance with that of NH adults; (2) to assess the influence of spectral degradation by means of vocoded speech on the use of F0 and VTL cues for VD in NH children; and (3) to assess the contribution of attention, working memory, and nonverbal reasoning to performance.

DESIGN: Forty-one children, 8 to 11 years of age, were tested with nonvocoded stimuli. Twenty-one of them were also tested with eight-channel, noise-vocoded stimuli. Twenty-one young adults (18 to 35 years) were tested for comparison. A three-interval, three-alternative forced-choice paradigm with an adaptive tracking procedure was used to estimate the difference limens (DLs) for VD when F0, VTL, and F0 + VTL were manipulated separately. Auditory memory, visual attention, and nonverbal reasoning were assessed for all participants.

RESULTS: (a) Children's F0 and VTL discrimination abilities were comparable to those of adults, suggesting that most school-age children utilize both cues effectively for VD. (b) Children's VD was associated with trail making test scores that assessed visual attention abilities and speed of processing, possibly reflecting their need to recruit cognitive resources for the task. (c) Best DLs were achieved for the combined (F0 + VTL) manipulation for both children and adults, suggesting that children at this age are already capable of integrating spectral and temporal cues. (d) Both children and adults found the VTL manipulations more beneficial for VD compared with the F0 manipulations, suggesting that formant frequencies are more reliable for identifying a specific speaker than F0. (e) Poorer DLs were achieved with the vocoded stimuli, though the children maintained thresholds and patterns of performance across manipulations similar to those of the adults.

CONCLUSIONS: The present study is the first to assess the contribution of F0, VTL, and the combined F0 + VTL to the discrimination of speakers in school-age children. The findings support the notion that many NH school-age children have effective spectral and temporal coding mechanisms that allow sufficient VD, even in the presence of spectrally degraded information. These results may challenge the notion that immature sensory processing underlies poor listening abilities in children, further implying that other processing mechanisms contribute to their difficulties to understand speech in a multitalker environment. These outcomes may also provide insight into VD processes of children under listening conditions that are similar to cochlear implant users.

RevDate: 2020-01-15

Auracher J, Scharinger M, W Menninghaus (2019)

Contiguity-based sound iconicity: The meaning of words resonates with phonetic properties of their immediate verbal contexts.

PloS one, 14(5):e0216930 pii:PONE-D-18-29313.

We tested the hypothesis that phono-semantic iconicity--i.e., a motivated resonance of sound and meaning--might not only be found on the level of individual words or entire texts, but also in word combinations, such that the meaning of a target word is iconically expressed, or highlighted, in the phonetic properties of its immediate verbal context. To this end, we extracted single lines from German poems that all include a word designating high or low dominance, such as large or small, strong or weak, etc. Based on insights from previous studies, we expected to find more vowels with a relatively short distance between the first two formants (low formant dispersion) in the immediate context of words expressing high physical or social dominance than in the context of words expressing low dominance. Our findings support this hypothesis, suggesting that neighboring words can form iconic dyads in which the meaning of one word is sound-iconically reflected in the phonetic properties of adjacent words. The construct of contiguity-based phono-semantic iconicity opens many avenues for future research well beyond lines extracted from poems.
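Formant dispersion as used in this abstract is the distance between the first two formants; a small F2 - F1 gap characterizes 'large'- or 'dominant'-sounding vowels. A minimal sketch of a context comparison, using rough typical formant values and invented context-vowel lists rather than the study's materials:

```python
# Rough typical formant values (F1, F2) in Hz; illustrative only.
FORMANTS = {
    "a": (750, 1300), "o": (450, 850), "u": (350, 750),
    "i": (300, 2250), "e": (400, 2100),
}

def dispersion(vowel):
    """Distance between the first two formants (low = 'dominant'-sounding)."""
    f1, f2 = FORMANTS[vowel]
    return f2 - f1

def mean_context_dispersion(vowels):
    return sum(dispersion(v) for v in vowels) / len(vowels)

# Hypothetical vowel contexts around a high- vs a low-dominance target word.
high_dominance_ctx = ["a", "o", "u"]
low_dominance_ctx = ["i", "e", "a"]

# The hypothesis predicts lower mean dispersion near high-dominance words.
gap = (mean_context_dispersion(low_dominance_ctx)
       - mean_context_dispersion(high_dominance_ctx))
```

With these illustrative figures the high-dominance context averages 450 Hz of dispersion against 1400 Hz for the low-dominance context, in the direction the hypothesis predicts.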

RevDate: 2020-01-03

Koenig LL, S Fuchs (2019)

Vowel Formants in Normal and Loud Speech.

Journal of speech, language, and hearing research : JSLHR, 62(5):1278-1295.

PURPOSE: This study evaluated how 1st and 2nd vowel formant frequencies (F1, F2) differ between normal and loud speech in multiple speaking tasks to assess claims that loudness leads to exaggerated vowel articulation.

METHOD: Eleven healthy German-speaking women produced normal and loud speech in 3 tasks that varied in the degree of spontaneity: reading sentences that contained isolated /i: a: u:/, responding to questions that included target words with controlled consonantal contexts but varying vowel qualities, and a recipe recall task. Loudness variation was elicited naturalistically by changing interlocutor distance. First and 2nd formant frequencies and average sound pressure level were obtained from the stressed vowels in the target words, and vowel space area was calculated from /i: a: u:/.

RESULTS: Comparisons across many vowels indicated that high, tense vowels showed limited formant variation as a function of loudness. Analysis of /i: a: u:/ across speech tasks revealed vowel space reduction in the recipe retell task compared to the other 2. Loudness changes for F1 were consistent in direction but variable in extent, with few significant results for high tense vowels. Results for F2 were quite varied and frequently not significant. Speakers differed in how loudness and task affected formant values. Finally, correlations between sound pressure level and F1 were generally positive but varied in magnitude across vowels, with the high tense vowels showing very flat slopes.

DISCUSSION: These data indicate that naturalistically elicited loud speech in typical speakers does not always lead to changes in vowel formant frequencies and call into question the notion that increasing loudness is necessarily an automatic method of expanding the vowel space. Supplemental Material: https://doi.org/10.23641/asha.8061740.
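Vowel space area calculated from the /i: a: u:/ corner vowels is the area of the triangle those vowels span in the F1-F2 plane, which the shoelace formula gives directly. The formant values below are invented for illustration, not the study's measurements:

```python
def vowel_space_area(corners):
    """Polygon area in the (F1, F2) plane via the shoelace formula.
    `corners` is an ordered list of (F1, F2) points in Hz."""
    n = len(corners)
    twice_area = sum(
        corners[i][0] * corners[(i + 1) % n][1]
        - corners[(i + 1) % n][0] * corners[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2

# Illustrative corner-vowel formants (Hz) for /i: a: u:/ -- not measured data.
area = vowel_space_area([(300, 2200), (750, 1300), (320, 800)])  # Hz^2
```

A shrinking area across tasks (as in the recipe retell result) then quantifies vowel space reduction with a single number per speaker and condition.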

RevDate: 2020-01-03

Nalborczyk L, Batailler C, Lœvenbruck H, et al (2019)

An Introduction to Bayesian Multilevel Models Using brms: A Case Study of Gender Effects on Vowel Variability in Standard Indonesian.

Journal of speech, language, and hearing research : JSLHR, 62(5):1225-1242.

PURPOSE: Bayesian multilevel models are increasingly used to overcome the limitations of frequentist approaches in the analysis of complex structured data. This tutorial introduces Bayesian multilevel modeling for the specific analysis of speech data, using the brms package developed in R.

METHOD: In this tutorial, we provide a practical introduction to Bayesian multilevel modeling by reanalyzing a phonetic data set containing formant (F1 and F2) values for 5 vowels of standard Indonesian (ISO 639-3:ind), as spoken by 8 speakers (4 females and 4 males), with several repetitions of each vowel.

RESULTS: We first give an introductory overview of the Bayesian framework and multilevel modeling. We then show how Bayesian multilevel models can be fitted using the probabilistic programming language Stan and the R package brms, which provides an intuitive formula syntax.

CONCLUSIONS: Through this tutorial, we demonstrate some of the advantages of the Bayesian framework for statistical modeling and provide a detailed case study, with complete source code for full reproducibility of the analyses (https://osf.io/dpzcb/). Supplemental Material: https://doi.org/10.23641/asha.7973822.
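The multilevel structure described here (formant values nested within speakers) can be sketched in Python with a frequentist analogue via statsmodels, since the tutorial's own code uses brms in R. The data below are synthetic and merely shaped like the design; the gender effect and variance parameters are invented:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Synthetic F1 data loosely shaped like the tutorial's design:
# 8 speakers (4 female, 4 male), 5 vowels, 5 repetitions each.
rows = []
for s in range(8):
    female = s < 4
    speaker_shift = rng.normal(0, 30)  # random speaker intercept
    for vowel, base_f1 in zip("aeiou", (750, 450, 300, 500, 350)):
        for _ in range(5):
            rows.append({
                "speaker": s,
                "gender": "f" if female else "m",
                "vowel": vowel,
                "F1": base_f1 + (60 if female else 0) + speaker_shift
                      + rng.normal(0, 25),
            })
df = pd.DataFrame(rows)

# Frequentist analogue of the tutorial's multilevel model (the paper
# fits Bayesian models with brms): fixed gender effect on F1 with a
# random intercept per speaker.
fit = smf.mixedlm("F1 ~ gender", df, groups=df["speaker"]).fit()
```

The random intercept absorbs between-speaker variation so the gender effect is not confounded with individual voices, which is the same partial-pooling logic the Bayesian brms model applies with priors.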

RevDate: 2019-12-27
CmpDate: 2019-12-27

van Rij J, Hendriks P, van Rijn H, et al (2019)

Analyzing the Time Course of Pupillometric Data.

Trends in hearing, 23:2331216519832483.

This article provides a tutorial for analyzing pupillometric data. Pupil dilation has become increasingly popular in psychological and psycholinguistic research as a measure to trace language processing. However, there is no general consensus about procedures to analyze the data, with most studies analyzing extracted features of the pupil dilation data instead of analyzing the pupil dilation trajectories directly. Recent studies have started to apply nonlinear regression and other methods to analyze pupil dilation trajectories directly, utilizing all available information in the continuously measured signal. This article applies one such nonlinear regression method, generalized additive mixed modeling, and illustrates how to analyze the full time course of the pupil dilation signal. The approach is particularly suited to psychological and psycholinguistic research because generalized additive mixed models can include complex nonlinear interactions for investigating the effects of properties of stimuli (e.g., formant frequency) or participants (e.g., working memory score) on the pupil dilation signal. To account for the variation due to participants and items, nonlinear random effects can be included. However, one of the challenges for analyzing time series data is dealing with the autocorrelation in the residuals, which is rather extreme for the pupillary signal. On the basis of simulations, we explain potential causes of this extreme autocorrelation, and on the basis of the experimental data, we show how to reduce its adverse effects, allowing a much more coherent interpretation of pupillary data than is possible with feature-based techniques.

RevDate: 2019-11-20

He L, Zhang Y, V Dellwo (2019)

Between-speaker variability and temporal organization of the first formant.

The Journal of the Acoustical Society of America, 145(3):EL209.

First formant (F1) trajectories of vocalic intervals were divided into positive and negative dynamics. Positive F1 dynamics were defined as the speeds of F1 increases to reach the maxima, and negative F1 dynamics as the speeds of F1 decreases away from the maxima. Mean, standard deviation, and sequential variability were measured for both dynamics. Results showed that measures of negative F1 dynamics explained more between-speaker variability, which was highly congruent with a previous study using intensity dynamics [He and Dellwo (2017). J. Acoust. Soc. Am. 141, EL488-EL494]. The results may be explained by speaker idiosyncratic articulation.
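The positive and negative F1 dynamics defined above can be sketched as the mean speed of F1 rising to its maximum and falling away from it. The exact averaging used by the authors may differ, and the trajectory below is invented:

```python
import numpy as np

def f1_dynamics(f1, dt):
    """Split an F1 trajectory (Hz, one sample every dt seconds) at its
    maximum; return (positive, negative) dynamics as the mean speeds
    of F1 rising to the maximum and falling away from it (Hz/s)."""
    i = int(np.argmax(f1))
    positive = (f1[i] - f1[0]) / (i * dt) if i > 0 else 0.0
    negative = (f1[i] - f1[-1]) / ((len(f1) - 1 - i) * dt) if i < len(f1) - 1 else 0.0
    return positive, negative

# Hypothetical vocalic interval: F1 rises quickly, then falls slowly.
trajectory = np.array([400.0, 550.0, 700.0, 680.0, 640.0, 600.0, 560.0])
pos, neg = f1_dynamics(trajectory, dt=0.01)  # here pos > neg
```

Collecting `pos` and `neg` over all vocalic intervals of a recording, and then taking their mean, standard deviation, and sequential variability, yields speaker-level measures of the kind compared in the abstract.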

RevDate: 2019-11-20

Roberts B, RJ Summers (2019)

Dichotic integration of acoustic-phonetic information: Competition from extraneous formants increases the effect of second-formant attenuation on intelligibility.

The Journal of the Acoustical Society of America, 145(3):1230.

Differences in ear of presentation and level do not prevent effective integration of concurrent speech cues such as formant frequencies. For example, presenting the higher formants of a consonant-vowel syllable in the opposite ear to the first formant protects them from upward spread of masking, allowing them to remain effective speech cues even after substantial attenuation. This study used three-formant (F1+F2+F3) analogues of natural sentences and extended the approach to include competitive conditions. Target formants were presented dichotically (F1+F3; F2), either alone or accompanied by an extraneous competitor for F2 (i.e., F1±F2C+F3; F2) that listeners must reject to optimize recognition. F2C was created by inverting the F2 frequency contour and using the F2 amplitude contour without attenuation. In experiment 1, F2C was always absent and intelligibility was unaffected until F2 attenuation exceeded 30 dB; F2 still provided useful information at 48-dB attenuation. In experiment 2, attenuating F2 by 24 dB caused considerable loss of intelligibility when F2C was present, but had no effect in its absence. Factors likely to contribute to this interaction include informational masking from F2C acting to swamp the acoustic-phonetic information carried by F2, and interaural inhibition from F2C acting to reduce the effective level of F2.
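The extraneous competitor F2C is built by inverting the F2 frequency contour (so rises become falls) while reusing the F2 amplitude contour. A minimal sketch of the frequency inversion, assuming inversion about the contour mean (the abstract does not state the pivot) and using an invented F2 track:

```python
import numpy as np

# Hypothetical F2 frequency track (Hz) across a sentence analogue.
f2 = np.array([1400.0, 1600.0, 1900.0, 1700.0, 1500.0])

# Invert the frequency contour about its mean: peaks become troughs,
# yielding a competitor (F2C) with the opposite frequency movement
# but the same mean frequency and overall range.
f2c = 2 * f2.mean() - f2
```

Because F2C occupies the same spectral region as F2 but carries a contradictory frequency movement, listeners must reject it to recover the target's acoustic-phonetic information, which is what makes it an effective informational masker.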

RevDate: 2019-11-20

Ogata K, Kodama T, Hayakawa T, et al (2019)

Inverse estimation of the vocal tract shape based on a vocal tract mapping interface.

The Journal of the Acoustical Society of America, 145(4):1961.

This paper describes the inverse estimation of the vocal tract shape for vowels by using a vocal tract mapping interface. In prior research, an interface capable of generating a vocal tract shape by clicking on its window was developed. The vocal tract shapes for five vowels are located at the vertices of a pentagonal chart, and a different shape corresponding to an arbitrary mouse-pointer position on the interface window is calculated by interpolation. In this study, an attempt was made to apply the interface to the inverse estimation of vocal tract shapes based on formant frequencies. A target formant frequency data set was searched based on the geometry of the interface window by using a coarse-to-fine algorithm. It was revealed that the estimated vocal tract shapes obtained from the mapping interface were close to those from magnetic resonance imaging data in another study and to lip area data captured using video recordings. The results of experiments to evaluate the estimated vocal tract shapes showed that each subject demonstrated unique trajectories on the interface window corresponding to the estimated vocal tract shapes. These results suggest the usefulness of inverse estimation using the interface.
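The coarse-to-fine search over the interface window can be sketched as a repeated grid search with a shrinking region around the best cell. The `formants_at` mapping below is a hypothetical linear stand-in for the interface's pentagonal interpolation, used only to make the sketch runnable:

```python
import numpy as np

def formants_at(x, y):
    """Hypothetical stand-in for the interface's interpolation from a
    window position (x, y) in [0, 1]^2 to formant frequencies (F1, F2)."""
    return 300 + 500 * y, 800 + 1600 * x

def coarse_to_fine(target, passes=4, grid=11):
    """Find the window position whose mapped formants best match
    `target`, shrinking the search region around the best grid cell
    on each pass."""
    lo, hi = np.zeros(2), np.ones(2)
    best = (0.0, 0.0)
    for _ in range(passes):
        xs = np.linspace(lo[0], hi[0], grid)
        ys = np.linspace(lo[1], hi[1], grid)
        best = min(
            ((x, y) for x in xs for y in ys),
            key=lambda p: sum((a - b) ** 2
                              for a, b in zip(formants_at(*p), target)),
        )
        span = (hi - lo) / (grid - 1)
        lo = np.maximum(np.array(best) - span, 0.0)
        hi = np.minimum(np.array(best) + span, 1.0)
    return best

x, y = coarse_to_fine(target=(550.0, 1200.0))  # true position is (0.25, 0.5)
```

Each pass narrows the window by an order of magnitude, so a few passes suffice to locate the position (and hence the interpolated vocal tract shape) matching a measured formant pair.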

RevDate: 2019-11-20

Thompson A, Y Kim (2019)

Relation of second formant trajectories to tongue kinematics.

The Journal of the Acoustical Society of America, 145(4):EL323.

In this study, the relationship between the acoustic and articulatory kinematic domains of speech was examined among nine neurologically healthy female speakers using two derived relationships between tongue kinematics and second formant (F2) measurements: (1) F2 extent to lingual displacement and (2) F2 slope to lingual speed. Additionally, the relationships between these paired parameters were examined within conversational, more clear, and less clear speaking modes. In general, the findings of the study support a strong correlation for both sets of paired parameters. In addition, the data showed significant changes in articulatory behaviors across speaking modes, including the magnitude of tongue motion, but not in the speed-related measures.
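The two derived F2 measures lend themselves to a short sketch. The finite-difference slope estimator and the toy trajectory below are assumptions for illustration, not the study's analysis pipeline:

```python
def f2_extent_and_slope(times_s, f2_hz):
    """F2 extent (max - min, in Hz) and peak local slope (Hz/s) of an
    F2 trajectory, using simple finite differences between samples."""
    extent = max(f2_hz) - min(f2_hz)
    slopes = [(f2_hz[i + 1] - f2_hz[i]) / (times_s[i + 1] - times_s[i])
              for i in range(len(f2_hz) - 1)]
    peak_slope = max(slopes, key=abs)
    return extent, peak_slope

# Toy rising F2 trajectory sampled every 20 ms (invented values):
t = [0.00, 0.02, 0.04, 0.06]
f2 = [1100.0, 1300.0, 1600.0, 1700.0]
extent, slope = f2_extent_and_slope(t, f2)
```

Here the extent is 600 Hz and the steepest 20-ms segment gives the peak slope, the quantity paired with lingual speed in the abstract.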

RevDate: 2019-11-20

Bürki A, Welby P, Clément M, et al (2019)

Orthography and second language word learning: Moving beyond "friend or foe?".

The Journal of the Acoustical Society of America, 145(4):EL265.

French participants learned English pseudowords either with the orthographic form displayed under the corresponding picture (Audio-Ortho) or without (Audio). In a naming task, pseudowords learned in the Audio-Ortho condition were produced faster and with fewer errors, providing a first piece of evidence that orthographic information facilitates the learning and on-line retrieval of productive vocabulary in a second language. Formant analyses, however, showed that productions from the Audio-Ortho condition were more French-like (i.e., less target-like), a result confirmed by a vowel categorization task performed by native speakers of English. It is argued that novel word learning and pronunciation accuracy should be considered together.

RevDate: 2019-11-20

Colby S, Shiller DM, Clayards M, et al (2019)

Different Responses to Altered Auditory Feedback in Younger and Older Adults Reflect Differences in Lexical Bias.

Journal of speech, language, and hearing research : JSLHR, 62(4S):1144-1151.

Purpose Previous work has found that both young and older adults exhibit a lexical bias in categorizing speech stimuli. In young adults, this has been argued to be an automatic influence of the lexicon on perceptual category boundaries. Older adults exhibit more top-down biases than younger adults, including an increased lexical bias. We investigated the nature of the increased lexical bias using a sensorimotor adaptation task designed to evaluate whether automatic processes drive this bias in older adults. Method A group of older adults (n = 27) and younger adults (n = 35) participated in an altered auditory feedback production task. Participants produced target words and nonwords under altered feedback that affected the 1st formant of the vowel. There were 2 feedback conditions that affected the lexical status of the target, such that target words were shifted to sound more like nonwords (e.g., less-liss) and target nonwords to sound more like words (e.g., kess-kiss). Results A mixed-effects linear regression was used to investigate the magnitude of compensation to altered auditory feedback between age groups and lexical conditions. Over the course of the experiment, older adults compensated (by shifting their production of 1st formant) more to altered auditory feedback when producing words that were shifted toward nonwords (less-liss) than when producing nonwords that were shifted toward words (kess-kiss). This is in contrast to younger adults who compensated more to nonwords that were shifted toward words compared to words that were shifted toward nonwords. Conclusion We found no evidence that the increased lexical bias previously observed in older adults is driven by a greater sensitivity to top-down lexical influence on perceptual category boundaries. We suggest the increased lexical bias in older adults is driven by postperceptual processes that arise as a result of age-related cognitive and sensory changes.

RevDate: 2019-11-20

Schertz J, Carbonell K, AJ Lotto (2019)

Language Specificity in Phonetic Cue Weighting: Monolingual and Bilingual Perception of the Stop Voicing Contrast in English and Spanish.

Phonetica pii:000497278 [Epub ahead of print].

BACKGROUND/AIMS: This work examines the perception of the stop voicing contrast in Spanish and English along four acoustic dimensions, comparing monolingual and bilingual listeners. Our primary goals are to test the extent to which cue-weighting strategies are language-specific in monolinguals, and whether this language specificity extends to bilingual listeners.

METHODS: Participants categorized sounds varying in voice onset time (VOT, the primary cue to the contrast) and three secondary cues: fundamental frequency at vowel onset, first formant (F1) onset frequency, and stop closure duration. Listeners heard acoustically identical target stimuli, within language-specific carrier phrases, in English and Spanish modes.

RESULTS: While all listener groups used all cues, monolingual English listeners relied more on F1, and less on closure duration, than monolingual Spanish listeners, indicating language specificity in cue use. Early bilingual listeners used the three secondary cues similarly in English and Spanish, despite showing language-specific VOT boundaries.

CONCLUSION: While our findings reinforce previous work demonstrating language-specific phonetic representations in bilinguals in terms of VOT boundary, they suggest that this specificity may not extend straightforwardly to cue-weighting strategies.

RevDate: 2019-11-20

Kulikov V (2019)

Laryngeal Contrast in Qatari Arabic: Effect of Speaking Rate on Voice Onset Time.

Phonetica pii:000497277 [Epub ahead of print].

Beckman and colleagues claimed in 2011 that Swedish has an overspecified phonological contrast between prevoiced and voiceless aspirated stops. Yet, Swedish is the only language for which this pattern has been reported. The current study describes a similar phonological pattern in the vernacular Arabic dialect of Qatar. Acoustic measurements of main (voice onset time, VOT) and secondary (fundamental frequency, first formant) cues to voicing are based on production data of 8 native speakers of Qatari Arabic, who pronounced 1,380 voiced and voiceless word-initial stops in the slow and fast rate conditions. The results suggest that the VOT pattern found in voiced Qatari Arabic stops b, d, g is consistent with prevoicing in voice languages like Dutch, Russian, or Swedish. The pattern found in voiceless stops t, k is consistent with aspiration in aspirating languages like English, German, or Swedish. Similar to Swedish, both prevoicing and aspiration in Qatari Arabic stops change in response to speaking rate. VOT significantly increased by 19 ms in prevoiced stops and by 12 ms in voiceless stops in the slow speaking rate condition. The findings suggest that phonological overspecification in laryngeal contrasts may not be an uncommon pattern among languages.
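The VOT convention the abstract relies on (negative values for prevoicing, positive values for aspiration) can be illustrated directly; the landmark times below are invented for the example:

```python
def vot_ms(burst_time_s, voicing_onset_s):
    """Voice onset time in ms: voicing onset relative to the stop
    release (burst). Negative VOT indicates prevoicing (voicing
    begins before the release); large positive VOT indicates
    aspiration."""
    return (voicing_onset_s - burst_time_s) * 1000.0

# Invented landmark times (illustration only):
prevoiced = vot_ms(burst_time_s=0.120, voicing_onset_s=0.045)  # e.g. /b d g/
aspirated = vot_ms(burst_time_s=0.120, voicing_onset_s=0.185)  # e.g. /t k/
```

On this convention, the slow-rate effects reported above correspond to the prevoiced VOT becoming more negative and the aspirated VOT becoming more positive.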

RevDate: 2020-01-08

Hedrick M, Thornton KET, Yeager K, et al (2020)

The Use of Static and Dynamic Cues for Vowel Identification by Children Wearing Hearing Aids or Cochlear Implants.

Ear and hearing, 41(1):72-81.

OBJECTIVE: To examine vowel perception based on dynamic formant transition and/or static formant pattern cues in children with hearing loss while using their hearing aids or cochlear implants. We predicted that the sensorineural hearing loss would degrade formant transitions more than static formant patterns, and that shortening the duration of cues would cause more difficulty for vowel identification for these children than for their normal-hearing peers.

DESIGN: A repeated-measures, between-group design was used. Children 4 to 9 years of age from a university hearing services clinic who were fit for hearing aids (13 children) or who wore cochlear implants (10 children) participated. Chronologically age-matched children with normal hearing served as controls (23 children). Stimuli included three naturally produced syllables (/ba/, /bi/, and /bu/), which were presented either in their entirety or segmented to isolate the formant transition or the vowel static formant center. The stimuli were presented to listeners via loudspeaker in the sound field. Aided participants wore their own devices and listened with their everyday settings. Participants chose the vowel presented by selecting from corresponding pictures on a computer screen.

RESULTS: Children with hearing loss were less able to use shortened transition or shortened vowel centers to identify vowels as compared to their normal-hearing peers. Whole syllable and initial transition yielded better identification performance than the vowel center for /ɑ/, but not for /i/ or /u/.

CONCLUSIONS: The children with hearing loss may require a longer time window than children with normal hearing to integrate vowel cues over time because of altered peripheral encoding in spectrotemporal domains. Clinical implications include cognizance of the importance of vowel perception when developing habilitative programs for children with hearing loss.
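Segmenting a CV syllable into an isolated initial transition and a vowel center, as in the stimulus design above, might be sketched as follows; the durations and the midpoint-centered window are assumptions, not the study's parameters:

```python
def segment_syllable(samples, fs, transition_ms=40, center_ms=60):
    """Split a CV syllable waveform into an initial formant-transition
    portion and a vowel-center portion taken around the midpoint
    (illustrative durations)."""
    t_end = int(fs * transition_ms / 1000)   # transition: first N ms
    half = int(fs * center_ms / 2000)        # half the center window
    mid = len(samples) // 2
    return samples[:t_end], samples[mid - half: mid + half]

samples = list(range(3000))  # stand-in for a 300-ms syllable at fs = 10 kHz
transition, center = segment_syllable(samples, fs=10000)
```

With these settings, a 300-ms syllable yields a 40-ms transition segment and a 60-ms vowel-center segment for presentation in isolation.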

RevDate: 2020-01-03

Lowenstein JH, S Nittrouer (2019)

Perception-Production Links in Children's Speech.

Journal of speech, language, and hearing research : JSLHR, 62(4):853-867.

Purpose Child phonologists have long been interested in how tightly speech input constrains the speech production capacities of young children, and the question acquires clinical significance when children with hearing loss are considered. Children with sensorineural hearing loss often show differences in the spectral and temporal structures of their speech production, compared to children with normal hearing. The current study was designed to investigate the extent to which this problem can be explained by signal degradation. Method Ten 5-year-olds with normal hearing were recorded imitating 120 three-syllable nonwords presented in unprocessed form and as noise-vocoded signals. Target segments consisted of fricatives, stops, and vowels. Several measures were made: 2 duration measures (voice onset time and fricative length) and 4 spectral measures involving 2 segments (1st and 3rd moments of fricatives and 1st and 2nd formant frequencies for the point vowels). Results All spectral measures were affected by signal degradation, with vowel production showing the largest effects. Although a change in voice onset time was observed with vocoded signals for /d/, voicing category was not affected. Fricative duration remained constant. Conclusions Results support the hypothesis that quality of the input signal constrains the speech production capacities of young children. Consequently, it can be concluded that the production problems of children with hearing loss (including those with cochlear implants) can be explained to some extent by the degradation in the signal they hear. However, experience with both speech perception and production likely plays a role as well.
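Noise vocoding, used above to degrade the input signal, replaces fine spectral detail with noise shaped by the signal's amplitude envelope. A single-channel sketch (real vocoders split the signal into several band-pass channels before envelope extraction; this one-band version is a deliberate simplification):

```python
import math
import random

def noise_vocode_1ch(signal, frame=64):
    """Single-band noise vocoder sketch: replace the waveform with
    white noise scaled by the frame-wise RMS envelope of the input."""
    random.seed(0)  # reproducible noise for the example
    out = []
    for start in range(0, len(signal), frame):
        chunk = signal[start:start + frame]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        out.extend(rms * random.uniform(-1.0, 1.0) for _ in chunk)
    return out

# 220-Hz tone at an 8-kHz sampling rate as a stand-in input signal:
tone = [math.sin(2 * math.pi * 220 * n / 8000) for n in range(512)]
vocoded = noise_vocode_1ch(tone)
```

The output preserves the coarse amplitude envelope (and, with multiple bands, the coarse spectral envelope) while discarding harmonic and formant fine structure, which is what makes vocoded speech a useful model of degraded input.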

RevDate: 2019-11-20

Croake DJ, Andreatta RD, JC Stemple (2019)

Descriptive Analysis of the Interactive Patterning of the Vocalization Subsystems in Healthy Participants: A Dynamic Systems Perspective.

Journal of speech, language, and hearing research : JSLHR, 62(2):215-228.

Purpose Normative data for many objective voice measures are routinely used in clinical voice assessment; however, normative data reflect vocal output, but not vocalization process. The underlying physiologic processes of healthy phonation have been shown to be nonlinear and thus are likely different across individuals. Dynamic systems theory postulates that performance behaviors emerge from the nonlinear interplay of multiple physiologic components and that certain patterns are preferred and loosely governed by the interactions of physiology, task, and environment. The purpose of this study was to descriptively characterize the interactive nature of the vocalization subsystem triad in subjects with healthy voices and to determine if differing subgroups could be delineated to better understand how healthy voicing is physiologically generated. Method Respiratory kinematic, aerodynamic, and acoustic formant data were obtained from 29 individuals with healthy voices (21 female and 8 male). Multivariate analyses were used to descriptively characterize the interactions among the subsystems that contributed to healthy voicing. Results Group data revealed representative measures of the 3 subsystems to be generally within the boundaries of established normative data. Despite this, 3 distinct clusters were delineated that represented 3 subgroups of individuals with differing subsystem patterning. Seven of the 9 measured variables in this study were found to be significantly different across at least 1 of the 3 subgroups, indicating differing physiologic processes across individuals. Conclusion Vocal output in healthy individuals appears to be generated by distinct and preferred physiologic processes that were represented by 3 subgroups, indicating that the process of vocalization is different among individuals, but not entirely idiosyncratic. Possibilities for these differences are explored using the framework of dynamic systems theory and the dynamics of emergent behaviors. A revised physiologic model of phonation that accounts for differences within and among the vocalization subsystems is described. Supplemental Material https://doi.org/10.23641/asha.7616462.

RevDate: 2019-11-14
CmpDate: 2019-11-14

Stilp CE, AA Assgari (2019)

Natural speech statistics shift phoneme categorization.

Attention, perception & psychophysics, 81(6):2037-2052.

All perception takes place in context. Recognition of a given speech sound is influenced by the acoustic properties of surrounding sounds. When the spectral composition of earlier (context) sounds (e.g., more energy at lower first formant [F1] frequencies) differs from that of a later (target) sound (e.g., vowel with intermediate F1), the auditory system magnifies this difference, biasing target categorization (e.g., towards higher-F1 /ɛ/). Historically, these studies used filters to force context sounds to possess desired spectral compositions. This approach is agnostic to the natural signal statistics of speech (inherent spectral compositions without any additional manipulations). The auditory system is thought to be attuned to such stimulus statistics, but this has gone untested. Here, vowel categorization was measured following unfiltered (already possessing the desired spectral composition) or filtered sentences (to match spectral characteristics of unfiltered sentences). Vowel categorization was biased in both cases, with larger biases as the spectral prominences in context sentences increased. This confirms sensitivity to natural signal statistics, extending spectral context effects in speech perception to more naturalistic listening conditions. Importantly, categorization biases were smaller and more variable following unfiltered sentences, raising important questions about how faithfully experiments using filtered contexts model everyday speech perception.

RevDate: 2019-12-17
CmpDate: 2019-12-16

Rodrigues S, Martins F, Silva S, et al (2019)

/l/ velarisation as a continuum.

PloS one, 14(3):e0213392 pii:PONE-D-18-30510.

In this paper, we present a production study to explore the controversial question of /l/ velarisation. Measurements of first (F1), second (F2), and third (F3) formant frequencies and the slope of F2 were analysed to clarify /l/ velarisation behaviour in European Portuguese (EP). The acoustic data were collected from ten EP speakers producing trisyllabic words with a paroxytone stress pattern, with the liquid consonant at the middle of the word in onset, complex onset, and coda positions. Results suggested that /l/ is produced on a continuum in EP. The consistently low F2 indicates that /l/ is velarised in all syllable positions, but variation, especially in F1 and F3, revealed that /l/ could be "more velarised" or "less velarised" depending on syllable position and vowel context. These findings suggest that it is important to consider different acoustic measures to better understand /l/ velarisation in EP.

RevDate: 2019-11-20

Rampinini AC, Handjaras G, Leo A, et al (2019)

Formant Space Reconstruction From Brain Activity in Frontal and Temporal Regions Coding for Heard Vowels.

Frontiers in human neuroscience, 13:32.

Classical studies have isolated a distributed network of temporal and frontal areas engaged in the neural representation of speech perception and production. With modern literature arguing against unique roles for these cortical regions, different theories have favored either neural code-sharing or cortical space-sharing, thus trying to explain the intertwined spatial and functional organization of motor and acoustic components across the fronto-temporal cortical network. In this context, the focus of attention has recently shifted toward specific model fitting, aimed at motor and/or acoustic space reconstruction in brain activity within the language network. Here, we tested a model based on acoustic properties (formants), and one based on motor properties (articulation parameters), where model-free decoding of evoked fMRI activity during perception, imagery, and production of vowels had been successful. Results revealed that phonological information organizes around formant structure during the perception of vowels; interestingly, such a model was reconstructed in a broad temporal region, outside of the primary auditory cortex, but also in the pars triangularis of the left inferior frontal gyrus. Conversely, articulatory features were not associated with brain activity in these regions. Overall, our results call for a degree of interdependence based on acoustic information, between the frontal and temporal ends of the language network.

RevDate: 2019-08-31

Franken MK, Acheson DJ, McQueen JM, et al (2019)

Consistency influences altered auditory feedback processing.

Quarterly journal of experimental psychology (2006), 72(10):2371-2379.

Previous research on the effect of perturbed auditory feedback in speech production has focused on two types of responses. In the short term, speakers generate compensatory motor commands in response to unexpected perturbations. In the longer term, speakers adapt feedforward motor programmes in response to feedback perturbations, to avoid future errors. The current study investigated the relation between these two types of responses to altered auditory feedback. Specifically, it was hypothesised that consistency in previous feedback perturbations would influence whether speakers adapt their feedforward motor programmes. In an altered auditory feedback paradigm, formant perturbations were applied either across all trials (the consistent condition) or only to some trials, whereas the others remained unperturbed (the inconsistent condition). The results showed that speakers' responses were affected by feedback consistency, with stronger speech changes in the consistent condition compared with the inconsistent condition. Current models of speech-motor control can explain this consistency effect. However, the data also suggest that compensation and adaptation are distinct processes, a distinction that not all current models capture.

