About | BLOGS | Portfolio | Misc | Recommended | What's New | What's Hot

About | BLOGS | Portfolio | Misc | Recommended | What's New | What's Hot


Bibliography Options Menu

26 May 2019 at 01:32
Hide Abstracts   |   Hide Additional Links
Long bibliographies are displayed in blocks of 100 citations at a time. At the end of each block there is an option to load the next block.

Bibliography on: Formants: Modulators of Communication


Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology. More About:  RJR | OUR TEAM | OUR SERVICES | THIS WEBSITE

RJR: Recommended Bibliography 26 May 2019 at 01:32 Created: 

Formants: Modulators of Communication

Wikipedia: A formant, as defined by James Jeans, is a harmonic of a note that is augmented by a resonance. In speech science and phonetics, however, a formant is also sometimes used to mean acoustic resonance of the human vocal tract. Thus, in phonetics, formant can mean either a resonance or the spectral maximum that the resonance produces. Formants are often measured as amplitude peaks in the frequency spectrum of the sound, using a spectrogram (in the figure) or a spectrum analyzer and, in the case of the voice, this gives an estimate of the vocal tract resonances. In vowels spoken with a high fundamental frequency, as in a female or child voice, however, the frequency of the resonance may lie between the widely spaced harmonics and hence no corresponding peak is visible. Because formants are a product of resonance and resonance is affected by the shape and material of the resonating structure, and because all animals (humans included) have unqiue morphologies, formants can add additional generic (sounds big) and specific (that's Towser barking) information to animal vocalizations.

Created with PubMed® Query: formant NOT pmcbook NOT ispreviousversion

Citations The Papers (from PubMed®)

RevDate: 2019-05-23

Cox SR, Raphael LJ, PC Doyle (2019)

Production of Vowels by Electrolaryngeal Speakers Using Clear Speech.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000499928 [Epub ahead of print].

BACKGROUND/AIMS: This study examined the effect of clear speech on vowel productions by electrolaryngeal speakers.

METHOD: Ten electrolaryngeal speakers produced eighteen words containing /i/, /ɪ/, /ɛ/, /æ/, /eɪ/, and /oʊ/ using habitual speech and clear speech. Twelve listeners transcribed 360 words, and a total of 4,320 vowel stimuli across speaking conditions, speakers, and listeners were analyzed. Analyses included listeners' identifications of vowels, vowel duration, and vowel formant relationships.

RESULTS: No significant effect of speaking condition was found on vowel identification. Specifically, 85.4% of the vowels were identified in habitual speech, and 82.7% of the vowels were identified in clear speech. However, clear speech was found to have a significant effect on vowel durations. The mean vowel duration in the 17 consonant-vowel-consonant words was 333 ms in habitual speech and 354 ms in clear speech. The mean vowel duration in the single consonant-vowel words was 551 ms in habitual speech and 629 ms in clear speech.

CONCLUSION: Finding suggests that, although clear speech facilitates longer vowel durations, electrolaryngeal speakers may not gain a clear speech benefit relative to listeners' vowel identifications.

RevDate: 2019-05-22

Alharbi GG, Cannito MP, Buder EH, et al (2019)

Spectral/Cepstral Analyses of Phonation in Parkinson's Disease before and after Voice Treatment: A Preliminary Study.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000495837 [Epub ahead of print].

PURPOSE: This article examines cepstral/spectral analyses of sustained /α/ vowels produced by speakers with hypokinetic dysarthria secondary to idiopathic Parkinson's disease (PD) before and after Lee Silverman Voice Treatment (LSVT®LOUD) and the relationship of these measures with overall voice intensity.

METHODOLOGY: Nine speakers with PD were examined in a pre-/post-treatment design, with multiple daily audio recordings before and after treatment. Sustained vowels were analyzed for cepstral peak prominence (CPP), CPP standard deviation (CPP SD), low/high spectral ratio (L/H SR), and Cepstral/Spectral Index of Dysphonia (CSID) using the KAYPENTAX computer software.

RESULTS: CPP and CPP SD increased significantly and CSID decreased significantly from pre- to post-treatment recordings, with strong effect sizes. Increased CPP indicates increased dominance of harmonics in the spectrum following LSVT. After restricting the frequency cutoff to the region just above the first formant and second formant and below the third formant, L/H SR was observed to decrease significantly following treatment. Correlation analyses demonstrated that CPP was more strongly associated with CSID before treatment than after.

CONCLUSION: In addition to increased vocal intensity following LSVT, speakers with PD exhibited both improved harmonic structure and voice quality as reflected by cepstral/spectral analysis, indicating that there was improved harmonic structure and reduced dysphonia following treatment.

RevDate: 2019-05-20

Zaltz Y, Goldsworthy RL, Eisenberg LS, et al (2019)

Children With Normal Hearing Are Efficient Users of Fundamental Frequency and Vocal Tract Length Cues for Voice Discrimination.

Ear and hearing [Epub ahead of print].

BACKGROUND: The ability to discriminate between talkers assists listeners in understanding speech in a multitalker environment. This ability has been shown to be influenced by sensory processing of vocal acoustic cues, such as fundamental frequency (F0) and formant frequencies that reflect the listener's vocal tract length (VTL), and by cognitive processes, such as attention and memory. It is, therefore, suggested that children who exhibit immature sensory and/or cognitive processing will demonstrate poor voice discrimination (VD) compared with young adults. Moreover, greater difficulties in VD may be associated with spectral degradation as in children with cochlear implants.

OBJECTIVES: The aim of this study was as follows: (1) to assess the use of F0 cues, VTL cues, and the combination of both cues for VD in normal-hearing (NH) school-age children and to compare their performance with that of NH adults; (2) to assess the influence of spectral degradation by means of vocoded speech on the use of F0 and VTL cues for VD in NH children; and (3) to assess the contribution of attention, working memory, and nonverbal reasoning to performance.

DESIGN: Forty-one children, 8 to 11 years of age, were tested with nonvocoded stimuli. Twenty-one of them were also tested with eight-channel, noise-vocoded stimuli. Twenty-one young adults (18 to 35 years) were tested for comparison. A three-interval, three-alternative forced-choice paradigm with an adaptive tracking procedure was used to estimate the difference limens (DLs) for VD when F0, VTL, and F0 + VTL were manipulated separately. Auditory memory, visual attention, and nonverbal reasoning were assessed for all participants.

RESULTS: (a) Children' F0 and VTL discrimination abilities were comparable to those of adults, suggesting that most school-age children utilize both cues effectively for VD. (b) Children's VD was associated with trail making test scores that assessed visual attention abilities and speed of processing, possibly reflecting their need to recruit cognitive resources for the task. (c) Best DLs were achieved for the combined (F0 + VTL) manipulation for both children and adults, suggesting that children at this age are already capable of integrating spectral and temporal cues. (d) Both children and adults found the VTL manipulations more beneficial for VD compared with the F0 manipulations, suggesting that formant frequencies are more reliable for identifying a specific speaker than F0. (e) Poorer DLs were achieved with the vocoded stimuli, though the children maintained similar thresholds and pattern of performance among manipulations as the adults.

CONCLUSIONS: The present study is the first to assess the contribution of F0, VTL, and the combined F0 + VTL to the discrimination of speakers in school-age children. The findings support the notion that many NH school-age children have effective spectral and temporal coding mechanisms that allow sufficient VD, even in the presence of spectrally degraded information. These results may challenge the notion that immature sensory processing underlies poor listening abilities in children, further implying that other processing mechanisms contribute to their difficulties to understand speech in a multitalker environment. These outcomes may also provide insight into VD processes of children under listening conditions that are similar to cochlear implant users.

RevDate: 2019-05-16

Auracher J, Scharinger M, W Menninghaus (2019)

Contiguity-based sound iconicity: The meaning of words resonates with phonetic properties of their immediate verbal contexts.

PloS one, 14(5):e0216930 pii:PONE-D-18-29313.

We tested the hypothesis that phonosemantic iconicity--i.e., a motivated resonance of sound and meaning--might not only be found on the level of individual words or entire texts, but also in word combinations such that the meaning of a target word is iconically expressed, or highlighted, in the phonetic properties of its immediate verbal context. To this end, we extracted single lines from German poems that all include a word designating high or low dominance, such as large or small, strong or weak, etc. Based on insights from previous studies, we expected to find more vowels with a relatively short distance between the first two formants (low formant dispersion) in the immediate context of words expressing high physical or social dominance than in the context of words expressing low dominance. Our findings support this hypothesis, suggesting that neighboring words can form iconic dyads in which the meaning of one word is sound-iconically reflected in the phonetic properties of adjacent words. The construct of a contiguity-based phono-semantic iconicity opens many venues for future research well beyond lines extracted from poems.

RevDate: 2019-05-15

Koenig LL, S Fuchs (2019)

Vowel Formants in Normal and Loud Speech.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose This study evaluated how 1st and 2nd vowel formant frequencies (F1, F2) differ between normal and loud speech in multiple speaking tasks to assess claims that loudness leads to exaggerated vowel articulation. Method Eleven healthy German-speaking women produced normal and loud speech in 3 tasks that varied in the degree of spontaneity: reading sentences that contained isolated /i: a: u:/, responding to questions that included target words with controlled consonantal contexts but varying vowel qualities, and a recipe recall task. Loudness variation was elicited naturalistically by changing interlocutor distance. First and 2nd formant frequencies and average sound pressure level were obtained from the stressed vowels in the target words, and vowel space area was calculated from /i: a: u:/. Results Comparisons across many vowels indicated that high, tense vowels showed limited formant variation as a function of loudness. Analysis of /i: a: u:/ across speech tasks revealed vowel space reduction in the recipe retell task compared to the other 2. Loudness changes for F1 were consistent in direction but variable in extent, with few significant results for high tense vowels. Results for F2 were quite varied and frequently not significant. Speakers differed in how loudness and task affected formant values. Finally, correlations between sound pressure level and F1 were generally positive but varied in magnitude across vowels, with the high tense vowels showing very flat slopes. Discussion These data indicate that naturalistically elicited loud speech in typical speakers does not always lead to changes in vowel formant frequencies and call into question the notion that increasing loudness is necessarily an automatic method of expanding the vowel space. Supplemental Material https://doi.org/10.23641/asha.8061740.

RevDate: 2019-05-14

Nalborczyk L, Batailler C, Lœvenbruck H, et al (2019)

An Introduction to Bayesian Multilevel Models Using brms: A Case Study of Gender Effects on Vowel Variability in Standard Indonesian.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose Bayesian multilevel models are increasingly used to overcome the limitations of frequentist approaches in the analysis of complex structured data. This tutorial introduces Bayesian multilevel modeling for the specific analysis of speech data, using the brms package developed in R. Method In this tutorial, we provide a practical introduction to Bayesian multilevel modeling by reanalyzing a phonetic data set containing formant (F1 and F2) values for 5 vowels of standard Indonesian (ISO 639-3:ind), as spoken by 8 speakers (4 females and 4 males), with several repetitions of each vowel. Results We first give an introductory overview of the Bayesian framework and multilevel modeling. We then show how Bayesian multilevel models can be fitted using the probabilistic programming language Stan and the R package brms, which provides an intuitive formula syntax. Conclusions Through this tutorial, we demonstrate some of the advantages of the Bayesian framework for statistical modeling and provide a detailed case study, with complete source code for full reproducibility of the analyses (https://osf.io/dpzcb /). Supplemental Material https://doi.org/10.23641/asha.7973822.

RevDate: 2019-05-13

van Rij J, Hendriks P, van Rijn H, et al (2019)

Analyzing the Time Course of Pupillometric Data.

Trends in hearing, 23:2331216519832483.

This article provides a tutorial for analyzing pupillometric data. Pupil dilation has become increasingly popular in psychological and psycholinguistic research as a measure to trace language processing. However, there is no general consensus about procedures to analyze the data, with most studies analyzing extracted features from the pupil dilation data instead of analyzing the pupil dilation trajectories directly. Recent studies have started to apply nonlinear regression and other methods to analyze the pupil dilation trajectories directly, utilizing all available information in the continuously measured signal. This article applies a nonlinear regression analysis, generalized additive mixed modeling, and illustrates how to analyze the full-time course of the pupil dilation signal. The regression analysis is particularly suited for analyzing pupil dilation in the fields of psychological and psycholinguistic research because generalized additive mixed models can include complex nonlinear interactions for investigating the effects of properties of stimuli (e.g., formant frequency) or participants (e.g., working memory score) on the pupil dilation signal. To account for the variation due to participants and items, nonlinear random effects can be included. However, one of the challenges for analyzing time series data is dealing with the autocorrelation in the residuals, which is rather extreme for the pupillary signal. On the basis of simulations, we explain potential causes of this extreme autocorrelation, and on the basis of the experimental data, we show how to reduce their adverse effects, allowing a much more coherent interpretation of pupillary data than possible with feature-based techniques.

RevDate: 2019-05-09

He L, Zhang Y, V Dellwo (2019)

Between-speaker variability and temporal organization of the first formant.

The Journal of the Acoustical Society of America, 145(3):EL209.

First formant (F1) trajectories of vocalic intervals were divided into positive and negative dynamics. Positive F1 dynamics were defined as the speeds of F1 increases to reach the maxima, and negative F1 dynamics as the speeds of F1 decreases away from the maxima. Mean, standard deviation, and sequential variability were measured for both dynamics. Results showed that measures of negative F1 dynamics explained more between-speaker variability, which was highly congruent with a previous study using intensity dynamics [He and Dellwo (2017). J. Acoust. Soc. Am. 141, EL488-EL494]. The results may be explained by speaker idiosyncratic articulation.

RevDate: 2019-05-09

Roberts B, RJ Summers (2019)

Dichotic integration of acoustic-phonetic information: Competition from extraneous formants increases the effect of second-formant attenuation on intelligibility.

The Journal of the Acoustical Society of America, 145(3):1230.

Differences in ear of presentation and level do not prevent effective integration of concurrent speech cues such as formant frequencies. For example, presenting the higher formants of a consonant-vowel syllable in the opposite ear to the first formant protects them from upward spread of masking, allowing them to remain effective speech cues even after substantial attenuation. This study used three-formant (F1+F2+F3) analogues of natural sentences and extended the approach to include competitive conditions. Target formants were presented dichotically (F1+F3; F2), either alone or accompanied by an extraneous competitor for F2 (i.e., F1±F2C+F3; F2) that listeners must reject to optimize recognition. F2C was created by inverting the F2 frequency contour and using the F2 amplitude contour without attenuation. In experiment 1, F2C was always absent and intelligibility was unaffected until F2 attenuation exceeded 30 dB; F2 still provided useful information at 48-dB attenuation. In experiment 2, attenuating F2 by 24 dB caused considerable loss of intelligibility when F2C was present, but had no effect in its absence. Factors likely to contribute to this interaction include informational masking from F2C acting to swamp the acoustic-phonetic information carried by F2, and interaural inhibition from F2C acting to reduce the effective level of F2.

RevDate: 2019-05-03

Ogata K, Kodama T, Hayakawa T, et al (2019)

Inverse estimation of the vocal tract shape based on a vocal tract mapping interface.

The Journal of the Acoustical Society of America, 145(4):1961.

This paper describes the inverse estimation of the vocal tract shape for vowels by using a vocal tract mapping interface. In prior research, an interface capable of generating a vocal tract shape by clicking on its window was developed. The vocal tract shapes for five vowels are located at the vertices of a pentagonal chart and a different shape that corresponds to an arbitrary mouse-pointer position on the interface window is calculated by interpolation. In this study, an attempt was made to apply the interface to the inverse estimation of vocal tract shapes based on formant frequencies. A target formant frequency data set was searched based on the geometry of the interface window by using a coarse to fine algorithm. It was revealed that the estimated vocal tract shapes obtained from the mapping interface were close to those from magnetic resonance imaging data in another study and to lip area data captured using video recordings. The results of experiments to evaluate the estimated vocal tract shapes showed that each subject demonstrated unique trajectories on the interface window corresponding to the estimated vocal tract shapes. These results suggest the usefulness of inverse estimation using the interface.

RevDate: 2019-05-03

Thompson A, Y Kim (2019)

Relation of second formant trajectories to tongue kinematics.

The Journal of the Acoustical Society of America, 145(4):EL323.

In this study, the relationship between the acoustic and articulatory kinematic domains of speech was examined among nine neurologically healthy female speakers using two derived relationships between tongue kinematics and F2 measurements: (1) second formant frequency (F2) extent to lingual displacement and (2) F2 slope to lingual speed. Additionally, the relationships between these paired parameters were examined within conversational, more clear, and less clear speaking modes. In general, the findings of the study support a strong correlation for both sets of paired parameters. In addition, the data showed significant changes in articulatory behaviors across speaking modes including the magnitude of tongue motion, but not in the speed-related measures.

RevDate: 2019-05-03

Bürki A, Welby P, Clément M, et al (2019)

Orthography and second language word learning: Moving beyond "friend or foe?".

The Journal of the Acoustical Society of America, 145(4):EL265.

French participants learned English pseudowords either with the orthographic form displayed under the corresponding picture (Audio-Ortho) or without (Audio). In a naming task, pseudowords learned in the Audio-Ortho condition were produced faster and with fewer errors, providing a first piece of evidence that orthographic information facilitates the learning and on-line retrieval of productive vocabulary in a second language. Formant analyses, however, showed that productions from the Audio-Ortho condition were more French-like (i.e., less target-like), a result confirmed by a vowel categorization task performed by native speakers of English. It is argued that novel word learning and pronunciation accuracy should be considered together.

RevDate: 2019-04-26

Colby S, Shiller DM, Clayards M, et al (2019)

Different Responses to Altered Auditory Feedback in Younger and Older Adults Reflect Differences in Lexical Bias.

Journal of speech, language, and hearing research : JSLHR, 62(4S):1144-1151.

Purpose Previous work has found that both young and older adults exhibit a lexical bias in categorizing speech stimuli. In young adults, this has been argued to be an automatic influence of the lexicon on perceptual category boundaries. Older adults exhibit more top-down biases than younger adults, including an increased lexical bias. We investigated the nature of the increased lexical bias using a sensorimotor adaptation task designed to evaluate whether automatic processes drive this bias in older adults. Method A group of older adults (n = 27) and younger adults (n = 35) participated in an altered auditory feedback production task. Participants produced target words and nonwords under altered feedback that affected the 1st formant of the vowel. There were 2 feedback conditions that affected the lexical status of the target, such that target words were shifted to sound more like nonwords (e.g., less-liss) and target nonwords to sound more like words (e.g., kess-kiss). Results A mixed-effects linear regression was used to investigate the magnitude of compensation to altered auditory feedback between age groups and lexical conditions. Over the course of the experiment, older adults compensated (by shifting their production of 1st formant) more to altered auditory feedback when producing words that were shifted toward nonwords (less-liss) than when producing nonwords that were shifted toward words (kess-kiss). This is in contrast to younger adults who compensated more to nonwords that were shifted toward words compared to words that were shifted toward nonwords. Conclusion We found no evidence that the increased lexical bias previously observed in older adults is driven by a greater sensitivity to top-down lexical influence on perceptual category boundaries. We suggest the increased lexical bias in older adults is driven by postperceptual processes that arise as a result of age-related cognitive and sensory changes.

RevDate: 2019-04-24

Schertz J, Carbonell K, AJ Lotto (2019)

Language Specificity in Phonetic Cue Weighting: Monolingual and Bilingual Perception of the Stop Voicing Contrast in English and Spanish.

Phonetica pii:000497278 [Epub ahead of print].

BACKGROUND/AIMS: This work examines the perception of the stop voicing contrast in Spanish and English along four acoustic dimensions, comparing monolingual and bilingual listeners. Our primary goals are to test the extent to which cue-weighting strategies are language-specific in monolinguals, and whether this language specificity extends to bilingual listeners.

METHODS: Participants categorized sounds varying in voice onset time (VOT, the primary cue to the contrast) and three secondary cues: fundamental frequency at vowel onset, first formant (F1) onset frequency, and stop closure duration. Listeners heard acoustically identical target stimuli, within language-specific carrier phrases, in English and Spanish modes.

RESULTS: While all listener groups used all cues, monolingual English listeners relied more on F1, and less on closure duration, than monolingual Spanish listeners, indicating language specificity in cue use. Early bilingual listeners used the three secondary cues similarly in English and Spanish, despite showing language-specific VOT boundaries.

CONCLUSION: While our findings reinforce previous work demonstrating language-specific phonetic representations in bilinguals in terms of VOT boundary, they suggest that this specificity may not extend straightforwardly to cue-weighting strategies.

RevDate: 2019-04-24

Kulikov V (2019)

Laryngeal Contrast in Qatari Arabic: Effect of Speaking Rate on Voice Onset Time.

Phonetica pii:000497277 [Epub ahead of print].

Beckman and colleagues claimed in 2011 that Swedish has an overspecified phonological contrast between prevoiced and voiceless aspirated stops. Yet, Swedish is the only language for which this pattern has been reported. The current study describes a similar phonological pattern in the vernacular Arabic dialect of Qatar. Acoustic measurements of main (voice onset time, VOT) and secondary (fundamental frequency, first formant) cues to voicing are based on production data of 8 native speakers of Qatari Arabic, who pronounced 1,380 voiced and voiceless word-initial stops in the slow and fast rate conditions. The results suggest that the VOT pattern found in voiced Qatari Arabic stops b, d, g is consistent with prevoicing in voice languages like Dutch, Russian, or Swedish. The pattern found in voiceless stops t, k is consistent with aspiration in aspirating languages like English, German, or Swedish. Similar to Swedish, both prevoicing and aspiration in Qatari Arabic stops change in response to speaking rate. VOT significantly increased by 19 ms in prevoiced stops and by 12 ms in voiceless stops in the slow speaking rate condition. The findings suggest that phonological overspecification in laryngeal contrasts may not be an uncommon pattern among languages.

RevDate: 2019-04-18

Hedrick M, Thornton KET, Yeager K, et al (2019)

The Use of Static and Dynamic Cues for Vowel Identification by Children Wearing Hearing Aids or Cochlear Implants.

Ear and hearing [Epub ahead of print].

OBJECTIVE: To examine vowel perception based on dynamic formant transition and/or static formant pattern cues in children with hearing loss while using their hearing aids or cochlear implants. We predicted that the sensorineural hearing loss would degrade formant transitions more than static formant patterns, and that shortening the duration of cues would cause more difficulty for vowel identification for these children than for their normal-hearing peers.

DESIGN: A repeated-measures, between-group design was used. Children 4 to 9 years of age from a university hearing services clinic who were fit for hearing aids (13 children) or who wore cochlear implants (10 children) participated. Chronologically age-matched children with normal hearing served as controls (23 children). Stimuli included three naturally produced syllables (/bα/, /bi/, and /bu/), which were presented either in their entirety or segmented to isolate the formant transition or the vowel static formant center. The stimuli were presented to listeners via loudspeaker in the sound field. Aided participants wore their own devices and listened with their everyday settings. Participants chose the vowel presented by selecting from corresponding pictures on a computer screen.

RESULTS: Children with hearing loss were less able to use shortened transition or shortened vowel centers to identify vowels as compared to their normal-hearing peers. Whole syllable and initial transition yielded better identification performance than the vowel center for /α/, but not for /i/ or /u/.

CONCLUSIONS: The children with hearing loss may require a longer time window than children with normal hearing to integrate vowel cues over time because of altered peripheral encoding in spectrotemporal domains. Clinical implications include cognizance of the importance of vowel perception when developing habilitative programs for children with hearing loss.

RevDate: 2019-04-15

Lowenstein JH, S Nittrouer (2019)

Perception-Production Links in Children's Speech.

Journal of speech, language, and hearing research : JSLHR, 62(4):853-867.

Purpose Child phonologists have long been interested in how tightly speech input constrains the speech production capacities of young children, and the question acquires clinical significance when children with hearing loss are considered. Children with sensorineural hearing loss often show differences in the spectral and temporal structures of their speech production, compared to children with normal hearing. The current study was designed to investigate the extent to which this problem can be explained by signal degradation. Method Ten 5-year-olds with normal hearing were recorded imitating 120 three-syllable nonwords presented in unprocessed form and as noise-vocoded signals. Target segments consisted of fricatives, stops, and vowels. Several measures were made: 2 duration measures (voice onset time and fricative length) and 4 spectral measures involving 2 segments (1st and 3rd moments of fricatives and 1st and 2nd formant frequencies for the point vowels). Results All spectral measures were affected by signal degradation, with vowel production showing the largest effects. Although a change in voice onset time was observed with vocoded signals for /d/, voicing category was not affected. Fricative duration remained constant. Conclusions Results support the hypothesis that quality of the input signal constrains the speech production capacities of young children. Consequently, it can be concluded that the production problems of children with hearing loss-including those with cochlear implants-can be explained to some extent by the degradation in the signal they hear. However, experience with both speech perception and production likely plays a role as well.

RevDate: 2019-04-05

Croake DJ, Andreatta RD, JC Stemple (2019)

Descriptive Analysis of the Interactive Patterning of the Vocalization Subsystems in Healthy Participants: A Dynamic Systems Perspective.

Journal of speech, language, and hearing research : JSLHR, 62(2):215-228.

Purpose Normative data for many objective voice measures are routinely used in clinical voice assessment; however, normative data reflect vocal output, but not vocalization process. The underlying physiologic processes of healthy phonation have been shown to be nonlinear and thus are likely different across individuals. Dynamic systems theory postulates that performance behaviors emerge from the nonlinear interplay of multiple physiologic components and that certain patterns are preferred and loosely governed by the interactions of physiology, task, and environment. The purpose of this study was to descriptively characterize the interactive nature of the vocalization subsystem triad in subjects with healthy voices and to determine if differing subgroups could be delineated to better understand how healthy voicing is physiologically generated. Method Respiratory kinematic, aerodynamic, and acoustic formant data were obtained from 29 individuals with healthy voices (21 female and eight male). Multivariate analyses were used to descriptively characterize the interactions among the subsystems that contributed to healthy voicing. Results Group data revealed representative measures of the 3 subsystems to be generally within the boundaries of established normative data. Despite this, 3 distinct clusters were delineated that represented 3 subgroups of individuals with differing subsystem patterning. Seven of the 9 measured variables in this study were found to be significantly different across at least 1 of the 3 subgroups indicating differing physiologic processes across individuals. Conclusion Vocal output in healthy individuals appears to be generated by distinct and preferred physiologic processes that were represented by 3 subgroups indicating that the process of vocalization is different among individuals, but not entirely idiosyncratic. Possibilities for these differences are explored using the framework of dynamic systems theory and the dynamics of emergent behaviors. A revised physiologic model of phonation that accounts for differences within and among the vocalization subsystems is described. Supplemental Material https://doi.org/10.23641/asha.7616462.

RevDate: 2019-03-28

Franken MK, Acheson DJ, McQueen JM, et al (2019)

Consistency influences altered auditory feedback processing.

Quarterly journal of experimental psychology (2006) [Epub ahead of print].

Previous research on the effect of perturbed auditory feedback in speech production has focused on two types of responses. In the short term, speakers generate compensatory motor commands in response to unexpected perturbations. In the longer term, speakers adapt feedforward motor programmes in response to feedback perturbations, to avoid future errors. The current study investigated the relation between these two types of responses to altered auditory feedback. Specifically, it was hypothesised that consistency in previous feedback perturbations would influence whether speakers adapt their feedforward motor programmes. In an altered auditory feedback paradigm, formant perturbations were applied either across all trials (the consistent condition) or only to some trials, whereas the others remained unperturbed (the inconsistent condition). The results showed that speakers' responses were affected by feedback consistency, with stronger speech changes in the consistent condition compared with the inconsistent condition. Current models of speech-motor control can explain this consistency effect. However, the data also suggest that compensation and adaptation are distinct processes, which are not in line with all current models.

RevDate: 2019-03-19

Stilp CE, AA Assgari (2019)

Natural speech statistics shift phoneme categorization.

Attention, perception & psychophysics pii:10.3758/s13414-018-01659-3 [Epub ahead of print].

All perception takes place in context. Recognition of a given speech sound is influenced by the acoustic properties of surrounding sounds. When the spectral composition of earlier (context) sounds (e.g., more energy at lower first formant [F1] frequencies) differs from that of a later (target) sound (e.g., vowel with intermediate F1), the auditory system magnifies this difference, biasing target categorization (e.g., towards higher-F1 /ɛ/). Historically, these studies used filters to force context sounds to possess desired spectral compositions. This approach is agnostic to the natural signal statistics of speech (inherent spectral compositions without any additional manipulations). The auditory system is thought to be attuned to such stimulus statistics, but this has gone untested. Here, vowel categorization was measured following unfiltered (already possessing the desired spectral composition) or filtered sentences (to match spectral characteristics of unfiltered sentences). Vowel categorization was biased in both cases, with larger biases as the spectral prominences in context sentences increased. This confirms sensitivity to natural signal statistics, extending spectral context effects in speech perception to more naturalistic listening conditions. Importantly, categorization biases were smaller and more variable following unfiltered sentences, raising important questions about how faithfully experiments using filtered contexts model everyday speech perception.

RevDate: 2019-03-11

Rodrigues S, Martins F, Silva S, et al (2019)

/l/ velarisation as a continuum.

PloS one, 14(3):e0213392 pii:PONE-D-18-30510.

In this paper, we present a production study to explore the controversial question about /l/ velarisation. Measurements of first (F1), second (F2) and third (F3) formant frequencies and the slope of F2 were analysed to clarify the /l/ velarisation behaviour in European Portuguese (EP). The acoustic data were collected from ten EP speakers, producing trisyllabic words with paroxytone stress pattern, with the liquid consonant at the middle of the word in onset, complex onset and coda positions. Results suggested that /l/ is produced on a continuum in EP. The consistently low F2 indicates that /l/ is velarised in all syllable positions, but variation especially in F1 and F3 revealed that /l/ could be "more velarised" or "less velarised" dependent on syllable positions and vowel contexts. These findings suggest that it is important to consider different acoustic measures to better understand /l/ velarisation in EP.

RevDate: 2019-03-06

Rampinini AC, Handjaras G, Leo A, et al (2019)

Formant Space Reconstruction From Brain Activity in Frontal and Temporal Regions Coding for Heard Vowels.

Frontiers in human neuroscience, 13:32.

Classical studies have isolated a distributed network of temporal and frontal areas engaged in the neural representation of speech perception and production. With modern literature arguing against unique roles for these cortical regions, different theories have favored either neural code-sharing or cortical space-sharing, thus trying to explain the intertwined spatial and functional organization of motor and acoustic components across the fronto-temporal cortical network. In this context, the focus of attention has recently shifted toward specific model fitting, aimed at motor and/or acoustic space reconstruction in brain activity within the language network. Here, we tested a model based on acoustic properties (formants), and one based on motor properties (articulation parameters), where model-free decoding of evoked fMRI activity during perception, imagery, and production of vowels had been successful. Results revealed that phonological information organizes around formant structure during the perception of vowels; interestingly, such a model was reconstructed in a broad temporal region, outside of the primary auditory cortex, but also in the pars triangularis of the left inferior frontal gyrus. Conversely, articulatory features were not associated with brain activity in these regions. Overall, our results call for a degree of interdependence based on acoustic information, between the frontal and temporal ends of the language network.

RevDate: 2019-03-02

Klaus A, Lametti DR, Shiller DM, et al (2019)

Can perceptual training alter the effect of visual biofeedback in speech-motor learning?.

The Journal of the Acoustical Society of America, 145(2):805.

Recent work showing that a period of perceptual training can modulate the magnitude of speech-motor learning in a perturbed auditory feedback task could inform clinical interventions or second-language training strategies. The present study investigated the influence of perceptual training on a clinically and pedagogically relevant task of vocally matching a visually presented speech target using visual-acoustic biofeedback. Forty female adults aged 18-35 yr received perceptual training targeting the English /æ-ɛ/ contrast, randomly assigned to a condition that shifted the perceptual boundary toward either /æ/ or /ɛ/. Participants were then asked to produce the word head while modifying their output to match a visually presented acoustic target corresponding with a slightly higher first formant (F1, closer to /æ/). By analogy to findings from previous research, it was predicted that individuals whose boundary was shifted toward /æ/ would also show a greater magnitude of change in the visual biofeedback task. After perceptual training, the groups showed the predicted difference in perceptual boundary location, but they did not differ in their performance on the biofeedback matching task. It is proposed that the explicit versus implicit nature of the tasks used might account for the difference between this study and previous findings.

RevDate: 2019-03-02

Dissen Y, Goldberger J, J Keshet (2019)

Formant estimation and tracking: A deep learning approach.

The Journal of the Acoustical Society of America, 145(2):642.

Formant frequency estimation and tracking are among the most fundamental problems in speech processing. In the estimation task, the input is a stationary speech segment such as the middle part of a vowel, and the goal is to estimate the formant frequencies, whereas in the task of tracking the input is a series of speech frames, and the goal is to track the trajectory of the formant frequencies throughout the signal. The use of supervised machine learning techniques trained on an annotated corpus of read-speech for these tasks is proposed. Two deep network architectures were evaluated for estimation: feed-forward multilayer-perceptrons and convolutional neural-networks and, correspondingly, two architectures for tracking: recurrent and convolutional recurrent networks. The inputs to the former are composed of linear predictive coding-based cepstral coefficients with a range of model orders and pitch-synchronous cepstral coefficients, where the inputs to the latter are raw spectrograms. The performance of the methods compares favorably with alternative methods for formant estimation and tracking. A network architecture is further proposed, which allows model adaptation to different formant frequency ranges that were not seen at training time. The adapted networks were evaluated on three datasets, and their performance was further improved.

RevDate: 2019-03-02

Kirkham S, Nance C, Littlewood B, et al (2019)

Dialect variation in formant dynamics: The acoustics of lateral and vowel sequences in Manchester and Liverpool English.

The Journal of the Acoustical Society of America, 145(2):784.

This study analyses the time-varying acoustics of laterals and their adjacent vowels in Manchester and Liverpool English. Generalized additive mixed-models (GAMMs) are used for quantifying time-varying formant data, which allows the modelling of non-linearities in acoustic time series while simultaneously modelling speaker and word level variability in the data. These models are compared to single time-point analyses of lateral and vowel targets in order to determine what analysing formant dynamics can tell about dialect variation in speech acoustics. The results show that lateral targets exhibit robust differences between some positional contexts and also between dialects, with smaller differences present in vowel targets. The time-varying analysis shows that dialect differences frequently occur globally across the lateral and adjacent vowels. These results suggest a complex relationship between lateral and vowel targets and their coarticulatory dynamics, which problematizes straightforward claims about the realization of laterals and their adjacent vowels. These findings are further discussed in terms of hypotheses about positional and sociophonetic variation. In doing so, the utility of GAMMs for analysing time-varying multi-segmental acoustic signals is demonstrated, and the significance of the results for accounts of English lateral typology is highlighted.

RevDate: 2019-02-12

Menda G, Nitzany EI, Shamble PS, et al (2019)

The Long and Short of Hearing in the Mosquito Aedes aegypti.

Current biology : CB pii:S0960-9822(19)30028-4 [Epub ahead of print].

Mating behavior in Aedes aegypti mosquitoes occurs mid-air and involves the exchange of auditory signals at close range (millimeters to centimeters) [1-6]. It is widely assumed that this intimate signaling distance reflects short-range auditory sensitivity of their antennal hearing organs to faint flight tones [7, 8]. To the contrary, we show here that male mosquitoes can hear the female's flight tone at surprisingly long distances-from several meters to up to 10 m-and that unrestrained, resting Ae. aegypti males leap off their perches and take flight when they hear female flight tones. Moreover, auditory sensitivity tests of Ae. aegypti's hearing organ, made from neurophysiological recordings of the auditory nerve in response to pure-tone stimuli played from a loudspeaker, support the behavioral experiments. This demonstration of long-range hearing in mosquitoes overturns the common assumption that the thread-like antennal hearing organs of tiny insects are strictly close-range ears. The effective range of a hearing organ depends ultimately on its sensitivity [9-13]. Here, a mosquito's antennal ear is shown to be sensitive to sound levels down to 31 dB sound pressure level (SPL), translating to air particle velocity at nanometer dimensions. We note that the peak of energy of the first formant of the vowels of the human speech spectrum range from about 200-1,000 Hz and is typically spoken at 45-70 dB SPL; together, they lie in the sweet spot of mosquito hearing. VIDEO ABSTRACT.

RevDate: 2019-02-10

Garellek M (2019)

Acoustic Discriminability of the Complex Phonation System in !Xóõ.

Phonetica pii:000494301 [Epub ahead of print].

Phonation types, or contrastive voice qualities, are minimally produced using complex movements of the vocal folds, but may additionally involve constriction in the supraglottal and pharyngeal cavities. These complex articulations in turn produce a multidimensional acoustic output that can be modeled in various ways. In this study, I investigate whether the psychoacoustic model of voice by Kreiman et al. (2014) succeeds at distinguishing six phonation types of !Xóõ. Linear discriminant analysis is performed using parameters from the model averaged over the entire vowel as well as for the first and final halves of the vowel. The results indicate very high classification accuracy for all phonation types. Measures averaged over the vowel's entire duration are closely correlated with the discriminant functions, suggesting that they are sufficient for distinguishing even dynamic phonation types. Measures from all classes of parameters are correlated with the linear discriminant functions; in particular, the "strident" vowels, which are harsh in quality, are characterized by their noise, changes in spectral tilt, decrease in voicing amplitude and frequency, and raising of the first formant. Despite the large number of contrasts and the time-varying characteristics of many of the phonation types, the phonation contrasts in !Xóõ remain well differentiated acoustically.

RevDate: 2019-02-10

Apaydın E, İkincioğulları A, Çolak M, et al (2019)

The Voice Performance After Septoplasty With Surgical Efficacy Demonstrated Through Acoustic Rhinometry and Rhinomanometry.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30531-9 [Epub ahead of print].

OBJECTIVE: To demonstrate the surgical efficacy of septoplasty using acoustic rhinometry (AR) and anterior rhinomanometry (ARM) and to evaluate the effect of septoplasty on voice performance through subjective voice analysis methods.

MATERIALS AND METHODS: This prospective study enrolled a total of 62 patients who underwent septoplasty with the diagnosis of deviated nasal septum. Thirteen patients with no postoperative improvement versus preoperative period as shown by AR and/or ARM tests and three patients with postoperative complications and four patients who were lost to follow-up were excluded. As a result, a total of 42 patients were included in the study. Objective tests including AR, ARM, acoustic voice analysis and spectrographic analysis were performed before the surgery and at 1 month and 3 months after the surgery. Subjective measures included the Nasal Obstruction Symptom Evaluation questionnaire to evaluate surgical success and Voice Handicap Index-30 tool for assessment of voice performance postoperatively, both completed by all study patients.

RESULTS: Among acoustic voice analysis parameters, F0, jitter, Harmonics-to-Noise Ratio values as well as formant frequency (F1-F2-F3-F4) values did not show significant differences postoperatively in comparison to the preoperative period (P > 0.05). Only the shimmer value was statistically significantly reduced at 1 month (P < 0.05) and 3 months postoperatively (P < 0.05) versus baseline. Statistically significant reductions in Voice Handicap Index-30 scores were observed at postoperative 1 month (P < 0.001) and 3 months (P < 0.001) compared to the preoperative period and between postoperative 1 month and 3 months (P < 0.05).

CONCLUSION: In this study, first operative success of septoplasty was demonstrated through objective tests and then objective voice analyses were performed to better evaluate the overall effect of septoplasty on voice performance. Shimmer value was found to be improved in the early and late postoperative periods.

RevDate: 2019-02-05

de Souza GVS, Duarte JMT, de Andrade Trinas FV, et al (2019)

An Acoustic Examination of Pitch Variation in Soprano Singing.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30416-8 [Epub ahead of print].

INTRODUCTION: The ability to perform acoustic inspection of data and to correlate the results with perceptual and physiological aspects facilitates vocal behavior analysis. The singing voice has specific characteristics and parameters that are involved during the phonation mechanism, which may be analyzed acoustically.

OBJECTIVE: To describe and analyze the fundamental frequency and formants in pitch variation in the /a/ vowel in sopranos.

METHODS: The sample consisted of 30 female participants between the ages of 20 to 45 years without vocal complaints. All sustained vowel sounds were recorded with the /a/ vowel sustained for 5 seconds, with three replications at low (C4-261 Hz), medium (Eb4-622 Hz), and high (Bb4-932 Hz) frequencies that were comfortable for the voice classification. In total, 90 samples were analyzed with digital extraction of the fundamental frequency (f0) and the first five formants (F1, F2, F3, F4, and F5) and manual confirmation. The middle segment was considered for analysis, whereas the onset and offset segments were not considered. Subsequently, FFT (fast Fourier transform) plots, LPC (linear predictive coding) graphs, and tube diagrams were created. The Shapiro-Wilks test was applied for adherence and the Friedman test was applied for comparison of paired samples.

RESULTS: For vocalizations at low and medium pitches, higher values were observed for the first five formant frequencies than for the f0 value. Overlaying the LPC and FFT graphs revealed a similarity between F1 and F2 at the two pitches, with clustered harmonics in the F3, F4, and F5 region in the low pitch. At the medium pitch, there was similarity between F3 and F4, an F5 peak, and tuned harmonics. However, in the high-pitch vocalizations, there was an increase in the F2, F3, F4, and F5 values in relation to f0, and there was similarity between them along with synchrony between f0 and F1, H2 and F2, H3 and F3, H4 and F4, and H5 and F5.

CONCLUSIONS: Pitch changes indicate differences in the behavior of the fundamental frequency and sound formants in sopranos. The comparison of the sustained vowels sounds in f0 at the three pitches revealed specific vocal tract changes on the LPC curve and FFT harmonics, with an extra gain range at 261 Hz, synchrony between peaks of formants and harmonics at 622 Hz, and equivalence of f0 and F1 at 932 Hz.

RevDate: 2019-01-16

Galle ME, Klein-Packard J, Schreiber K, et al (2019)

What Are You Waiting For? Real-Time Integration of Cues for Fricatives Suggests Encapsulated Auditory Memory.

Cognitive science, 43(1):.

Speech unfolds over time, and the cues for even a single phoneme are rarely available simultaneously. Consequently, to recognize a single phoneme, listeners must integrate material over several hundred milliseconds. Prior work contrasts two accounts: (a) a memory buffer account in which listeners accumulate auditory information in memory and only access higher level representations (i.e., lexical representations) when sufficient information has arrived; and (b) an immediate integration scheme in which lexical representations can be partially activated on the basis of early cues and then updated when more information arises. These studies have uniformly shown evidence for immediate integration for a variety of phonetic distinctions. We attempted to extend this to fricatives, a class of speech sounds which requires not only temporal integration of asynchronous cues (the frication, followed by the formant transitions 150-350 ms later), but also integration across different frequency bands and compensation for contextual factors like coarticulation. Eye movements in the visual world paradigm showed clear evidence for a memory buffer. Results were replicated in five experiments, ruling out methodological factors and tying the release of the buffer to the onset of the vowel. These findings support a general auditory account for speech by suggesting that the acoustic nature of particular speech sounds may have large effects on how they are processed. It also has major implications for theories of auditory and speech perception by raising the possibility of an encapsulated memory buffer in early auditory processing.

RevDate: 2019-01-15

Naderifar E, Ghorbani A, Moradi N, et al (2019)

Use of formant centralization ratio for vowel impairment detection in normal hearing and different degrees of hearing impairment.

Logopedics, phoniatrics, vocology [Epub ahead of print].

PURPOSE: Hearing-impaired (HI) speakers show changes in vowel production and formant frequencies, as well as more cases of overlapping between vowels and more restricted formant space, than hearing speakers. This study was intended to explore whether the use of different acoustic parameters (Formant Centralization Ratio (FCR), Vowel Space Area (VSA), F2i/F2u ratio (second formant of/i,u/)) was suitable or not for characterizing impairments in the articulation of vowels in the speech of HL speakers. In fact, correlated acoustic parameters are used to determine the limits of tongue movements in vowel production in different severity degrees of hearing impairment.

METHODS: Speech recordings of 40 speakers with HL and 40 healthy controls were acoustically analyzed. The vowels (/a/,/i/,/u/) were extracted from the word context and, then, the first and second formants were calculated. The same vowel-formant elements were used to construct the FCR, expressed as (F2u + F2a + F1i + F1u)/(F2i + F1a), the F2i/F2u ratio, and the vowel space area (VSA), expressed as ABS((F1i*(F2a-F2u)+F1a*(F2u-F2i)+F1u*(F2i-F2a))/2).

RESULTS: The FCR differentiated HL groups from the control group and the discrimination was not gender-sensitive. All parameters were found to be strongly correlated with each other.

CONCLUSIONS: The findings of this study showed that FCR was a more sensitive acoustic parameter than F2i/F2u ratio and VSA to distinguish speech of the HL groups from that of the normal group. Thus, FCR is considered to be applicable as an early objective measure of impaired vowel articulation in HL speakers.

RevDate: 2019-01-09

Boelen C, Chabot JM, Diot P, et al (2017)

Challenges to health system, medical profession and accreditation of medical schools.

La Revue du praticien, 67(3):250-254.

RevDate: 2019-01-08

Ballard KJ, Halaki M, Sowman P, et al (2018)

An Investigation of Compensation and Adaptation to Auditory Perturbations in Individuals With Acquired Apraxia of Speech.

Frontiers in human neuroscience, 12:510.

Two auditory perturbation experiments were used to investigate the integrity of neural circuits responsible for speech sensorimotor adaptation in acquired apraxia of speech (AOS). This has implications for understanding the nature of AOS as well as normal speech motor control. Two experiments were conducted. In Experiment 1, compensatory responses to unpredictable fundamental frequency (F0) perturbations during vocalization were investigated in healthy older adults and adults with acquired AOS plus aphasia. F0 perturbation involved upward and downward 100-cent shifts versus no shift, in equal proportion, during 2 s vocalizations of the vowel /a/. In Experiment 2, adaptive responses to sustained first formant (F1) perturbations during speech were investigated in healthy older adults, adults with AOS and adults with aphasia only (APH). The F1 protocol involved production of the vowel /ε/ in four consonant-vowel words of Australian English (pear, bear, care, dare), and one control word with a different vowel (paw). An unperturbed Baseline phase was followed by a gradual Ramp to a 30% upward F1 shift stimulating a compensatory response, a Hold phase where the perturbation was repeatedly presented with alternating blocks of masking trials to probe adaptation, and an End phase with masking trials only to measure persistence of any adaptation. AOS participants showed normal compensation to unexpected F0 perturbations, indicating that auditory feedback control of low-level, non-segmental parameters is intact. Furthermore, individuals with AOS displayed an adaptive response to sustained F1 perturbations, but age-matched controls and APH participants did not. These findings suggest that older healthy adults may have less plastic motor programs that resist modification based on sensory feedback, whereas individuals with AOS have less well-established and more malleable motor programs due to damage from stroke.

RevDate: 2019-01-02

Caldwell MT, Jiradejvong P, CJ Limb (2018)

Effects of Phantom Electrode Stimulation on Vocal Production in Cochlear Implant Users.

Ear and hearing [Epub ahead of print].

OBJECTIVES: Cochlear implant (CI) users suffer from a range of speech impairments, such as stuttering and vocal control of pitch and intensity. Though little research has focused on the role of auditory feedback in the speech of CI users, these speech impairments could be due in part to limited access to low-frequency cues inherent in CI-mediated listening. Phantom electrode stimulation (PES) represents a novel application of current steering that extends access to low frequencies for CI recipients. It is important to note that PES transmits frequencies below 300 Hz, whereas Baseline does not. The objective of this study was to explore the effects of PES on multiple frequency-related characteristics of voice production.

DESIGN: Eight postlingually deafened, adult Advanced Bionics CI users underwent a series of vocal production tests including Tone Repetition, Vowel Sound Production, Passage Reading, and Picture Description. Participants completed all of these tests twice: once with PES and once using their program used for everyday listening (Baseline). An additional test, Automatic Modulation, was included to measure acute effects of PES and was completed only once. This test involved switching between PES and Baseline at specific time intervals in real time as participants read a series of short sentences. Finally, a subjective Vocal Effort measurement was also included.

RESULTS: In Tone Repetition, the fundamental frequencies (F0) of tones produced using PES and the size of musical intervals produced using PES were significantly more accurate (closer to the target) compared with Baseline in specific gender, target tone range, and target tone type testing conditions. In the Vowel Sound Production task, vowel formant profiles produced using PES were closer to that of the general population compared with those produced using Baseline. The Passage Reading and Picture Description task results suggest that PES reduces measures of pitch variability (F0 standard deviation and range) in natural speech production. No significant results were found in comparisons of PES and Baseline in the Automatic Modulation task nor in the Vocal Effort task.

CONCLUSIONS: The findings of this study suggest that usage of PES increases accuracy of pitch matching in repeated sung tones and frequency intervals, possibly due to more accurate F0 representation. The results also suggest that PES partially normalizes the vowel formant profiles of select vowel sounds. PES seems to decrease pitch variability of natural speech and appears to have limited acute effects on natural speech production, though this finding may be due in part to paradigm limitations. On average, subjective ratings of vocal effort were unaffected by the usage of PES versus Baseline.

RevDate: 2019-01-02

Saba JN, Ali H, JHL Hansen (2018)

Formant priority channel selection for an "n-of-m" sound processing strategy for cochlear implants.

The Journal of the Acoustical Society of America, 144(6):3371.

The Advanced Combination Encoder (ACE) signal processing strategy is used in the majority of cochlear implant (CI) sound processors manufactured by Cochlear Corporation. This "n-of-m" strategy selects "n" out of "m" available frequency channels with the highest spectral energy in each stimulation cycle. It is hypothesized that at low signal-to-noise ratio (SNR) conditions, noise-dominant frequency channels are susceptible for selection, neglecting channels containing target speech cues. In order to improve speech segregation in noise, explicit encoding of formant frequency locations within the standard channel selection framework of ACE is suggested. Two strategies using the direct formant estimation algorithms are developed within this study, FACE (formant-ACE) and VFACE (voiced-activated-formant-ACE). Speech intelligibility from eight CI users is compared across 11 acoustic conditions, including mixtures of noise and reverberation at multiple SNRs. Significant intelligibility gains were observed with VFACE over ACE in 5 dB babble noise; however, results with FACE/VFACE in all other conditions were comparable to standard ACE. An increased selection of channels associated with the second formant frequency is observed for FACE and VFACE. Both proposed methods may serve as potential supplementary channel selection techniques for the ACE sound processing strategy for cochlear implants.

RevDate: 2019-01-02

Kochetov A, Tabain M, Sreedevi N, et al (2018)

Manner and place differences in Kannada coronal consonants: Articulatory and acoustic results.

The Journal of the Acoustical Society of America, 144(6):3221.

This study investigated articulatory differences in the realization of Kannada coronal consonants of the same place but different manner of articulation. This was done by examining tongue positions and acoustic formant transitions for dentals and retroflexes of three manners of articulation: stops, nasals, and laterals. Ultrasound imaging data collected from ten speakers of the language revealed that the tongue body/root was more forward for the nasal manner of articulation compared to stop and lateral consonants of the same place of articulation. The dental nasal and lateral were also produced with a higher front part of the tongue compared to the dental stop. As a result, the place contrast was greater in magnitude for the stops (being the prototypical dental vs retroflex) than for the nasals and laterals (being apparently alveolar vs retroflex). Acoustic formant transition differences were found to reflect some of the articulatory differences, while also providing evidence for the more dynamic articulation of nasal and lateral retroflexes. Overall, the results of the study shed light on factors underlying manner requirements (aerodynamic or physiological) and how the factors interact with principles of gestural economy/symmetry, providing an empirical baseline for further cross-language investigations and articulation-to-acoustics modeling.

RevDate: 2018-12-31

Mekyska J, Galaz Z, Kiska T, et al (2018)

Quantitative Analysis of Relationship Between Hypokinetic Dysarthria and the Freezing of Gait in Parkinson's Disease.

Cognitive computation, 10(6):1006-1018.

Hypokinetic dysarthria (HD) and freezing of gait (FOG) are both axial symptoms that occur in patients with Parkinson's disease (PD). It is assumed they have some common pathophysiological mechanisms and therefore that speech disorders in PD can predict FOG deficits within the horizon of some years. The aim of this study is to employ a complex quantitative analysis of the phonation, articulation and prosody in PD patients in order to identify the relationship between HD and FOG, and establish a mathematical model that would predict FOG deficits using acoustic analysis at baseline. We enrolled 75 PD patients who were assessed by 6 clinical scales including the Freezing of Gait Questionnaire (FOG-Q). We subsequently extracted 19 acoustic measures quantifying speech disorders in the fields of phonation, articulation and prosody. To identify the relationship between HD and FOG, we performed a partial correlation analysis. Finally, based on the selected acoustic measures, we trained regression models to predict the change in FOG during a 2-year follow-up. We identified significant correlations between FOG-Q scores and the acoustic measures based on formant frequencies (quantifying the movement of the tongue and jaw) and speech rate. Using the regression models, we were able to predict a change in particular FOG-Q scores with an error of between 7.4 and 17.0 %. This study is suggesting that FOG in patients with PD is mainly linked to improper articulation, a disturbed speech rate and to intelligibility. We have also proved that the acoustic analysis of HD at the baseline can be used as a predictor of the FOG deficit during 2 years of follow-up. This knowledge enables researchers to introduce new cognitive systems that predict gait difficulties in PD patients.

RevDate: 2018-12-20

Masapollo M, Zhao TC, Franklin L, et al (2018)

Asymmetric discrimination of nonspeech tonal analogues of vowels.

Journal of experimental psychology. Human perception and performance pii:2018-64940-001 [Epub ahead of print].

Directional asymmetries reveal a universal bias in vowel perception favoring extreme vocalic articulations, which lead to acoustic vowel signals with dynamic formant trajectories and well-defined spectral prominences because of the convergence of adjacent formants. The present experiments investigated whether this bias reflects speech-specific processes or general properties of spectral processing in the auditory system. Toward this end, we examined whether analogous asymmetries in perception arise with nonspeech tonal analogues that approximate some of the dynamic and static spectral characteristics of naturally produced /u/ vowels executed with more versus less extreme lip gestures. We found a qualitatively similar but weaker directional effect with 2-component tones varying in both the dynamic changes and proximity of their spectral energies. In subsequent experiments, we pinned down the phenomenon using tones that varied in 1 or both of these 2 acoustic characteristics. We found comparable asymmetries with tones that differed exclusively in their spectral dynamics, and no asymmetries with tones that differed exclusively in their spectral proximity or both spectral features. We interpret these findings as evidence that dynamic spectral changes are a critical cue for eliciting asymmetries in nonspeech tone perception, but that the potential contribution of general auditory processes to asymmetries in vowel perception is limited. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

RevDate: 2018-12-19

Carney LH, JM McDonough (2018)

Nonlinear auditory models yield new insights into representations of vowels.

Attention, perception & psychophysics pii:10.3758/s13414-018-01644-w [Epub ahead of print].

Studies of vowel systems regularly appeal to the need to understand how the auditory system encodes and processes the information in the acoustic signal. The goal of this study is to present computational models to address this need, and to use the models to illustrate responses to vowels at two levels of the auditory pathway. Many of the models previously used to study auditory representations of speech are based on linear filter banks simulating the tuning of the inner ear. These models do not incorporate key nonlinear response properties of the inner ear that influence responses at conversational-speech sound levels. These nonlinear properties shape neural representations in ways that are important for understanding responses in the central nervous system. The model for auditory-nerve (AN) fibers used here incorporates realistic nonlinear properties associated with the basilar membrane, inner hair cells (IHCs), and the IHC-AN synapse. These nonlinearities set up profiles of f0-related fluctuations that vary in amplitude across the population of frequency-tuned AN fibers. Amplitude fluctuations in AN responses are smallest near formant peaks and largest at frequencies between formants. These f0-related fluctuations strongly excite or suppress neurons in the auditory midbrain, the first level of the auditory pathway where tuning for low-frequency fluctuations in sounds occurs. Formant-related amplitude fluctuations provide representations of the vowel spectrum in discharge rates of midbrain neurons. These representations in the midbrain are robust across a wide range of sound levels, including the entire range of conversational-speech levels, and in the presence of realistic background noise levels.

RevDate: 2018-12-14

Anikin A, N Johansson (2018)

Implicit associations between individual properties of color and sound.

Attention, perception & psychophysics pii:10.3758/s13414-018-01639-7 [Epub ahead of print].

We report a series of 22 experiments in which the implicit associations test (IAT) was used to investigate cross-modal correspondences between visual (luminance, hue [R-G, B-Y], saturation) and acoustic (loudness, pitch, formants [F1, F2], spectral centroid, trill) dimensions. Colors were sampled from the perceptually accurate CIE-Lab space, and the complex, vowel-like sounds were created with a formant synthesizer capable of separately manipulating individual acoustic properties. In line with previous reports, the loudness and pitch of acoustic stimuli were associated with both luminance and saturation of the presented colors. However, pitch was associated specifically with color lightness, whereas loudness mapped onto greater visual saliency. Manipulating the spectrum of sounds without modifying their pitch showed that an upward shift of spectral energy was associated with the same visual features (higher luminance and saturation) as higher pitch. In contrast, changing formant frequencies of synthetic vowels while minimizing the accompanying shifts in spectral centroid failed to reveal cross-modal correspondences with color. This may indicate that the commonly reported associations between vowels and colors are mediated by differences in the overall balance of low- and high-frequency energy in the spectrum rather than by vowel identity as such. Surprisingly, the hue of colors with the same luminance and saturation was not associated with any of the tested acoustic features, except for a weak preference to match higher pitch with blue (vs. yellow). We discuss these findings in the context of previous research and consider their implications for sound symbolism in world languages.

RevDate: 2018-12-12

Paltura C, K Yelken (2018)

An Examination of Vocal Tract Acoustics following Wendler's Glottoplasty.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP), 71(1):24-28 pii:000494970 [Epub ahead of print].

PURPOSE: To investigate the formant frequency (FF) features of transgender females' (TFs) voice after Wendler's glottoplasty surgery and compare these levels with age-matched healthy males and females.

STUDY DESIGN: Controlled prospective.

METHODS: 20 TFs and 20 genetically male and female age-matched healthy controls were enrolled in the study. The fundamental frequency (F0) and FFs F1-F4 were obtained from TF speakers 6 months after surgery. These levels were compared with those of healthy controls.

RESULTS: Statistical analysis showed that the median F0 values were similar between TFs and females. The median F1 levels of TFs were different from females but similar to males. The F2 levels of TFs were similar to females but different from males. The F3 and F4 levels were significantly different from both male and female controls.

CONCLUSION: Wendler's glottoplasty technique is an effective method to increase F0 levels among TF patients; however, these individuals report their voice does not sufficiently project femininity. The results obtained with regard to FF levels may be the reason for this problem. Voice therapy is recommended as a possible approach to assist TF patients achieve a satisfactory feminine voice.

RevDate: 2018-12-03

Hardy TLD, Rieger JM, Wells K, et al (2018)

Acoustic Predictors of Gender Attribution, Masculinity-Femininity, and Vocal Naturalness Ratings Amongst Transgender and Cisgender Speakers.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30355-2 [Epub ahead of print].

PURPOSE: This study aimed to identify the most salient set of acoustic predictors of (1) gender attribution; (2) perceived masculinity-femininity; and (3) perceived vocal naturalness amongst a group of transgender and cisgender speakers to inform voice and communication feminization training programs. This study used a unique set of acoustic variables and included a third, androgynous, choice for gender attribution ratings.

METHOD: Data were collected across two phases and involved two separate groups of participants: communicators and raters. In the first phase, audio recordings were captured of communicators (n = 40) during cartoon retell, sustained vowel, and carrier phrase tasks. Acoustic measures were obtained from these recordings. In the second phase, raters (n = 20) provided ratings of gender attribution, perceived masculinity-femininity, and vocal naturalness based on a sample of the cartoon description recording.

RESULTS: Results of a multinomial logistic regression analysis identified mean fundamental frequency (fo) as the sole acoustic measure that changed the odds of being attributed as a woman or ambiguous in gender rather than as a man. Multiple linear regression analyses identified mean fo, average formant frequency of /i/, and mean sound pressure level as predictors of masculinity-femininity ratings and mean fo, average formant frequency, and rate of speech as predictors of vocal naturalness ratings.

CONCLUSION: The results of this study support the continued targeting of fo and vocal tract resonance in voice and communication feminization/masculinization training programs and provide preliminary evidence for more emphasis being placed on vocal intensity and rate of speech. Modification of these voice parameters may help clients to achieve a natural-sounding voice that satisfactorily represents their affirmed gender.

RevDate: 2018-11-28

Fujimura S, Kojima T, Okanoue Y, et al (2018)

Discrimination of "hot potato voice" caused by upper airway obstruction utilizing a support vector machine.

The Laryngoscope [Epub ahead of print].

OBJECTIVES/HYPOTHESIS: "Hot potato voice" (HPV) is a thick, muffled voice caused by pharyngeal or laryngeal diseases characterized by severe upper airway obstruction, including acute epiglottitis and peritonsillitis. To develop a method for determining upper-airway emergency based on this important vocal feature, we investigated the acoustic characteristics of HPV using a physical, articulatory speech synthesis model. The results of the simulation were then applied to design a computerized recognition framework using a mel-frequency cepstral coefficient domain support vector machine (SVM).

STUDY DESIGN: Quasi-experimental research design.

METHODS: Changes in the voice spectral envelope caused by upper airway obstructions were analyzed using a hybrid time-frequency model of articulatory speech synthesis. We evaluated variations in the formant structure and thresholds of critical vocal tract area functions that triggered HPV. The SVMs were trained using a dataset of 2,200 synthetic voice samples generated by an articulatory synthesizer. Voice classification experiments on test datasets of real patient voices were then performed.

RESULTS: On phonation of the Japanese vowel /e/, the frequency of the second formant fell and coalesced with that of the first formant as the area function of the oropharynx decreased. Changes in higher-order formants varied according to constriction location. The highest accuracy afforded by the SVM classifier trained with synthetic data was 88.3%.

CONCLUSIONS: HPV caused by upper airway obstruction has a highly characteristic spectral envelope. Based on this distinctive voice feature, our SVM classifier, who was trained using synthetic data, was able to diagnose upper-airway obstructions with a high degree of accuracy.

LEVEL OF EVIDENCE: 2c Laryngoscope, 2018.

RevDate: 2018-11-26
CmpDate: 2018-11-26

Chen Q, Liu J, Yang HM, et al (2018)

Research on tunable distributed SPR sensor based on bimetal film.

Applied optics, 57(26):7591-7599.

In order to overcome the limitations in range of traditional prism structure surface plasmon resonance (SPR) single-point sensor measurement, a symmetric bimetallic film SPR multi-sensor structure is proposed. Based on this, the dual-channel sensing attenuation mechanism of SPR in gold and silver composite film and the improvement of sensing characteristics were studied. By optimizing the characteristics such as material and thickness, a wider range of dual-channel distributed sensing is realized. Using a He-Ne laser (632.8 nm) as the reference light source, prism-excited symmetric SPR sensing was studied theoretically for a symmetrical metal-clad dielectric waveguide using thin-film optics theory. The influence of the angle of incidence of the light source and the thickness of the dielectric layer on the performance of SPR dual formant sensing is explained. The finite-difference time-domain method was used for the simulation calculation for various thicknesses and compositions of the symmetric combined layer, resulting in the choice of silver (30 nm) and gold (10 nm). When the incident angle was 78 deg, the quality factor reached 5960, showing an excellent resonance sensing effect. The sensitivity reached a maximum of 5.25×10-5 RIU when testing the water content of an aqueous solution of honey, which proves the feasibility and practicality of the structure design. The structure improves the theoretical basis for designing an SPR multi-channel distributed sensing system, which can greatly reduce the cost of biochemical detection and significantly increase the detection efficiency.

RevDate: 2018-11-18

Graf S, Schwiebacher J, Richter L, et al (2018)

Adjustment of Vocal Tract Shape via Biofeedback: Influence on Vowels.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30326-6 [Epub ahead of print].

The study assessed 30 nonprofessional singers to evaluate the effects of vocal tract shape adjustment via increased resonance toward an externally applied sinusoidal frequency of 900 Hz without phonation. The amplification of the sound wave was used as biofeedback signal and the intensity and the formant position of the basic vowels /a/, /e/, /i/, /o/, and /u/ were compared before and after a vocal tract adjustment period. After the adjustment period, the intensities for all vowels increased and the measured changes correlated with the participants' self-perception.The diferences between the second formant position of the vowels and the applied frequency influences the changes in amplitude and in formant frequencies. The most significant changes in formant frequency occurred with vowels that did not include a formant frequency of 900 Hz, while the increase in amplitude was the strongest for vowels with a formant frequency of about 900 Hz.

RevDate: 2018-11-16

Bhat GS, Reddy CKA, Shankar N, et al (2018)

Smartphone based real-time super Gaussian single microphone Speech Enhancement to improve intelligibility for hearing aid users using formant information.

Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference, 2018:5503-5506.

In this paper, we present a Speech Enhancement (SE) technique to improve intelligibility of speech perceived by Hearing Aid users using smartphone as an assistive device. We use the formant frequency information to improve the overall quality and intelligibility of the speech. The proposed SE method is based on new super Gaussian joint maximum a Posteriori (SGJMAP) estimator. Using the priori information of formant frequency locations, the derived gain function has " tradeoff" factors that allows the smartphone user to customize perceptual preference, by controlling the amount of noise suppression and speech distortion in real-time. The formant frequency information helps the hearing aid user to control the gains over the non-formant frequency band, allowing the HA users to attain more noise suppression while maintaining the speech intelligibility using a smartphone application. Objective intelligibility measures and subjective results reflect the usability of the developed SE application in noisy real world acoustic environment.

RevDate: 2018-11-14

Williams D, Escudero P, A Gafos (2018)

Spectral change and duration as cues in Australian English listeners' front vowel categorization.

The Journal of the Acoustical Society of America, 144(3):EL215.

Australian English /iː/, /ɪ/, and /ɪə/ exhibit almost identical average first (F1) and second (F2) formant frequencies and differ in duration and vowel inherent spectral change (VISC). The cues of duration, F1 × F2 trajectory direction (TD) and trajectory length (TL) were assessed in listeners' categorization of /iː/ and /ɪə/ compared to /ɪ/. Duration was important for distinguishing both /iː/ and /ɪə/ from /ɪ/. TD and TL were important for categorizing /iː/ versus /ɪ/, whereas only TL was important for /ɪə/ versus /ɪ/. Finally, listeners' use of duration and VISC was not mutually affected for either vowel compared to /ɪ/.

RevDate: 2018-11-09

Gómez-Vilda P, Gómez-Rodellar A, Vicente JMF, et al (2018)

Neuromechanical Modelling of Articulatory Movements from Surface Electromyography and Speech Formants.

International journal of neural systems [Epub ahead of print].

Speech articulation is produced by the movements of muscles in the larynx, pharynx, mouth and face. Therefore speech shows acoustic features as formants which are directly related with neuromotor actions of these muscles. The first two formants are strongly related with jaw and tongue muscular activity. Speech can be used as a simple and ubiquitous signal, easy to record and process, either locally or on e-Health platforms. This fact may open a wide set of applications in the study of functional grading and monitoring neurodegenerative diseases. A relevant question, in this sense, is how far speech correlates and neuromotor actions are related. This preliminary study is intended to find answers to this question by using surface electromyographic recordings on the masseter and the acoustic kinematics related with the first formant. It is shown in the study that relevant correlations can be found among the surface electromyographic activity (dynamic muscle behavior) and the positions and first derivatives of the first formant (kinematic variables related to vertical velocity and acceleration of the joint jaw and tongue biomechanical system). As an application example, it is shown that the probability density function associated to these kinematic variables is more sensitive than classical features as Vowel Space Area (VSA) or Formant Centralization Ratio (FCR) in characterizing neuromotor degeneration in Parkinson's Disease.

RevDate: 2018-10-26

Lopes LW, Alves JDN, Evangelista DDS, et al (2018)

Accuracy of traditional and formant acoustic measurements in the evaluation of vocal quality.

CoDAS, 30(5):e20170282 pii:S2317-17822018000500310.

PURPOSE: Investigate the accuracy of isolated and combined acoustic measurements in the discrimination of voice deviation intensity (GD) and predominant voice quality (PVQ) in patients with dysphonia.

METHODS: A total of 302 female patients with voice complaints participated in the study. The sustained /ɛ/ vowel was used to extract the following acoustic measures: mean and standard deviation (SD) of fundamental frequency (F0), jitter, shimmer, glottal to noise excitation (GNE) ratio and the mean of the first three formants (F1, F2, and F3). Auditory-perceptual evaluation of GD and PVQ was conducted by three speech-language pathologists who were voice specialists.

RESULTS: In isolation, only GNE provided satisfactory performance when discriminating between GD and PVQ. Improvement in the classification of GD and PVQ was observed when the acoustic measures were combined. Mean F0, F2, and GNE (healthy × mild-to-moderate deviation), the SDs of F0, F1, and F3 (mild-to-moderate × moderate deviation), and mean jitter and GNE (moderate × intense deviation) were the best combinations for discriminating GD. The best combinations for discriminating PVQ were mean F0, shimmer, and GNE (healthy × rough), F3 and GNE (healthy × breathy), mean F 0, F3, and GNE (rough × tense), and mean F0 , F1, and GNE (breathy × tense).

CONCLUSION: In isolation, GNE proved to be the only acoustic parameter capable of discriminating between GG and PVQ. There was a gain in classification performance for discrimination of both GD and PVQ when traditional and formant acoustic measurements were combined.

RevDate: 2018-10-23

Grawunder S, Crockford C, Clay Z, et al (2018)

Higher fundamental frequency in bonobos is explained by larynx morphology.

Current biology : CB, 28(20):R1188-R1189.

Acoustic signals, shaped by natural and sexual selection, reveal ecological and social selection pressures [1]. Examining acoustic signals together with morphology can be particularly revealing. But this approach has rarely been applied to primates, where clues to the evolutionary trajectory of human communication may be found. Across vertebrate species, there is a close relationship between body size and acoustic parameters, such as formant dispersion and fundamental frequency (f0). Deviations from this acoustic allometry usually produce calls with a lower f0 than expected for a given body size, often due to morphological adaptations in the larynx or vocal tract [2]. An unusual example of an obvious mismatch between fundamental frequency and body size is found in the two closest living relatives of humans, bonobos (Pan paniscus) and chimpanzees (Pan troglodytes). Although these two ape species overlap in body size [3], bonobo calls have a strikingly higher f0 than corresponding calls from chimpanzees [4]. Here, we compare acoustic structures of calls from bonobos and chimpanzees in relation to their larynx morphology. We found that shorter vocal fold length in bonobos compared to chimpanzees accounted for species differences in f0, showing a rare case of positive selection for signal diminution in both bonobo sexes.

RevDate: 2018-10-27

Niziolek CA, S Kiran (2018)

Assessing speech correction abilities with acoustic analyses: Evidence of preserved online correction in persons with aphasia.

International journal of speech-language pathology [Epub ahead of print].

PURPOSE: Disorders of speech production may be accompanied by abnormal processing of speech sensory feedback. Here, we introduce a semi-automated analysis designed to assess the degree to which speakers use natural online feedback to decrease acoustic variability in spoken words. Because production deficits in aphasia have been hypothesised to stem from problems with sensorimotor integration, we investigated whether persons with aphasia (PWA) can correct their speech acoustics online.

METHOD: Eight PWA in the chronic stage produced 200 repetitions each of three monosyllabic words. Formant variability was measured for each vowel in multiple time windows within the syllable, and the reduction in formant variability from vowel onset to midpoint was quantified.

RESULT: PWA significantly decreased acoustic variability over the course of the syllable, providing evidence of online feedback correction mechanisms. The magnitude of this corrective formant movement exceeded past measurements in control participants.

CONCLUSION: Vowel centreing behaviour suggests that error correction abilities are at least partially spared in speakers with aphasia, and may be relied upon to compensate for feedforward deficits by bringing utterances back on track. These proof of concept data show the potential of this analysis technique to elucidate the mechanisms underlying disorders of speech production.

RevDate: 2018-10-21

Fazeli M, Moradi N, Soltani M, et al (2018)

Dysphonia Characteristics and Vowel Impairment in Relation to Neurological Status in Patients with Multiple Sclerosis.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30351-5 [Epub ahead of print].

PURPOSE: In this study, we attempted to assess the phonation and articulation subsystem changes in patients with multiple sclerosis compared to healthy individuals using Dysphonia Severity Index and Formant Centralization Ratio with the aim of evaluating the correlation between these two indexes with neurological status.

MATERIALS AND METHODS: A sample of 47 patients with multiple sclerosis and 20 healthy speakers were evaluated. Patients' disease duration and disability were monitored by a neurologist. Dysphonia Severity Index and Formant Centralization Ratio scores were computed for each individual. Acoustic analysis was performed by Praat software; the statistical analysis was run using SPSS 21. To compare multiple sclerosis patients with the control group, Mann-Whitney U test was used for non-normal data and independent-samples t test for normal data. Also a logistic regression was used to compare the data. Correlation between acoustic characteristics and neurological status was verified using Spearman correlation coefficient and linear regression was performed to evaluate the simultaneous effects of neurological data.

RESULTS: Statistical analysis revealed that a significant difference existed between multiple sclerosis and healthy participants. Formant Centralization Ratio had a significant correlation with disease severity.

CONCLUSION: Multiple sclerosis patients would be differentiated from healthy individuals by their phonation and articulatory features. Scores of these two indexes can be considered as appropriate criteria for onset of the speech problems in multiple sclerosis. Also, articulation subsystem changes might be useful signs for the progression of the disease.

RevDate: 2018-10-19

Brabenec L, Klobusiakova P, Barton M, et al (2018)

Non-invasive stimulation of the auditory feedback area for improved articulation in Parkinson's disease.

Parkinsonism & related disorders pii:S1353-8020(18)30439-5 [Epub ahead of print].

INTRODUCTION: Hypokinetic dysarthria (HD) is a common symptom of Parkinson's disease (PD) which does not respond well to PD treatments. We investigated acute effects of repetitive transcranial magnetic stimulation (rTMS) of the motor and auditory feedback area on HD in PD using acoustic analysis of speech.

METHODS: We used 10 Hz and 1 Hz stimulation protocols and applied rTMS over the left orofacial primary motor area, the right superior temporal gyrus (STG), and over the vertex (a control stimulation site) in 16 PD patients with HD. A cross-over design was used. Stimulation sites and protocols were randomised across subjects and sessions. Acoustic analysis of a sentence reading task performed inside the MR scanner was used to evaluate rTMS-induced effects on motor speech. Acute fMRI changes due to rTMS were also analysed.

RESULTS: The 1 Hz STG stimulation produced significant increases of the relative standard deviation of the 2nd formant (p = 0.019), i.e. an acoustic parameter describing the tongue and jaw movements. The effects were superior to the control site stimulation and were accompanied by increased resting state functional connectivity between the stimulated region and the right parahippocampal gyrus. The rTMS-induced acoustic changes were correlated with the reading task-related BOLD signal increases of the stimulated area (R = 0.654, p = 0.029).

CONCLUSION: Our results demonstrate for the first time that low-frequency stimulation of the temporal auditory feedback area may improve articulation in PD and enhance functional connectivity between the STG and the cortical region involved in an overt speech control.

RevDate: 2018-10-19

Gómez-Vilda P, Galaz Z, Mekyska J, et al (2018)

Vowel Articulation Dynamic Stability Related to Parkinson's Disease Rating Features: Male Dataset.

International journal of neural systems [Epub ahead of print].

Neurodegenerative pathologies as Parkinson's Disease (PD) show important distortions in speech, affecting fluency, prosody, articulation and phonation. Classically, measurements based on articulation gestures altering formant positions, as the Vocal Space Area (VSA) or the Formant Centralization Ratio (FCR) have been proposed to measure speech distortion, but these markers are based mainly on static positions of sustained vowels. The present study introduces a measurement based on the mutual information distance among probability density functions of kinematic correlates derived from formant dynamics. An absolute kinematic velocity associated to the position of the jaw and tongue articulation gestures is estimated and modeled statistically. The distribution of this feature may differentiate PD patients from normative speakers during sustained vowel emission. The study is based on a limited database of 53 male PD patients, contrasted to a very selected and stable set of eight normative speakers. In this sense, distances based on Kullback-Leibler divergence seem to be sensitive to PD articulation instability. Correlation studies show statistically relevant relationship between information contents based on articulation instability to certain motor and nonmotor clinical scores, such as freezing of gait, or sleep disorders. Remarkably, one of the statistically relevant correlations point out to the time interval passed since the first diagnostic. These results stress the need of defining scoring scales specifically designed for speech disability estimation and monitoring methodologies in degenerative diseases of neuromotor origin.

RevDate: 2018-11-14

den Ouden DB, Galkina E, Basilakos A, et al (2018)

Vowel Formant Dispersion Reflects Severity of Apraxia of Speech.

Aphasiology, 32(8):902-921.

Background: Apraxia of Speech (AOS) has been associated with deviations in consonantal voice-onset-time (VOT), but studies of vowel acoustics have yielded conflicting results. However, a speech motor planning disorder that is not bound by phonological categories is expected to affect vowel as well as consonant articulations.

Aims: We measured consonant VOTs and vowel formants produced by a large sample of stroke survivors, and assessed to what extent these variables and their dispersion are predictive of AOS presence and severity, based on a scale that uses clinical observations to rate gradient presence of AOS, aphasia, and dysarthria.

Methods & Procedures: Picture-description samples were collected from 53 stroke survivors, including unimpaired speakers (12) and speakers with primarily aphasia (19), aphasia with AOS (12), primarily AOS (2), aphasia with dysarthria (2), and aphasia with AOS and dysarthria (6). The first three formants were extracted from vowel tokens bearing main stress in open-class words, as well as VOTs for voiced and voiceless stops. Vowel space was estimated as reflected in the formant centralization ratio. Stepwise Linear Discriminant Analyses were used to predict group membership, and ordinal regression to predict AOS severity, based on the absolute values of these variables, as well as the standard deviations of formants and VOTs within speakers.

Outcomes and Results: Presence and severity of AOS were most consistently predicted by the dispersion of F1, F2, and voiced-stop VOT. These phonetic-acoustic measures do not correlate with aphasia severity.

Conclusions: These results confirm that the AOS affects articulation across-the-board and does not selectively spare vowel production.

RevDate: 2018-11-14

Baotic A, Garcia M, Boeckle M, et al (2018)

Field Propagation Experiments of Male African Savanna Elephant Rumbles: A Focus on the Transmission of Formant Frequencies.

Animals : an open access journal from MDPI, 8(10): pii:ani8100167.

African savanna elephants live in dynamic fission⁻fusion societies and exhibit a sophisticated vocal communication system. Their most frequent call-type is the 'rumble', with a fundamental frequency (which refers to the lowest vocal fold vibration rate when producing a vocalization) near or in the infrasonic range. Rumbles are used in a wide variety of behavioral contexts, for short- and long-distance communication, and convey contextual and physical information. For example, maturity (age and size) is encoded in male rumbles by formant frequencies (the resonance frequencies of the vocal tract), having the most informative power. As sound propagates, however, its spectral and temporal structures degrade progressively. Our study used manipulated and resynthesized male social rumbles to simulate large and small individuals (based on different formant values) to quantify whether this phenotypic information efficiently transmits over long distances. To examine transmission efficiency and the potential influences of ecological factors, we broadcasted and re-recorded rumbles at distances of up to 1.5 km in two different habitats at the Addo Elephant National Park, South Africa. Our results show that rumbles were affected by spectral⁻temporal degradation over distance. Interestingly and unlike previous findings, the transmission of formants was better than that of the fundamental frequency. Our findings demonstrate the importance of formant frequencies for the efficiency of rumble propagation and the transmission of information content in a savanna elephant's natural habitat.

RevDate: 2018-10-01

Pabon P, S Ternström (2018)

Feature Maps of the Acoustic Spectrum of the Voice.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30185-1 [Epub ahead of print].

The change in the spectrum of sustained /a/ vowels was mapped over the voice range from low to high fundamental frequency and low to high sound pressure level (SPL), in the form of the so-called voice range profile (VRP). In each interval of one semitone and one decibel, narrowband spectra were averaged both within and across subjects. The subjects were groups of 7 male and 12 female singing students, as well as a group of 16 untrained female voices. For each individual and also for each group, pairs of VRP recordings were made, with stringent separation of the modal/chest and falsetto/head registers. Maps are presented of eight scalar metrics, each of which was chosen to quantify a particular feature of the voice spectrum, over fundamental frequency and SPL. Metrics 1 and 2 chart the role of the fundamental in relation to the rest of the spectrum. Metrics 3 and 4 are used to explore the role of resonances in relation to SPL. Metrics 5 and 6 address the distribution of high frequency energy, while metrics 7 and 8 seek to describe the distribution of energy at the low end of the voice spectrum. Several examples are observed of phenomena that are difficult to predict from linear source-filter theory, and of the voice source being less uniform over the voice range than is conventionally assumed. These include a high-frequency band-limiting at high SPL and an unexpected persistence of the second harmonic at low SPL. The two voice registers give rise to clearly different maps. Only a few effects of training were observed, in the low frequency end below 2kHz. The results are of potential interest in voice analysis, voice synthesis and for new insights into the voice production mechanism.

RevDate: 2018-09-30

Kraus MS, Walker TM, Jarskog LF, et al (2018)

Basic auditory processing deficits and their association with auditory emotion recognition in schizophrenia.

Schizophrenia research pii:S0920-9964(18)30542-5 [Epub ahead of print].

BACKGROUND: Individuals with schizophrenia are impaired in their ability to recognize emotions based on vocal cues and these impairments are associated with poor global outcome. Basic perceptual processes, such as auditory pitch processing, are impaired in schizophrenia and contribute to difficulty identifying emotions. However, previous work has focused on a relatively narrow assessment of auditory deficits and their relation to emotion recognition impairment in schizophrenia.

METHODS: We have assessed 87 patients with schizophrenia and 73 healthy controls on a comprehensive battery of tasks spanning the five empirically derived domains of auditory function. We also explored the relationship between basic auditory processing and auditory emotion recognition within the patient group using correlational analysis.

RESULTS: Patients exhibited widespread auditory impairments across multiple domains of auditory function, with mostly medium effect sizes. Performance on all of the basic auditory tests correlated with auditory emotion recognition at the p < .01 level in the patient group, with 9 out of 13 tests correlating with emotion recognition at r = 0.40 or greater. After controlling for cognition, many of the largest correlations involved spectral processing within the phase-locking range and discrimination of vocally based stimuli.

CONCLUSIONS: While many auditory skills contribute to this impairment, deficient formant discrimination appears to be a key skill contributing to impaired emotion recognition as this was the only basic auditory skill to enter a step-wise multiple regression after first entering a measure of cognitive impairment, and formant discrimination accounted for significant unique variance in emotion recognition performance after accounting for deficits in pitch processing.

RevDate: 2018-09-25

Han C, Wang H, Fasolt V, et al (2018)

No clear evidence for correlations between handgrip strength and sexually dimorphic acoustic properties of voices.

American journal of human biology : the official journal of the Human Biology Council [Epub ahead of print].

OBJECTIVES: Recent research on the signal value of masculine physical characteristics in men has focused on the possibility that such characteristics are valid cues of physical strength. However, evidence that sexually dimorphic vocal characteristics are correlated with physical strength is equivocal. Consequently, we undertook a further test for possible relationships between physical strength and masculine vocal characteristics.

METHODS: We tested the putative relationships between White UK (N = 115) and Chinese (N = 106) participants' handgrip strength (a widely used proxy for general upper-body strength) and five sexually dimorphic acoustic properties of voices: fundamental frequency (F0), fundamental frequency's SD (F0-SD), formant dispersion (Df), formant position (Pf), and estimated vocal-tract length (VTL).

RESULTS: Analyses revealed no clear evidence that stronger individuals had more masculine voices.

CONCLUSIONS: Our results do not support the hypothesis that masculine vocal characteristics are a valid cue of physical strength.

RevDate: 2018-11-13

Easwar V, Banyard A, Aiken SJ, et al (2018)

Phase-locked responses to the vowel envelope vary in scalp-recorded amplitude due to across-frequency response interactions.

The European journal of neuroscience, 48(10):3126-3145.

Neural encoding of the envelope of sounds like vowels is essential to access temporal information useful for speech recognition. Subcortical responses to envelope periodicity of vowels can be assessed using scalp-recorded envelope following responses (EFRs); however, the amplitude of EFRs vary by vowel spectra and the causal relationship is not well understood. One cause for spectral dependency could be interactions between responses with different phases, initiated by multiple stimulus frequencies. Phase differences can arise from earlier initiation of processing high frequencies relative to low frequencies in the cochlea. This study investigated the presence of such phase interactions by measuring EFRs to two naturally spoken vowels (/ε/ and /u/), while delaying the envelope phase of the second formant band (F2+) relative to the first formant (F1) band in 45° increments. At 0° F2+ phase delay, EFRs elicited by the vowel /ε/ were lower in amplitude than the EFRs elicited by /u/. Using vector computations, we found that the lower amplitude of /ε/-EFRs was caused by linear superposition of F1- and F2+-contributions with larger F1-F2+ phase differences (166°) compared to /u/ (19°). While the variation in amplitude across F2+ phase delays could be modeled with two dominant EFR sources for both vowels, the degree of variation was dependent on F1 and F2+ EFR characteristics. Together, we demonstrate that (a) broadband sounds like vowels elicit independent responses from different stimulus frequencies that may be out-of-phase and affect scalp-based measurements, and (b) delaying higher frequency formants can maximize EFR amplitudes for some vowels.

RevDate: 2018-11-19

Omidvar S, Mahmoudian S, Khabazkhoob M, et al (2018)

Tinnitus Impacts on Speech and Non-speech Stimuli.

Otology & neurotology : official publication of the American Otological Society, American Neurotology Society [and] European Academy of Otology and Neurotology, 39(10):e921-e928.

OBJECTIVE: To investigate how tinnitus affects the processing of speech and non-speech stimuli at the subcortical level.

STUDY DESIGN: Cross-sectional analytical study.

SETTING: Academic, tertiary referral center.

PATIENTS: Eighteen individuals with tinnitus and 20 controls without tinnitus matched based on their age and sex. All subjects had normal hearing sensitivity.


MAIN OUTCOME MEASURES: The effect of tinnitus on the parameters of auditory brainstem responses (ABR) to non-speech (click-ABR), and speech (sABR) stimuli was investigated.

RESULTS: Latencies of click ABR in waves III, V, and Vn, as well as inter-peak latency (IPL) of I to V were significantly longer in individuals with tinnitus compared with the controls. Individuals with tinnitus demonstrated significantly longer latencies of all sABR waves than the control group. The tinnitus patients also exhibited a significant decrease in the slope of the V-A complex and reduced encoding of the first and higher formants. A significant difference was observed between the two groups in the spectral magnitudes, the first formant frequency range (F1) and a higher frequency region (HF).

CONCLUSIONS: Our findings suggest that maladaptive neural plasticity resulting from tinnitus can be subcortically measured and affects timing processing of both speech and non-speech stimuli. The findings have been discussed based on models of maladaptive plasticity and the interference of tinnitus as an internal noise in synthesizing speech auditory stimuli.

RevDate: 2018-09-21

Charlton BD, Owen MA, Keating JL, et al (2018)

Sound transmission in a bamboo forest and its implications for information transfer in giant panda (Ailuropoda melanoleuca) bleats.

Scientific reports, 8(1):12754 pii:10.1038/s41598-018-31155-5.

Although mammal vocalisations signal attributes about the caller that are important in a range of contexts, relatively few studies have investigated the transmission of specific types of information encoded in mammal calls. In this study we broadcast and re-recorded giant panda bleats in a bamboo plantation, to assess the stability of individuality and sex differences in these calls over distance, and determine how the acoustic structure of giant panda bleats degrades in this species' typical environment. Our results indicate that vocal recognition of the caller's identity and sex is not likely to be possible when the distance between the vocaliser and receiver exceeds 20 m and 10 m, respectively. Further analysis revealed that the F0 contour of bleats was subject to high structural degradation as it propagated through the bamboo canopy, making the measurement of mean F0 and F0 modulation characteristics highly unreliable at distances exceeding 10 m. The most stable acoustic features of bleats in the bamboo forest environment (lowest % variation) were the upper formants and overall formant spacing. The analysis of amplitude attenuation revealed that the fifth and sixth formant are more prone to decay than the other frequency components of bleats, however, the fifth formant still remained the most prominent and persistent frequency component over distance. Paired with previous studies, these results show that giant panda bleats have the potential to signal the caller's identity at distances of up to 20 m and reliably transmit sex differences up to 10 m from the caller, and suggest that information encoded by F0 modulation in bleats could only be functionally relevant during close-range interactions in this species' natural environment.

RevDate: 2018-11-14

Ward RM, DG Kelty-Stephen (2018)

Bringing the Nonlinearity of the Movement System to Gestural Theories of Language Use: Multifractal Structure of Spoken English Supports the Compensation for Coarticulation in Human Speech Perception.

Frontiers in physiology, 9:1152.

Coarticulation is the tendency for speech vocalization and articulation even at the phonemic level to change with context, and compensation for coarticulation (CfC) reflects the striking human ability to perceive phonemic stability despite this variability. A current controversy centers on whether CfC depends on contrast between formants of a speech-signal spectrogram-specifically, contrast between offset formants concluding context stimuli and onset formants opening the target sound-or on speech-sound variability specific to the coordinative movement of speech articulators (e.g., vocal folds, postural muscles, lips, tongues). This manuscript aims to encode that coordinative-movement context in terms of speech-signal multifractal structure and to determine whether speech's multifractal structure might explain the crucial gestural support for any proposed spectral contrast. We asked human participants to categorize individual target stimuli drawn from an 11-step [ga]-to-[da] continuum as either phonemes "GA" or "DA." Three groups each heard a specific-type context stimulus preceding target stimuli: either real-speech [al] or [a], sine-wave tones at the third-formant offset frequency of either [al] or [aɹ], and either simulated-speech contexts [al] or [aɹ]. Here, simulating speech contexts involved randomizing the sequence of relatively homogeneous pitch periods within vowel-sound [a] of each [al] and [aɹ]. Crucially, simulated-speech contexts had the same offset and extremely similar vowel formants as and, to additional naïve participants, sounded identical to real-speech contexts. However, randomization distorted original speech-context multifractality, and effects of spectral contrast following speech only appeared after regression modeling of trial-by-trial "GA" judgments controlled for context-stimulus multifractality. Furthermore, simulated-speech contexts elicited faster responses (like tone contexts do) and weakened known biases in CfC, suggesting that spectral contrast depends on the nonlinear interactions across multiple scales that articulatory gestures express through the speech signal. Traditional mouse-tracking behaviors measured as participants moved their computer-mouse cursor to register their "GA"-or-"DA" decisions with mouse-clicks suggest that listening to speech leads the movement system to resonate with the multifractality of context stimuli. We interpret these results as shedding light on a new multifractal terrain upon which to found a better understanding in which movement systems play an important role in shaping how speech perception makes use of acoustic information.

RevDate: 2018-09-19

Hu XJ, Li FF, CC Lau (2018)

Development of the Mandarin speech banana.

International journal of speech-language pathology [Epub ahead of print].

PURPOSE: For Indo-European languages, "speech banana" is widely used to verify the benefits of hearing aids and cochlear implants. As a standardised "Mandarin speech banana" is not available, clinicians in China typically use a non-Mandarin speech banana. However, as Chinese is logographic and tonal, using a non-Mandarin speech banana is inappropriate. This paper was designed to develop the Mandarin speech banana according to the Mandarin phonetic properties.

METHOD: In the first experiment, 14 participants read aloud the standard Mandarin initials and finals. For each pronounced sound, its formants were measured. The boundary of all formants formed the formant graph (intensity versus frequency). In the second experiment, 20 participants listened to a list of pre-recorded initials and finals that had been filtered with different bandwidths. The minimum bandwidth to recognise a target sound defined its location on the formant graph.

RESULT: The Mandarin speech banana was generated with recognisable initials and finals on the formant graph. Tone affected the shape of the formant graph, especially at low frequencies.

CONCLUSION: Clinicians can use the new M andarin speech banana to counsel patients about what sounds are inaudible to them. Speech training can be implemented based on the unheard sounds in the speech banana.

RevDate: 2018-11-01

Sfakianaki A, Nicolaidis K, Okalidou A, et al (2018)

Coarticulatory dynamics in Greek disyllables produced by young adults with and without hearing loss.

Clinical linguistics & phonetics, 32(12):1162-1184.

Hearing loss affects both speech perception and production with detrimental effects on various speech characteristics including coarticulatory dynamics. The aim of the present study is to explore consonant-to-vowel (C-to-V) and vowel-to-vowel (V-to-V) coarticulation in magnitude, direction and temporal extent in the speech of young adult male and female speakers of Greek with normal hearing (NH) and hearing impairment (HI). Nine intelligible speakers with profound HI, using conventional hearing aids, and five speakers with NH produced /pV1CV2/ disyllables, with the point vowels /i, a, u/ and the consonants /p, t, s/, stressed either on the first or the second syllable. Formant frequencies F1 and F2 were measured in order to examine C-to-V effects at vowel midpoint and V-to-V effects at vowel onset, midpoint and offset. The acoustic and statistical analyses revealed similarities but also significant differences regarding coarticulatory patterns of the two groups. Interestingly, prevalence of anticipatory coarticulation effects in alveolar contexts was observed for speakers with HI. Findings are interpreted on account of possible differences in articulation strategies between the two groups and with reference to current coarticulatory models.

RevDate: 2018-09-03

Kawitzky D, T McAllister (2018)

The Effect of Formant Biofeedback on the Feminization of Voice in Transgender Women.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30190-5 [Epub ahead of print].

Differences in formant frequencies between men and women contribute to the perception of voices as masculine or feminine. This study investigated whether visual-acoustic biofeedback can be used to help transgender women achieve formant targets typical of cisgender women, and whether such a shift influences the perceived femininity of speech. Transgender women and a comparison group of cisgender males were trained to produce vowels in a word context while also attempting to make a visual representation of their second formant (F2) line up with a target that was shifted up relative to their baseline F2 (feminized target) or an unshifted or shifted-down target (control conditions). Despite the short-term nature of the training, both groups showed significant differences in F2 frequency in shifted-up, shifted-down, and unshifted conditions. Gender typicality ratings from blinded listeners indicated that higher F2 values were associated with an increase in the perceived femininity of speech. Consistent with previous literature, we found that fundamental frequency and F2 make a joint contribution to the perception of gender. The results suggest that biofeedback might be a useful tool in voice modification therapy for transgender women; however, larger studies and information about generalization will be essential before strong conclusions can be drawn.

RevDate: 2018-08-08

Núñez-Batalla F, Vasile G, Cartón-Corona N, et al (2018)

Vowel production in hearing impaired children: A comparison between normal-hearing, hearing-aided and cochlear-implanted children.

Acta otorrinolaringologica espanola pii:S0001-6519(18)30117-1 [Epub ahead of print].

INTRODUCTION AND OBJECTIVES: Inadequate auditory feedback in prelingually deaf children alters the articulation of consonants and vowels. The purpose of this investigation was to compare vowel production in Spanish-speaking deaf children with cochlear implantation, and with hearing-aids with normal-hearing children by means of acoustic analysis of formant frequencies and vowel space.

METHODS: A total of 56 prelingually deaf children (25 with cochlear implants and 31 wearing hearing-aids) and 47 normal-hearing children participated. The first 2 formants (F1 and F2) of the five Spanish vowels were measured using Praat software. One-way analysis of variance (ANOVA) and post hoc Scheffé test were applied to analyze the differences between the 3 groups. The surface area of the vowel space was also calculated.

RESULTS: The mean value of F1 in all vowels was not significantly different between the 3 groups. For vowels /i/, /o/ and /u/, the mean value of F2 was significantly different between the 2 groups of deaf children and their normal-hearing peers.

CONCLUSION: Both prelingually hearing-impaired groups tended toward subtle deviations in the articulation of vowels that could be analyzed using an objective acoustic analysis programme.

RevDate: 2018-08-07

Bucci J, Perrier P, Gerber S, et al (2018)

Vowel Reduction in Coratino (South Italy): Phonological and Phonetic Perspectives.

Phonetica pii:000490947 [Epub ahead of print].

Vowel reduction may involve phonetic reduction processes, with nonreached targets, and/or phonological processes in which a vowel target is changed for another target, possibly schwa. Coratino, a dialect of southern Italy, displays complex vowel reduction processes assumed to be phonological. We analyzed a corpus representative of vowel reduction in Coratino, based on a set of a hundred pairs of words contrasting a stressed and an unstressed version of a given vowel in a given consonant environment, produced by 10 speakers. We report vowelformants together with consonant-to-vowel formant trajectories and durations, and show that these data are rather in agreement with a change in vowel target from /i e ɛ·ɔ u/ to schwa when the vowel is a non-word-initial unstressed utterance, unless the vowel shares a place-of-articulation feature with the preceding or following consonant. Interestingly, it also appears that there are 2 targets for phonological reduction, differing in F1 values. A "higher schwa" - which could be considered as /ɨ/ - corresponds to reduction for high vowels /i u/ while a "lower schwa" - which could be considered as /ə/ - corresponds to reduction for midhigh.

RevDate: 2018-08-04

Adriaans F (2018)

Effects of consonantal context on the learnability of vowel categories from infant-directed speech.

The Journal of the Acoustical Society of America, 144(1):EL20.

Recent studies have shown that vowels in infant-directed speech (IDS) are characterized by highly variable formant distributions. The current study investigates whether vowel variability is partially due to consonantal context, and explores whether consonantal context could support the learning of vowel categories from IDS. A computational model is presented which selects contexts based on frequency in the input and generalizes across contextual categories. Improved categorization performance was found on a vowel contrast in American-English IDS. The findings support a view in which the infant's learning mechanism is anchored in context, in order to cope with acoustic variability in the input.

RevDate: 2018-08-04

Barreda S, TM Nearey (2018)

A regression approach to vowel normalization for missing and unbalanced data.

The Journal of the Acoustical Society of America, 144(1):500.

Researchers investigating the vowel systems of languages or dialects frequently employ normalization methods to minimize between-speaker variability in formant patterns while preserving between-phoneme separation and (socio-)dialectal variation. Here two methods are considered: log-mean and Lobanov normalization. Although both of these methods express formants in a speaker-dependent space, the methods differ in their complexity and in their implied models of human vowel-perception. Typical implementations of these methods rely on balanced data across speakers so that researchers may have to reduce the data available in the analyses in missing-data situations. Here, an alternative method is proposed for the normalization of vowels using the log-mean method in a linear-regression framework. The performance of the traditional approaches to log-mean and Lobanov normalization against the regression approach to the log-mean method using naturalistic, simulated vowel-data was investigated. The results indicate that the Lobanov method likely removes legitimate linguistic variation from vowel data and often provides very noisy estimates of the actual vowel quality associated with individual tokens. The authors further argue that the Lobanov method is too complex to represent a plausible model of human vowel perception, and so is unlikely to provide results that reflect the true perceptual organization of linguistic data.

RevDate: 2018-08-04

Brajot FX, D Lawrence (2018)

Delay-induced low-frequency modulation of the voice during sustained phonation.

The Journal of the Acoustical Society of America, 144(1):282.

An important property of negative feedback systems is the tendency to oscillate when feedback is delayed. This paper evaluated this phenomenon in a sustained phonation task, where subjects prolonged a vowel with 0-600 ms delays in auditory feedback. This resulted in a delay-dependent vocal wow: from 0.4 to 1 Hz fluctuations in fundamental frequency and intensity that increased in period and amplitude as the delay increased. A similar modulation in low-frequency oscillations was not observed in the first two formant frequencies, although some subjects did display increased variability. Results suggest that delayed auditory feedback enhances an existing periodic fluctuation in the voice, with a more complex, possibly indirect, influence on supraglottal articulation. These findings have important implications for understanding how speech may be affected by artificially applied or disease-based delays in sensory feedback.

RevDate: 2018-11-14

Souza P, Wright R, Gallun F, et al (2018)

Reliability and Repeatability of the Speech Cue Profile.

Journal of speech, language, and hearing research : JSLHR, 61(8):2126-2137.

Purpose: Researchers have long noted speech recognition variability that is not explained by the pure-tone audiogram. Previous work (Souza, Wright, Blackburn, Tatman, & Gallun, 2015) demonstrated that a small number of listeners with sensorineural hearing loss utilized different types of acoustic cues to identify speechlike stimuli, specifically the extent to which the participant relied upon spectral (or temporal) information for identification. Consistent with recent calls for data rigor and reproducibility, the primary aims of this study were to replicate the pattern of cue use in a larger cohort and to verify stability of the cue profiles over time.

Method: Cue-use profiles were measured for adults with sensorineural hearing loss using a syllable identification task consisting of synthetic speechlike stimuli in which spectral and temporal dimensions were manipulated along continua. For the first set, a static spectral shape varied from alveolar to palatal, and a temporal envelope rise time varied from affricate to fricative. For the second set, formant transitions varied from labial to alveolar and a temporal envelope rise time varied from approximant to stop. A discriminant feature analysis was used to determine to what degree spectral and temporal information contributed to stimulus identification. A subset of participants completed a 2nd visit using the same stimuli and procedures.

Results: When spectral information was static, most participants were more influenced by spectral than by temporal information. When spectral information was dynamic, participants demonstrated a balanced distribution of cue-use patterns, with nearly equal numbers of individuals influenced by spectral or temporal cues. Individual cue profile was repeatable over a period of several months.

Conclusion: In combination with previously published data, these results indicate that listeners with sensorineural hearing loss are influenced by different cues to identify speechlike sounds and that those patterns are stable over time.

RevDate: 2018-07-28

Anikin A (2018)

Soundgen: An open-source tool for synthesizing nonverbal vocalizations.

Behavior research methods pii:10.3758/s13428-018-1095-7 [Epub ahead of print].

Voice synthesis is a useful method for investigating the communicative role of different acoustic features. Although many text-to-speech systems are available, researchers of human nonverbal vocalizations and bioacousticians may profit from a dedicated simple tool for synthesizing and manipulating natural-sounding vocalizations. Soundgen (https://CRAN.R-project.org/package=soundgen) is an open-source R package that synthesizes nonverbal vocalizations based on meaningful acoustic parameters, which can be specified from the command line or in an interactive app. This tool was validated by comparing the perceived emotion, valence, arousal, and authenticity of 60 recorded human nonverbal vocalizations (screams, moans, laughs, and so on) and their approximate synthetic reproductions. Each synthetic sound was created by manually specifying only a small number of high-level control parameters, such as syllable length and a few anchors for the intonation contour. Nevertheless, the valence and arousal ratings of synthetic sounds were similar to those of the original recordings, and the authenticity ratings were comparable, maintaining parity with the originals for less complex vocalizations. Manipulating the precise acoustic characteristics of synthetic sounds may shed light on the salient predictors of emotion in the human voice. More generally, soundgen may prove useful for any studies that require precise control over the acoustic features of nonspeech sounds, including research on animal vocalizations and auditory perception.

RevDate: 2018-09-24

Hînganu MV, Hînganu D, Cozma SR, et al (2018)

Morphofunctional evaluation of buccopharyngeal space using three-dimensional cone-beam computed tomography (3D-CBCT).

Annals of anatomy = Anatomischer Anzeiger : official organ of the Anatomische Gesellschaft, 220:1-8.

The present study aims to identify the anatomical functional changes of the buccopharyngeal space in case of singers with canto voice. The interest in this field is particularly important in view of the relation between the artistic performance level, phoniatry and functional anatomy, as the voice formation mechanism is not completely known yet. We conducted a morphometric study on three soprano voices that differ in type and training level. The anatomical soft structures from the superior vocal formant of each soprano were measured on images captured using the Cone-beam Computed Tomography (CBCT) technique. The results obtained, as well as the 3D reconstructions emphasize the particularities of the individual morphological features, especially in case of the experienced soprano soloist, which are found to be different for each anatomical soft structure, as well as for their integrity. The experimental results are encouraging and suggest further development of this study on soprano voices and also on other types of opera voices.

RevDate: 2018-11-14

Whalen DH, Chen WR, Tiede MK, et al (2018)

Variability of articulator positions and formants across nine English vowels.

Journal of phonetics, 68:1-14.

Speech, though communicative, is quite variable both in articulation and acoustics, and it has often been claimed that articulation is more variable. Here we compared variability in articulation and acoustics for 32 speakers in the x-ray microbeam database (XRMB; Westbury, 1994). Variability in tongue, lip and jaw positions for nine English vowels (/u, ʊ, æ, ɑ, ʌ, ɔ, ε, ɪ, i/) was compared to that of the corresponding formant values. The domains were made comparable by creating three-dimensional spaces for each: the first three principal components from an analysis of a 14-dimensional space for articulation, and an F1xF2xF3 space for acoustics. More variability occurred in the articulation than the acoustics for half of the speakers, while the reverse was true for the other half. Individual tokens were further from the articulatory median than the acoustic median for 40-60% of tokens across speakers. A separate analysis of three non-low front vowels (/ε, ɪ, i/, for which the XRMB system provides the most direct articulatory evidence) did not differ from the omnibus analysis. Speakers tended to be either more or less variable consistently across vowels. Across speakers, there was a positive correlation between articulatory and acoustic variability, both for all vowels and for just the three non-low front vowels. Although the XRMB is an incomplete representation of articulation, it nonetheless provides data for direct comparisons between articulatory and acoustic variability that have not been reported previously. The results indicate that articulation is not more variable than acoustics, that speakers had relatively consistent variability across vowels, and that articulatory and acoustic variability were related for the vowels themselves.

RevDate: 2018-08-09

Barakzai SZ, Wells J, Parkin TDH, et al (2018)

Overground endoscopic findings and respiratory sound analysis in horses with recurrent laryngeal neuropathy after unilateral laser ventriculocordectomy.

Equine veterinary journal [Epub ahead of print].

BACKGROUND: Unilateral ventriculocordectomy (VeC) is frequently performed, yet objective studies in horses with naturally occurring recurrent laryngeal neuropathy (RLN) are few.

OBJECTIVES: To evaluate respiratory noise and exercising overground endoscopy in horses with grade B and C laryngeal function, before and after unilateral laser VeC.

STUDY DESIGN: Prospective study in clinically affected client-owned horses.

METHODS: Exercising endoscopy was performed and concurrent respiratory noise was recorded. A left-sided laser VeC was performed under standing sedation. Owners were asked to present the horse for re-examination 6-8 weeks post-operatively when exercising endoscopy and sound recordings were repeated. Exercising endoscopic findings were recorded, including the degree of arytenoid stability. Quantitative measurement of left-to-right quotient angle ratio (LRQ) and rima glottidis area ratio (RGA) were performed pre- and post-operatively. Sound analysis was performed, and measurements of the energy change in F1, F2 and F3 formants between pre- and post-operative recordings were made and statistically analysed.

RESULTS: Three grade B and seven grade C horses were included; 6/7grade C horses preoperatively had bilateral vocal fold collapse (VFC) and 5/7 had mild right-sided medial deviation of the ary-epiglottic fold (MDAF). Right VFC and MDAF was still present in these horses post-operatively; grade B horses had no other endoscopic dynamic abnormalities post-operatively. Sound analysis showed significant reduction in energy in formant F2 (P = 0.05) after surgery.

MAIN LIMITATIONS: The study sample size was small and multiple dynamic abnormalities made sound analysis challenging.

CONCLUSIONS: RLN-affected horses have reduction in sound levels in F2 after unilateral laser VeC. Continuing noise may be caused by other ongoing forms of dynamic obstruction in grade C horses. Unilateral VeC is useful for grade B horses based on endoscopic images. In grade C horses, bilateral VeC, right ary-epiglottic fold resection ± laryngoplasty might be a better option than unilateral VeC alone. The Summary is available in Portuguese - see Supporting Information.

RevDate: 2018-11-14

Buzaneli ECP, Zenari MS, Kulcsar MAV, et al (2018)

Supracricoid Laryngectomy: The Function of the Remaining Arytenoid in Voice and Swallowing.

International archives of otorhinolaryngology, 22(3):303-312.

Introduction Supracricoid laryngectomy still has selected indications; there are few studies in the literature, and the case series are limited, a fact that stimulates the development of new studies to further elucidate the structural and functional aspects of the procedure. Objective To assess voice and deglutition parameters according to the number of preserved arytenoids. Methods Eleven patients who underwent subtotal laryngectomy with cricohyoidoepiglottopexy were evaluated by laryngeal nasofibroscopy, videofluoroscopy, and auditory-perceptual, acoustic, and voice pleasantness analyses, after resuming oral feeding. Results Functional abnormalities were detected in two out of the three patients who underwent arytenoidectomy, and in six patients from the remainder of the sample. Almost half of the sample presented silent laryngeal penetration and/or vallecular/hypopharyngeal stasis on the videofluoroscopy. The mean voice analysis scores indicated moderate vocal deviation, roughness and breathiness; severe strain and loudness deviation; shorter maximum phonation time; the presence of noise; and high third and fourth formant values. The voices were rated as unpleasant. There was no difference in the number and functionality of the remaining arytenoids as prognostic factors for deglutition; however, in the qualitative analysis, favorable voice and deglutition outcomes were more common among patients who did not undergo arytenoidectomy and had normal functional conditions. Conclusion The number and functionality of the preserved arytenoids were not found to be prognostic factors for favorable deglutition efficiency outcomes. However, the qualitative analysis showed that the preservation of both arytenoids and the absence of functional abnormalities were associated with more satisfactory voice and deglutition patterns.

RevDate: 2018-07-01

El Boghdady N, Başkent D, E Gaudrain (2018)

Effect of frequency mismatch and band partitioning on vocal tract length perception in vocoder simulations of cochlear implant processing.

The Journal of the Acoustical Society of America, 143(6):3505.

The vocal tract length (VTL) of a speaker is an important voice cue that aids speech intelligibility in multi-talker situations. However, cochlear implant (CI) users demonstrate poor VTL sensitivity. This may be partially caused by the mismatch between frequencies received by the implant and those corresponding to places of stimulation along the cochlea. This mismatch can distort formant spacing, where VTL cues are encoded. In this study, the effects of frequency mismatch and band partitioning on VTL sensitivity were investigated in normal hearing listeners with vocoder simulations of CI processing. The hypotheses were that VTL sensitivity may be reduced by increased frequency mismatch and insufficient spectral resolution in how the frequency range is partitioned, specifically where formants lie. Moreover, optimal band partitioning might mitigate the detrimental effects of frequency mismatch on VTL sensitivity. Results showed that VTL sensitivity decreased with increased frequency mismatch and reduced spectral resolution near the low frequencies of the band partitioning map. Band partitioning was independent of mismatch, indicating that if a given partitioning is suboptimal, a better partitioning might improve VTL sensitivity despite the degree of mismatch. These findings suggest that customizing the frequency partitioning map may enhance VTL perception in individual CI users.

RevDate: 2018-07-01

Vikram CM, Macha SK, Kalita S, et al (2018)

Acoustic analysis of misarticulated trills in cleft lip and palate children.

The Journal of the Acoustical Society of America, 143(6):EL474.

In this paper, acoustic analysis of misarticulated trills in cleft lip and palate speakers is carried out using excitation source based features: strength of excitation and fundamental frequency, derived from zero-frequency filtered signal, and vocal tract system features: first formant frequency (F1) and trill frequency, derived from the linear prediction analysis and autocorrelation approach, respectively. These features are found to be statistically significant while discriminating normal from misarticulated trills. Using acoustic features, dynamic time warping based trill misarticulation detection system is demonstrated. The performance of the proposed system in terms of the F1-score is 73.44%, whereas that for conventional Mel-frequency cepstral coefficients is 66.11%.

RevDate: 2018-07-17

Ng ML, Yan N, Chan V, et al (2018)

A Volumetric Analysis of the Vocal Tract Associated with Laryngectomees Using Acoustic Reflection Technology.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP), 70(1):44-49.

OBJECTIVE: Previous studies of the laryngectomized vocal tract using formant frequencies reported contradictory findings. Imagining studies of the vocal tract in alaryngeal speakers are limited due to the possible radiation effect as well as the cost and time associated with the studies. The present study examined the vocal tract configuration of laryngectomized individuals using acoustic reflection technology.

SUBJECTS AND METHODS: Thirty alaryngeal and 30 laryngeal male speakers of Cantonese participated in the study. A pharyngometer was used to obtain volumetric information of the vocal tract. All speakers were instructed to imitate the production of /a/ when the length and volume information of the oral cavity, pharyngeal cavity, and the entire vocal tract were obtained. The data of alaryngeal and laryngeal speakers were compared.

RESULTS: Pharyngometric measurements revealed no significant difference in the vocal tract dimensions between laryngeal and alaryngeal speakers.

CONCLUSION: Despite the removal of the larynx and a possible alteration in the pharyngeal cavity during total laryngectomy, the vocal tract configuration (length and volume) in laryngectomized individuals was not significantly different from laryngeal speakers. It is suggested that other factors might have affected formant measures in previous studies.

RevDate: 2018-09-12

Reby D, Wyman MT, Frey R, et al (2018)

Vocal tract modelling in fallow deer: are male groans nasalized?.

The Journal of experimental biology, 221(Pt 17): pii:jeb.179416.

Males of several species of deer have a descended and mobile larynx, resulting in an unusually long vocal tract, which can be further extended by lowering the larynx during call production. Formant frequencies are lowered as the vocal tract is extended, as predicted when approximating the vocal tract as a uniform quarter wavelength resonator. However, formant frequencies in polygynous deer follow uneven distribution patterns, indicating that the vocal tract configuration may in fact be rather complex. We CT-scanned the head and neck region of two adult male fallow deer specimens with artificially extended vocal tracts and measured the cross-sectional areas of the supra-laryngeal vocal tract along the oral and nasal tracts. The CT data were then used to predict the resonances produced by three possible configurations, including the oral vocal tract only, the nasal vocal tract only, or combining the two. We found that the area functions from the combined oral and nasal vocal tracts produced resonances more closely matching the formant pattern and scaling observed in fallow deer groans than those predicted by the area functions of the oral vocal tract only or of the nasal vocal tract only. This indicates that the nasal and oral vocal tracts are both simultaneously involved in the production of a non-human mammal vocalization, and suggests that the potential for nasalization in putative oral loud calls should be carefully considered.

RevDate: 2018-11-14

Yilmaz A, Sarac ET, Aydinli FE, et al (2018)

Investigating the effect of STN-DBS stimulation and different frequency settings on the acoustic-articulatory features of vowels.

Neurological sciences : official journal of the Italian Neurological Society and of the Italian Society of Clinical Neurophysiology pii:10.1007/s10072-018-3479-y [Epub ahead of print].

INTRODUCTION: Parkinson's disease (PD) is the second most frequent progressive neuro-degenerative disorder. In addition to motor symptoms, nonmotor symptoms and voice and speech disorders can also develop in 90% of PD patients. The aim of our study was to investigate the effects of DBS and different DBS frequencies on speech acoustics of vowels in PD patients.

METHODS: The study included 16 patients who underwent STN-DBS surgery due to PD. The voice recordings for the vowels including [a], [e], [i], and [o] were performed at frequencies including 230, 130, 90, and 60 Hz and off-stimulation. The voice recordings were gathered and evaluated by the Praat software, and the effects on the first (F1), second (F2), and third formant (F3) frequencies were analyzed.

RESULTS: A significant difference was found for the F1 value of the vowel [a] at 130 Hz compared to off-stimulation. However, no significant difference was found between the three formant frequencies with regard to the stimulation frequencies and off-stimulation. In addition, though not statistically significant, stimulation at 60 and 230 Hz led to several differences in the formant frequencies of other three vowels.

CONCLUSION: Our results indicated that STN-DBS stimulation at 130 Hz had a significant positive effect on articulation of [a] compared to off-stimulation. Although there is not any statistical significant stimulation at 60 and 230 Hz may also have an effect on the articulation of [e], [i], and [o] but this effect needs to be investigated in future studies with higher numbers of participants.

RevDate: 2018-11-14

Dietrich S, Hertrich I, Müller-Dahlhaus F, et al (2018)

Reduced Performance During a Sentence Repetition Task by Continuous Theta-Burst Magnetic Stimulation of the Pre-supplementary Motor Area.

Frontiers in neuroscience, 12:361.

The pre-supplementary motor area (pre-SMA) is engaged in speech comprehension under difficult circumstances such as poor acoustic signal quality or time-critical conditions. Previous studies found that left pre-SMA is activated when subjects listen to accelerated speech. Here, the functional role of pre-SMA was tested for accelerated speech comprehension by inducing a transient "virtual lesion" using continuous theta-burst stimulation (cTBS). Participants were tested (1) prior to (pre-baseline), (2) 10 min after (test condition for the cTBS effect), and (3) 60 min after stimulation (post-baseline) using a sentence repetition task (formant-synthesized at rates of 8, 10, 12, 14, and 16 syllables/s). Speech comprehension was quantified by the percentage of correctly reproduced speech material. For high speech rates, subjects showed decreased performance after cTBS of pre-SMA. Regarding the error pattern, the number of incorrect words without any semantic or phonological similarity to the target context increased, while related words decreased. Thus, the transient impairment of pre-SMA seems to affect its inhibitory function that normally eliminates erroneous speech material prior to speaking or, in case of perception, prior to encoding into a semantically/pragmatically meaningful message.

RevDate: 2018-11-14

Kent RD, HK Vorperian (2018)

Static measurements of vowel formant frequencies and bandwidths: A review.

Journal of communication disorders, 74:74-97.

PURPOSE: Data on vowel formants have been derived primarily from static measures representing an assumed steady state. This review summarizes data on formant frequencies and bandwidths for American English and also addresses (a) sources of variability (focusing on speech sample and time sampling point), and (b) methods of data reduction such as vowel area and dispersion.

METHOD: Searches were conducted with CINAHL, Google Scholar, MEDLINE/PubMed, SCOPUS, and other online sources including legacy articles and references. The primary search items were vowels, vowel space area, vowel dispersion, formants, formant frequency, and formant bandwidth.

RESULTS: Data on formant frequencies and bandwidths are available for both sexes over the lifespan, but considerable variability in results across studies affects even features of the basic vowel quadrilateral. Origins of variability likely include differences in speech sample and time sampling point. The data reveal the emergence of sex differences by 4 years of age, maturational reductions in formant bandwidth, and decreased formant frequencies with advancing age in some persons. It appears that a combination of methods of data reduction provide for optimal data interpretation.

CONCLUSION: The lifespan database on vowel formants shows considerable variability within specific age-sex groups, pointing to the need for standardized procedures.

RevDate: 2018-06-09

Horáček J, Radolf V, AM Laukkanen (2018)

Impact Stress in Water Resistance Voice Therapy: A Physical Modeling Study.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(17)30463-0 [Epub ahead of print].

OBJECTIVES: Phonation through a tube in water is used in voice therapy. This study investigates whether this exercise may increase mechanical loading on the vocal folds.

STUDY DESIGN: This is an experimental modeling study.

METHODS: A model with three-layer silicone vocal fold replica and a plexiglass, MK Plexi, Prague vocal tract set for the articulation of vowel [u:] was used. Impact stress (IS) was measured in three conditions: for [u:] (1) without a tube, (2) with a silicon Lax Vox tube (35 cm in length, 1 cm in inner diameter) immersed 2 cm in water, and (3) with the tube immersed 10 cm in water. Subglottic pressure and airflow ranges were selected to correspond to those reported in normal human phonation.

RESULTS: Phonation threshold pressure was lower for phonation into water compared with [u:] without a tube. IS increased with the airflow rate. IS measured in the range of subglottic pressure, which corresponds to measurements in humans, was highest for vowel [u:] without a tube and lower with the tube in water.

CONCLUSIONS: Even though the model and humans cannot be directly compared, for instance due to differences in vocal tract wall properties, the results suggest that IS is not likely to increase harmfully in water resistance therapy. However, there may be other effects related to it, possibly causing symptoms of vocal fatigue (eg, increased activity in the adductors or high amplitudes of oral pressure variation probably capable of increasing stress in the vocal fold). These need to be studied further, especially for cases where the water bubbling frequency is close to the acoustical-mechanical resonance and at the same time the fundamental phonation frequency is near the first formant frequency of the system.

RevDate: 2018-07-17

Bauerly KR (2018)

The Effects of Emotion on Second Formant Frequency Fluctuations in Adults Who Stutter.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP), 70(1):13-23.

OBJECTIVE: Changes in second formant frequency fluctuations (FFF2) were examined in adults who stutter (AWS) and adults who do not stutter (ANS) when producing nonwords under varying emotional conditions.

METHODS: Ten AWS and 10 ANS viewed images selected from the International Affective Picture System representing dimensions of arousal (e.g., excited versus bored) and hedonic valence (e.g., happy versus sad). Immediately following picture presentation, participants produced a consonant-vowel + final /t/ (CVt) nonword consisting of the initial sounds /p/, /b/, /s/, or /z/, followed by a vowel (/i/, /u/, /ε/) and a final /t/. CVt tokens were assessed for word duration and FFF2.

RESULTS: Significantly slower word durations were shown in the AWS compared to the ANS across conditions. Although these differences appeared to increase under arousing conditions, no interaction was found. Results for FFF2 revealed a significant group-condition interaction. Post hoc analysis indicated that this was due to the AWS showing significantly greater FFF2 when speaking under conditions eliciting increases in arousal and unpleasantness. ANS showed little change in FFF2 across conditions.

CONCLUSIONS: The results suggest that AWS' articulatory stability is more susceptible to breakdown under negative emotional influences.

RevDate: 2018-11-14

Fisher JM, Dick FK, Levy DF, et al (2018)

Neural representation of vowel formants in tonotopic auditory cortex.

NeuroImage, 178:574-582.

Speech sounds are encoded by distributed patterns of activity in bilateral superior temporal cortex. However, it is unclear whether speech sounds are topographically represented in cortex, or which acoustic or phonetic dimensions might be spatially mapped. Here, using functional MRI, we investigated the potential spatial representation of vowels, which are largely distinguished from one another by the frequencies of their first and second formants, i.e. peaks in their frequency spectra. This allowed us to generate clear hypotheses about the representation of specific vowels in tonotopic regions of auditory cortex. We scanned participants as they listened to multiple natural tokens of the vowels [ɑ] and [i], which we selected because their first and second formants overlap minimally. Formant-based regions of interest were defined for each vowel based on spectral analysis of the vowel stimuli and independently acquired tonotopic maps for each participant. We found that perception of [ɑ] and [i] yielded differential activation of tonotopic regions corresponding to formants of [ɑ] and [i], such that each vowel was associated with increased signal in tonotopic regions corresponding to its own formants. This pattern was observed in Heschl's gyrus and the superior temporal gyrus, in both hemispheres, and for both the first and second formants. Using linear discriminant analysis of mean signal change in formant-based regions of interest, the identity of untrained vowels was predicted with ∼73% accuracy. Our findings show that cortical encoding of vowels is scaffolded on tonotopy, a fundamental organizing principle of auditory cortex that is not language-specific.

RevDate: 2018-06-02

Dubey AK, Tripathi A, Prasanna SRM, et al (2018)

Detection of hypernasality based on vowel space area.

The Journal of the Acoustical Society of America, 143(5):EL412.

This study proposes a method for differentiating hypernasal-speech from normal speech using the vowel space area (VSA). Hypernasality introduces extra formant and anti-formant pairs in vowel spectrum, which results in shifting of formants. This shifting affects the size of the VSA. The results show that VSA is reduced in hypernasal-speech compared to normal speech. The VSA feature plus Mel-frequency cepstral coefficient feature for support vector machine based hypernasality detection leads to an accuracy of 86.89% for sustained vowels and 89.47%, 90.57%, and 91.70% for vowels in contexts of high pressure consonants /k/, /p/, and /t/, respectively.

RevDate: 2018-11-14

Story BH, Vorperian HK, Bunton K, et al (2018)

An age-dependent vocal tract model for males and females based on anatomic measurements.

The Journal of the Acoustical Society of America, 143(5):3079.

The purpose of this study was to take a first step toward constructing a developmental and sex-specific version of a parametric vocal tract area function model representative of male and female vocal tracts ranging in age from infancy to 12 yrs, as well as adults. Anatomic measurements collected from a large imaging database of male and female children and adults provided the dataset from which length warping and cross-dimension scaling functions were derived, and applied to the adult-based vocal tract model to project it backward along an age continuum. The resulting model was assessed qualitatively by projecting hypothetical vocal tract shapes onto midsagittal images from the cohort of children, and quantitatively by comparison of formant frequencies produced by the model to those reported in the literature. An additional validation of modeled vocal tract shapes was made possible by comparison to cross-sectional area measurements obtained for children and adults using acoustic pharyngometry. This initial attempt to generate a sex-specific developmental vocal tract model paves a path to study the relation of vocal tract dimensions to documented prepubertal acoustic differences.

RevDate: 2018-06-02

Carignan C (2018)

Using ultrasound and nasalance to separate oral and nasal contributions to formant frequencies of nasalized vowels.

The Journal of the Acoustical Society of America, 143(5):2588.

The experimental method described in this manuscript offers a possible means to address a well known issue in research on the independent effects of nasalization on vowel acoustics: given that the separate transfer functions associated with the oral and nasal cavities are merged in the acoustic signal, the task of teasing apart the respective effects of the two cavities seems to be an intractable problem. The proposed method uses ultrasound and nasalance to predict the effect of lingual configuration on formant frequencies of nasalized vowels, thus accounting for acoustic variation due to changing lingual posture and excluding its contribution to the acoustic signal. The results reveal that the independent effect of nasalization on the acoustic vowel quadrilateral resembles a counter-clockwise chain shift of nasal compared to non-nasal vowels. The results from the productions of 11 vowels by six speakers of different language backgrounds are compared to predictions presented in previous modeling studies, as well as discussed in the light of sound change of nasal vowel systems.

RevDate: 2018-11-26

Romanelli S, Menegotto A, R Smyth (2018)

Stress-Induced Acoustic Variation in L2 and L1 Spanish Vowels.

Phonetica, 75(3):190-218.

AIM: We assessed the effect of lexical stress on the duration and quality of Spanish word-final vowels /a, e, o/ produced by American English late intermediate learners of L2 Spanish, as compared to those of native L1 Argentine Spanish speakers.

METHODS: Participants read 54 real words ending in /a, e, o/, with either final or penultimate lexical stress, embedded in a text and a word list. We measured vowel duration and both F1 and F2 frequencies at 3 temporal points.

RESULTS: stressed vowels were longer than unstressed vowels, in Spanish L1 and L2. L1 and L2 Spanish stressed /a/ and /e/ had higher F1 values than their unstressed counterparts. Only the L2 speakers showed evidence of rising offglides for /e/ and /o/. The L2 and L1 Spanish vowel space was compressed in the absence of stress.

CONCLUSION: Lexical stress affected the vowel quality of L1 and L2 Spanish vowels. We provide an up-to-date account of the formant trajectories of Argentine River Plate Spanish word-final /a, e, o/ and offer experimental support to the claim that stress affects the quality of Spanish vowels in word-final contexts.

RevDate: 2018-05-25

Peter V, Kalashnikova M, D Burnham (2018)

Weighting of Amplitude and Formant Rise Time Cues by School-Aged Children: A Mismatch Negativity Study.

Journal of speech, language, and hearing research : JSLHR, 61(5):1322-1333.

Purpose: An important skill in the development of speech perception is to apply optimal weights to acoustic cues so that phonemic information is recovered from speech with minimum effort. Here, we investigated the development of acoustic cue weighting of amplitude rise time (ART) and formant rise time (FRT) cues in children as measured by mismatch negativity (MMN).

Method: Twelve adults and 36 children aged 6-12 years listened to a /ba/-/wa/ contrast in an oddball paradigm in which the standard stimulus had the ART and FRT cues of /ba/. In different blocks, the deviant stimulus had either the ART or FRT cues of /wa/.

Results: The results revealed that children younger than 10 years were sensitive to both ART and FRT cues whereas 10- to 12-year-old children and adults were sensitive only to FRT cues. Moreover, children younger than 10 years generated a positive mismatch response, whereas older children and adults generated MMN.

Conclusion: These results suggest that preattentive adultlike weighting of ART and FRT cues is attained only by 10 years of age and accompanies the change from mismatch response to the more mature MMN response.

Supplemental Material: https://doi.org/10.23641/asha.6207608.

RevDate: 2018-11-14

Redford MA (2018)

Grammatical Word Production Across Metrical Contexts in School-Aged Children's and Adults' Speech.

Journal of speech, language, and hearing research : JSLHR, 61(6):1339-1354.

Purpose: The purpose of this study is to test whether age-related differences in grammatical word production are due to differences in how children and adults chunk speech for output or to immature articulatory timing control in children.

Method: Two groups of 12 children, 5 and 8 years old, and 1 group of 12 adults produced sentences with phrase-medial determiners. Preceding verbs were varied to create different metrical contexts for chunking the determiner with an adjacent content word. Following noun onsets were varied to assess the coherence of determiner-noun sequences. Determiner vowel duration, amplitude, and formant frequencies were measured.

Results: Children produced significantly longer and louder determiners than adults regardless of metrical context. The effect of noun onset on F1 was stronger in children's speech than in adults' speech; the effect of noun onset on F2 was stronger in adults' speech than in children's. Effects of metrical context on anticipatory formant patterns were more evident in children's speech than in adults' speech.

Conclusion: The results suggest that both immature articulatory timing control and age-related differences in how chunks are accessed or planned influence grammatical word production in school-aged children's speech. Future work will focus on the development of long-distance coarticulation to reveal the evolution of speech plan structure over time.

RevDate: 2018-05-24

Dugan SH, Silbert N, McAllister T, et al (2018)

Modelling category goodness judgments in children with residual sound errors.

Clinical linguistics & phonetics [Epub ahead of print].

This study investigates category goodness judgments of /r/ in adults and children with and without residual speech errors (RSEs) using natural speech stimuli. Thirty adults, 38 children with RSE (ages 7-16) and 35 age-matched typically developing (TD) children provided category goodness judgments on whole words, recorded from 27 child speakers, with /r/ in various phonetic environments. The salient acoustic property of /r/ - the lowered third formant (F3) - was normalized in two ways. A logistic mixed-effect model quantified the relationships between listeners' responses and the third formant frequency, vowel context and clinical group status. Goodness judgments from the adult group showed a statistically significant interaction with the F3 parameter when compared to both child groups (p < 0.001) using both normalization methods. The RSE group did not differ significantly from the TD group in judgments of /r/. All listeners were significantly more likely to judge /r/ as correct in a front-vowel context. Our results suggest that normalized /r/ F3 is a statistically significant predictor of category goodness judgments for both adults and children, but children do not appear to make adult-like judgments. Category goodness judgments do not have a clear relationship with /r/ production abilities in children with RSE. These findings may have implications for clinical activities that include category goodness judgments in natural speech, especially for recorded productions.

RevDate: 2018-11-14
CmpDate: 2018-08-20

Tai HC, Shen YP, Lin JH, et al (2018)

Acoustic evolution of old Italian violins from Amati to Stradivari.

Proceedings of the National Academy of Sciences of the United States of America, 115(23):5926-5931.

The shape and design of the modern violin are largely influenced by two makers from Cremona, Italy: The instrument was invented by Andrea Amati and then improved by Antonio Stradivari. Although the construction methods of Amati and Stradivari have been carefully examined, the underlying acoustic qualities which contribute to their popularity are little understood. According to Geminiani, a Baroque violinist, the ideal violin tone should "rival the most perfect human voice." To investigate whether Amati and Stradivari violins produce voice-like features, we recorded the scales of 15 antique Italian violins as well as male and female singers. The frequency response curves are similar between the Andrea Amati violin and human singers, up to ∼4.2 kHz. By linear predictive coding analyses, the first two formants of the Amati exhibit vowel-like qualities (F1/F2 = 503/1,583 Hz), mapping to the central region on the vowel diagram. Its third and fourth formants (F3/F4 = 2,602/3,731 Hz) resemble those produced by male singers. Using F1 to F4 values to estimate the corresponding vocal tract length, we observed that antique Italian violins generally resemble basses/baritones, but Stradivari violins are closer to tenors/altos. Furthermore, the vowel qualities of Stradivari violins show reduced backness and height. The unique formant properties displayed by Stradivari violins may represent the acoustic correlate of their distinctive brilliance perceived by musicians. Our data demonstrate that the pioneering designs of Cremonese violins exhibit voice-like qualities in their acoustic output.

RevDate: 2018-05-21

Niemczak CE, KR Vander Werff (2018)

Informational Masking Effects on Neural Encoding of Stimulus Onset and Acoustic Change.

Ear and hearing [Epub ahead of print].

OBJECTIVE: Recent investigations using cortical auditory evoked potentials have shown masker-dependent effects on sensory cortical processing of speech information. Background noise maskers consisting of other people talking are particularly difficult for speech recognition. Behavioral studies have related this to perceptual masking, or informational masking, beyond just the overlap of the masker and target at the auditory periphery. The aim of the present study was to use cortical auditory evoked potentials, to examine how maskers (i.e., continuous speech-shaped noise [SSN] and multi-talker babble) affect the cortical sensory encoding of speech information at an obligatory level of processing. Specifically, cortical responses to vowel onset and formant change were recorded under different background noise conditions presumed to represent varying amounts of energetic or informational masking. The hypothesis was, that even at this obligatory cortical level of sensory processing, we would observe larger effects on the amplitude and latency of the onset and change components as the amount of informational masking increased across background noise conditions.

DESIGN: Onset and change responses were recorded to a vowel change from /u-i/ in young adults under four conditions: quiet, continuous SSN, eight-talker (8T) babble, and two-talker (2T) babble. Repeated measures analyses by noise condition were conducted on amplitude, latency, and response area measurements to determine the differential effects of these noise conditions, designed to represent increasing and varying levels of informational and energetic masking, on cortical neural representation of a vowel onset and acoustic change response waveforms.

RESULTS: All noise conditions significantly reduced onset N1 and P2 amplitudes, onset N1-P2 peak to peak amplitudes, as well as both onset and change response area compared with quiet conditions. Further, all amplitude and area measures were significantly reduced for the two babble conditions compared with continuous SSN. However, there were no significant differences in peak amplitude or area for either onset or change responses between the two different babble conditions (eight versus two talkers). Mean latencies for all onset peaks were delayed for noise conditions compared with quiet. However, in contrast to the amplitude and area results, differences in peak latency between SSN and the babble conditions did not reach statistical significance.

CONCLUSIONS: These results support the idea that while background noise maskers generally reduce amplitude and increase latency of speech-sound evoked cortical responses, the type of masking has a significant influence. Speech babble maskers (eight talkers and two talkers) have a larger effect on the obligatory cortical response to speech sound onset and change compared with purely energetic continuous SSN maskers, which may be attributed to informational masking effects. Neither the neural responses to the onset nor the vowel change, however, were sensitive to the hypothesized increase in the amount of informational masking between speech babble maskers with two talkers compared with eight talkers.

RevDate: 2018-11-02

Sóskuthy M, Foulkes P, Hughes V, et al (2018)

Changing Words and Sounds: The Roles of Different Cognitive Units in Sound Change.

Topics in cognitive science, 10(4):787-802.

This study considers the role of different cognitive units in sound change: phonemes, contextual variants and words. We examine /u/-fronting and /j/-dropping in data from three generations of Derby English speakers. We analyze dynamic formant data and auditory judgments, using mixed effects regression methods, including generalized additive mixed models (GAMMs). /u/-fronting is reaching its end-point, showing complex conditioning by context and a frequency effect that weakens over time. /j/-dropping is declining, with low-frequency words showing more innovative variants with /j/ than high-frequency words. The two processes interact: words with variable /j/-dropping (new) exhibit more fronting than words that never have /j/ (noodle) even when the /j/ is deleted. These results support models of change that rely on phonetically detailed representations for both word- and sound-level cognitive units.

RevDate: 2018-05-16

Sanfins MD, Hatzopoulos S, Donadon C, et al (2018)

An Analysis of The Parameters Used In Speech ABR Assessment Protocols.

The journal of international advanced otology, 14(1):100-105.

The aim of this study was to assess the parameters of choice, such as duration, intensity, rate, polarity, number of sweeps, window length, stimulated ear, fundamental frequency, first formant, and second formant, from previously published speech ABR studies. To identify candidate articles, five databases were assessed using the following keyword descriptors: speech ABR, ABR-speech, speech auditory brainstem response, auditory evoked potential to speech, speech-evoked brainstem response, and complex sounds. The search identified 1288 articles published between 2005 and 2015. After filtering the total number of papers according to the inclusion and exclusion criteria, 21 studies were selected. Analyzing the protocol details used in 21 studies suggested that there is no consensus to date on a speech-ABR protocol and that the parameters of analysis used are quite variable between studies. This inhibits the wider generalization and extrapolation of data across languages and studies.

RevDate: 2018-05-15

Chen Y, Wang J, Chen W, et al (2017)

[Research on spectrum feature of speech processing strategy for cochlear implant].

Sheng wu yi xue gong cheng xue za zhi = Journal of biomedical engineering = Shengwu yixue gongchengxue zazhi, 34(5):760-766.

Cochlear implant (CI) in present Chinese environment will lose pitch information and result in low speech recognition. In order to research Chinese feature-based speech processing strategy for cochlear implant contrapuntally and to improve the speech recognition for CI recipients, we improve the CI front-end signal acquisition platform and research the signal features. Our search includes the waveform, spectrogram, energy intensity, pitch and formant parameters for different speech processing strategies of cochlear implant. Features in two kinds of speech processing strategies are analyzed and extracted for the study of parameter characteristics. Therefore, the proposed aim of this paper is to extend the research on Chinese-based CI speech processing strategy.

RevDate: 2018-05-10

Wang Q, Bai J, Xue P, et al (2018)

[An acoustic-articulatory study of the nasal finals in students with and without hearing loss].

Sheng wu yi xue gong cheng xue za zhi = Journal of biomedical engineering = Shengwu yixue gongchengxue zazhi, 35(2):198-205.

The central aim of this experiment was to compare the articulatory and acoustic characteristics of students with normal hearing (NH) and school aged children with hearing loss (HL), and to explore the articulatory-acoustic relations during the nasal finals. Fourteen HL and 10 control group were enrolled in this study, and the data of 4 HL students were removed because of their high pronunciation error rate. Data were collected using an electromagnetic articulography. The acoustic data and kinematics data of nasal finals were extracted by the phonetics and data processing software, and all data were analyzed by t test and correlation analysis. The paper shows that, the difference was statistically significant (P<0.05 or P<0.01) in different vowels under the first two formant frequencies (F1, F2), the tongue position and the articulatory-acoustic relations between HL and NH group. The HL group's vertical movement data-F1 relations in /en/ and /eng/ are same as NH group. The conclusion of this study about participants with HL can provide support for speech healing training at increasing pronunciation accuracy in HL participants.


RJR Experience and Expertise


Robbins holds BS, MS, and PhD degrees in the life sciences. He served as a tenured faculty member in the Zoology and Biological Science departments at Michigan State University. He is currently exploring the intersection between genomics, microbial ecology, and biodiversity — an area that promises to transform our understanding of the biosphere.


Robbins has extensive experience in college-level education: At MSU he taught introductory biology, genetics, and population genetics. At JHU, he was an instructor for a special course on biological database design. At FHCRC, he team-taught a graduate-level course on the history of genetics. At Bellevue College he taught medical informatics.


Robbins has been involved in science administration at both the federal and the institutional levels. At NSF he was a program officer for database activities in the life sciences, at DOE he was a program officer for information infrastructure in the human genome project. At the Fred Hutchinson Cancer Research Center, he served as a vice president for fifteen years.


Robbins has been involved with information technology since writing his first Fortran program as a college student. At NSF he was the first program officer for database activities in the life sciences. At JHU he held an appointment in the CS department and served as director of the informatics core for the Genome Data Base. At the FHCRC he was VP for Information Technology.


While still at Michigan State, Robbins started his first publishing venture, founding a small company that addressed the short-run publishing needs of instructors in very large undergraduate classes. For more than 20 years, Robbins has been operating The Electronic Scholarly Publishing Project, a web site dedicated to the digital publishing of critical works in science, especially classical genetics.


Robbins is well-known for his speaking abilities and is often called upon to provide keynote or plenary addresses at international meetings. For example, in July, 2012, he gave a well-received keynote address at the Global Biodiversity Informatics Congress, sponsored by GBIF and held in Copenhagen. The slides from that talk can be seen HERE.


Robbins is a skilled meeting facilitator. He prefers a participatory approach, with part of the meeting involving dynamic breakout groups, created by the participants in real time: (1) individuals propose breakout groups; (2) everyone signs up for one (or more) groups; (3) the groups with the most interested parties then meet, with reports from each group presented and discussed in a subsequent plenary session.


Robbins has been engaged with photography and design since the 1960s, when he worked for a professional photography laboratory. He now prefers digital photography and tools for their precision and reproducibility. He designed his first web site more than 20 years ago and he personally designed and implemented this web site. He engages in graphic design as a hobby.

Order from Amazon

This is a must read book for anyone with an interest in invasion biology. The full title of the book lays out the author's premise — The New Wild: Why Invasive Species Will Be Nature's Salvation. Not only is species movement not bad for ecosystems, it is the way that ecosystems respond to perturbation — it is the way ecosystems heal. Even if you are one of those who is absolutely convinced that invasive species are actually "a blight, pollution, an epidemic, or a cancer on nature", you should read this book to clarify your own thinking. True scientific understanding never comes from just interacting with those with whom you already agree. R. Robbins

21454 NE 143rd Street
Woodinville, WA 98077


E-mail: RJR8222@gmail.com

Collection of publications by R J Robbins

Reprints and preprints of publications, slide presentations, instructional materials, and data compilations written or prepared by Robert Robbins. Most papers deal with computational biology, genome informatics, using information technology to support biomedical research, and related matters.

Research Gate page for R J Robbins

ResearchGate is a social networking site for scientists and researchers to share papers, ask and answer questions, and find collaborators. According to a study by Nature and an article in Times Higher Education , it is the largest academic social network in terms of active users.

Curriculum Vitae for R J Robbins

short personal version

Curriculum Vitae for R J Robbins

long standard version

RJR Picks from Around the Web (updated 11 MAY 2018 )