
Bibliography on: Formants: Modulators of Communication


Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology.

RJR: Recommended Bibliography (Created: 23 Mar 2019 at 01:40)

Formants: Modulators of Communication

Wikipedia: A formant, as defined by James Jeans, is a harmonic of a note that is augmented by a resonance. In speech science and phonetics, however, the term is also used to mean an acoustic resonance of the human vocal tract. Thus, in phonetics, formant can mean either a resonance or the spectral maximum that the resonance produces. Formants are often measured as amplitude peaks in the frequency spectrum of the sound, using a spectrogram or a spectrum analyzer; in the case of the voice, this gives an estimate of the vocal tract resonances. In vowels spoken with a high fundamental frequency, as in a female or child voice, however, the frequency of the resonance may lie between the widely spaced harmonics, so no corresponding peak is visible. Because formants are a product of resonance, because resonance is affected by the shape and material of the resonating structure, and because all animals (humans included) have unique morphologies, formants can add generic (sounds big) and specific (that's Towser barking) information to animal vocalizations.
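The peak-measurement idea above can be illustrated with a short, self-contained sketch (not taken from any of the papers below; the 500 Hz and 1500 Hz resonances and all thresholds are invented illustrative values):

```python
import numpy as np

def spectral_peaks(signal, fs, n_fft=8192, rel_threshold=0.1):
    """Frequencies of prominent local maxima in the magnitude spectrum."""
    mag = np.abs(np.fft.rfft(signal * np.hanning(len(signal)), n_fft))
    freqs = np.fft.rfftfreq(n_fft, 1 / fs)
    # a peak is a bin larger than both neighbours and above a relative floor
    idx = np.flatnonzero((mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])) + 1
    idx = idx[mag[idx] > rel_threshold * mag.max()]
    return freqs[idx]

# Synthetic "vowel": two damped resonances standing in for F1 and F2
fs = 16000
t = np.arange(0, 0.05, 1 / fs)
sig = np.exp(-60 * t) * (np.sin(2 * np.pi * 500 * t)
                         + 0.8 * np.sin(2 * np.pi * 1500 * t))
peaks = spectral_peaks(sig, fs)   # peaks land near 500 Hz and 1500 Hz
```

As the intro notes, this peak-picking view breaks down for high-pitched voices, where harmonics are spaced too widely to sample the resonance curve.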

Created with PubMed® Query: formant NOT pmcbook NOT ispreviousversion

Citations: The Papers (from PubMed®)

RevDate: 2019-03-19

Stilp CE, AA Assgari (2019)

Natural speech statistics shift phoneme categorization.

Attention, perception & psychophysics pii:10.3758/s13414-018-01659-3 [Epub ahead of print].

All perception takes place in context. Recognition of a given speech sound is influenced by the acoustic properties of surrounding sounds. When the spectral composition of earlier (context) sounds (e.g., more energy at lower first formant [F1] frequencies) differs from that of a later (target) sound (e.g., vowel with intermediate F1), the auditory system magnifies this difference, biasing target categorization (e.g., towards higher-F1 /ɛ/). Historically, these studies used filters to force context sounds to possess desired spectral compositions. This approach is agnostic to the natural signal statistics of speech (inherent spectral compositions without any additional manipulations). The auditory system is thought to be attuned to such stimulus statistics, but this has gone untested. Here, vowel categorization was measured following unfiltered (already possessing the desired spectral composition) or filtered sentences (to match spectral characteristics of unfiltered sentences). Vowel categorization was biased in both cases, with larger biases as the spectral prominences in context sentences increased. This confirms sensitivity to natural signal statistics, extending spectral context effects in speech perception to more naturalistic listening conditions. Importantly, categorization biases were smaller and more variable following unfiltered sentences, raising important questions about how faithfully experiments using filtered contexts model everyday speech perception.

RevDate: 2019-03-11

Rodrigues S, Martins F, Silva S, et al (2019)

/l/ velarisation as a continuum.

PloS one, 14(3):e0213392 pii:PONE-D-18-30510.

In this paper, we present a production study to explore the controversial question of /l/ velarisation. Measurements of first (F1), second (F2) and third (F3) formant frequencies and the slope of F2 were analysed to clarify /l/ velarisation behaviour in European Portuguese (EP). The acoustic data were collected from ten EP speakers producing trisyllabic words with a paroxytone stress pattern, with the liquid consonant at the middle of the word in onset, complex onset and coda positions. Results suggested that /l/ is produced on a continuum in EP. The consistently low F2 indicates that /l/ is velarised in all syllable positions, but variation, especially in F1 and F3, revealed that /l/ could be "more velarised" or "less velarised" depending on syllable position and vowel context. These findings suggest that it is important to consider different acoustic measures to better understand /l/ velarisation in EP.

RevDate: 2019-03-06

Rampinini AC, Handjaras G, Leo A, et al (2019)

Formant Space Reconstruction From Brain Activity in Frontal and Temporal Regions Coding for Heard Vowels.

Frontiers in human neuroscience, 13:32.

Classical studies have isolated a distributed network of temporal and frontal areas engaged in the neural representation of speech perception and production. With modern literature arguing against unique roles for these cortical regions, different theories have favored either neural code-sharing or cortical space-sharing, thus trying to explain the intertwined spatial and functional organization of motor and acoustic components across the fronto-temporal cortical network. In this context, the focus of attention has recently shifted toward specific model fitting, aimed at motor and/or acoustic space reconstruction in brain activity within the language network. Here, we tested a model based on acoustic properties (formants), and one based on motor properties (articulation parameters), where model-free decoding of evoked fMRI activity during perception, imagery, and production of vowels had been successful. Results revealed that phonological information organizes around formant structure during the perception of vowels; interestingly, such a model was reconstructed in a broad temporal region, outside of the primary auditory cortex, but also in the pars triangularis of the left inferior frontal gyrus. Conversely, articulatory features were not associated with brain activity in these regions. Overall, our results call for a degree of interdependence based on acoustic information, between the frontal and temporal ends of the language network.

RevDate: 2019-03-02

Klaus A, Lametti DR, Shiller DM, et al (2019)

Can perceptual training alter the effect of visual biofeedback in speech-motor learning?

The Journal of the Acoustical Society of America, 145(2):805.

Recent work showing that a period of perceptual training can modulate the magnitude of speech-motor learning in a perturbed auditory feedback task could inform clinical interventions or second-language training strategies. The present study investigated the influence of perceptual training on a clinically and pedagogically relevant task of vocally matching a visually presented speech target using visual-acoustic biofeedback. Forty female adults aged 18-35 yr received perceptual training targeting the English /æ-ɛ/ contrast, randomly assigned to a condition that shifted the perceptual boundary toward either /æ/ or /ɛ/. Participants were then asked to produce the word head while modifying their output to match a visually presented acoustic target corresponding with a slightly higher first formant (F1, closer to /æ/). By analogy to findings from previous research, it was predicted that individuals whose boundary was shifted toward /æ/ would also show a greater magnitude of change in the visual biofeedback task. After perceptual training, the groups showed the predicted difference in perceptual boundary location, but they did not differ in their performance on the biofeedback matching task. It is proposed that the explicit versus implicit nature of the tasks used might account for the difference between this study and previous findings.

RevDate: 2019-03-02

Dissen Y, Goldberger J, J Keshet (2019)

Formant estimation and tracking: A deep learning approach.

The Journal of the Acoustical Society of America, 145(2):642.

Formant frequency estimation and tracking are among the most fundamental problems in speech processing. In the estimation task, the input is a stationary speech segment such as the middle part of a vowel, and the goal is to estimate the formant frequencies, whereas in the tracking task the input is a series of speech frames, and the goal is to track the trajectory of the formant frequencies throughout the signal. The use of supervised machine learning techniques trained on an annotated corpus of read speech is proposed for these tasks. Two deep network architectures were evaluated for estimation, feed-forward multilayer perceptrons and convolutional neural networks, and, correspondingly, two architectures for tracking, recurrent and convolutional recurrent networks. The inputs to the former are composed of linear predictive coding-based cepstral coefficients with a range of model orders and pitch-synchronous cepstral coefficients, whereas the inputs to the latter are raw spectrograms. The performance of the methods compares favorably with alternative methods for formant estimation and tracking. A network architecture is further proposed which allows model adaptation to formant frequency ranges that were not seen at training time. The adapted networks were evaluated on three datasets, and their performance was further improved.
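The classical linear-prediction baseline against which such networks are typically compared can be sketched in a few lines. This is a generic illustration, not the paper's models; the 700/1200 Hz resonances, bandwidths, and model order are invented values:

```python
import numpy as np

def lpc_formants(x, fs, order=4):
    """Formant estimation via linear prediction (covariance method):
    fit x[n] ~ sum_k a_k * x[n-k] by least squares, then read resonance
    frequencies off the angles of the prediction-polynomial roots."""
    # Build the linear system: each row predicts x[n] from x[n-1..n-order]
    X = np.column_stack([x[order - k - 1 : len(x) - k - 1] for k in range(order)])
    y = x[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[roots.imag > 0]              # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    return np.sort(freqs[freqs > 90])          # drop near-DC roots

# Synthetic two-formant signal: damped resonances at 700 Hz and 1200 Hz
fs = 8000
t = np.arange(0, 0.064, 1 / fs)
x = np.exp(-np.pi * 80 * t) * (np.sin(2 * np.pi * 700 * t)
                               + np.sin(2 * np.pi * 1200 * t))
formants = lpc_formants(x, fs, order=4)        # recovers ~[700, 1200]
```

On real speech, higher model orders, pre-emphasis, and bandwidth filtering of the roots are needed; it is exactly the fragility of those choices that motivates learned estimators like the ones in this paper.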

RevDate: 2019-03-02

Kirkham S, Nance C, Littlewood B, et al (2019)

Dialect variation in formant dynamics: The acoustics of lateral and vowel sequences in Manchester and Liverpool English.

The Journal of the Acoustical Society of America, 145(2):784.

This study analyses the time-varying acoustics of laterals and their adjacent vowels in Manchester and Liverpool English. Generalized additive mixed-models (GAMMs) are used for quantifying time-varying formant data, which allows the modelling of non-linearities in acoustic time series while simultaneously modelling speaker and word level variability in the data. These models are compared to single time-point analyses of lateral and vowel targets in order to determine what analysing formant dynamics can tell about dialect variation in speech acoustics. The results show that lateral targets exhibit robust differences between some positional contexts and also between dialects, with smaller differences present in vowel targets. The time-varying analysis shows that dialect differences frequently occur globally across the lateral and adjacent vowels. These results suggest a complex relationship between lateral and vowel targets and their coarticulatory dynamics, which problematizes straightforward claims about the realization of laterals and their adjacent vowels. These findings are further discussed in terms of hypotheses about positional and sociophonetic variation. In doing so, the utility of GAMMs for analysing time-varying multi-segmental acoustic signals is demonstrated, and the significance of the results for accounts of English lateral typology is highlighted.

RevDate: 2019-02-12

Menda G, Nitzany EI, Shamble PS, et al (2019)

The Long and Short of Hearing in the Mosquito Aedes aegypti.

Current biology : CB pii:S0960-9822(19)30028-4 [Epub ahead of print].

Mating behavior in Aedes aegypti mosquitoes occurs mid-air and involves the exchange of auditory signals at close range (millimeters to centimeters) [1-6]. It is widely assumed that this intimate signaling distance reflects short-range auditory sensitivity of their antennal hearing organs to faint flight tones [7, 8]. To the contrary, we show here that male mosquitoes can hear the female's flight tone at surprisingly long distances, from several meters up to 10 m, and that unrestrained, resting Ae. aegypti males leap off their perches and take flight when they hear female flight tones. Moreover, auditory sensitivity tests of Ae. aegypti's hearing organ, made from neurophysiological recordings of the auditory nerve in response to pure-tone stimuli played from a loudspeaker, support the behavioral experiments. This demonstration of long-range hearing in mosquitoes overturns the common assumption that the thread-like antennal hearing organs of tiny insects are strictly close-range ears. The effective range of a hearing organ depends ultimately on its sensitivity [9-13]. Here, a mosquito's antennal ear is shown to be sensitive to sound levels down to 31 dB sound pressure level (SPL), translating to air particle displacements of nanometer dimensions. We note that the first-formant energy peaks of human vowels range from about 200 to 1,000 Hz and that speech is typically spoken at 45-70 dB SPL; together, these lie in the sweet spot of mosquito hearing.

RevDate: 2019-02-10

Garellek M (2019)

Acoustic Discriminability of the Complex Phonation System in !Xóõ.

Phonetica pii:000494301 [Epub ahead of print].

Phonation types, or contrastive voice qualities, are minimally produced using complex movements of the vocal folds, but may additionally involve constriction in the supraglottal and pharyngeal cavities. These complex articulations in turn produce a multidimensional acoustic output that can be modeled in various ways. In this study, I investigate whether the psychoacoustic model of voice by Kreiman et al. (2014) succeeds at distinguishing six phonation types of !Xóõ. Linear discriminant analysis is performed using parameters from the model averaged over the entire vowel as well as for the first and final halves of the vowel. The results indicate very high classification accuracy for all phonation types. Measures averaged over the vowel's entire duration are closely correlated with the discriminant functions, suggesting that they are sufficient for distinguishing even dynamic phonation types. Measures from all classes of parameters are correlated with the linear discriminant functions; in particular, the "strident" vowels, which are harsh in quality, are characterized by their noise, changes in spectral tilt, decrease in voicing amplitude and frequency, and raising of the first formant. Despite the large number of contrasts and the time-varying characteristics of many of the phonation types, the phonation contrasts in !Xóõ remain well differentiated acoustically.

RevDate: 2019-02-10

Apaydın E, İkincioğulları A, Çolak M, et al (2019)

The Voice Performance After Septoplasty With Surgical Efficacy Demonstrated Through Acoustic Rhinometry and Rhinomanometry.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30531-9 [Epub ahead of print].

OBJECTIVE: To demonstrate the surgical efficacy of septoplasty using acoustic rhinometry (AR) and anterior rhinomanometry (ARM) and to evaluate the effect of septoplasty on voice performance through subjective voice analysis methods.

MATERIALS AND METHODS: This prospective study enrolled a total of 62 patients who underwent septoplasty with the diagnosis of deviated nasal septum. Thirteen patients with no postoperative improvement versus preoperative period as shown by AR and/or ARM tests and three patients with postoperative complications and four patients who were lost to follow-up were excluded. As a result, a total of 42 patients were included in the study. Objective tests including AR, ARM, acoustic voice analysis and spectrographic analysis were performed before the surgery and at 1 month and 3 months after the surgery. Subjective measures included the Nasal Obstruction Symptom Evaluation questionnaire to evaluate surgical success and Voice Handicap Index-30 tool for assessment of voice performance postoperatively, both completed by all study patients.

RESULTS: Among acoustic voice analysis parameters, F0, jitter, Harmonics-to-Noise Ratio values as well as formant frequency (F1-F2-F3-F4) values did not show significant differences postoperatively in comparison to the preoperative period (P > 0.05). Only the shimmer value was statistically significantly reduced at 1 month (P < 0.05) and 3 months postoperatively (P < 0.05) versus baseline. Statistically significant reductions in Voice Handicap Index-30 scores were observed at postoperative 1 month (P < 0.001) and 3 months (P < 0.001) compared to the preoperative period and between postoperative 1 month and 3 months (P < 0.05).

CONCLUSION: In this study, first operative success of septoplasty was demonstrated through objective tests and then objective voice analyses were performed to better evaluate the overall effect of septoplasty on voice performance. Shimmer value was found to be improved in the early and late postoperative periods.

RevDate: 2019-02-05

de Souza GVS, Duarte JMT, de Andrade Trinas FV, et al (2019)

An Acoustic Examination of Pitch Variation in Soprano Singing.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30416-8 [Epub ahead of print].

INTRODUCTION: The ability to perform acoustic inspection of data and to correlate the results with perceptual and physiological aspects facilitates vocal behavior analysis. The singing voice has specific characteristics and parameters that are involved during the phonation mechanism, which may be analyzed acoustically.

OBJECTIVE: To describe and analyze the fundamental frequency and formants in pitch variation in the /a/ vowel in sopranos.

METHODS: The sample consisted of 30 female participants between the ages of 20 to 45 years without vocal complaints. All sustained vowel sounds were recorded with the /a/ vowel sustained for 5 seconds, with three replications at low (C4-261 Hz), medium (Eb4-622 Hz), and high (Bb4-932 Hz) frequencies that were comfortable for the voice classification. In total, 90 samples were analyzed with digital extraction of the fundamental frequency (f0) and the first five formants (F1, F2, F3, F4, and F5) and manual confirmation. The middle segment was considered for analysis, whereas the onset and offset segments were not considered. Subsequently, FFT (fast Fourier transform) plots, LPC (linear predictive coding) graphs, and tube diagrams were created. The Shapiro-Wilks test was applied for adherence and the Friedman test was applied for comparison of paired samples.

RESULTS: For vocalizations at low and medium pitches, higher values were observed for the first five formant frequencies than for the f0 value. Overlaying the LPC and FFT graphs revealed a similarity between F1 and F2 at the two pitches, with clustered harmonics in the F3, F4, and F5 region in the low pitch. At the medium pitch, there was similarity between F3 and F4, an F5 peak, and tuned harmonics. However, in the high-pitch vocalizations, there was an increase in the F2, F3, F4, and F5 values in relation to f0, and there was similarity between them along with synchrony between f0 and F1, H2 and F2, H3 and F3, H4 and F4, and H5 and F5.

CONCLUSIONS: Pitch changes indicate differences in the behavior of the fundamental frequency and sound formants in sopranos. The comparison of the sustained vowels sounds in f0 at the three pitches revealed specific vocal tract changes on the LPC curve and FFT harmonics, with an extra gain range at 261 Hz, synchrony between peaks of formants and harmonics at 622 Hz, and equivalence of f0 and F1 at 932 Hz.

RevDate: 2019-01-16

Galle ME, Klein-Packard J, Schreiber K, et al (2019)

What Are You Waiting For? Real-Time Integration of Cues for Fricatives Suggests Encapsulated Auditory Memory.

Cognitive science, 43(1).

Speech unfolds over time, and the cues for even a single phoneme are rarely available simultaneously. Consequently, to recognize a single phoneme, listeners must integrate material over several hundred milliseconds. Prior work contrasts two accounts: (a) a memory buffer account in which listeners accumulate auditory information in memory and only access higher level representations (i.e., lexical representations) when sufficient information has arrived; and (b) an immediate integration scheme in which lexical representations can be partially activated on the basis of early cues and then updated when more information arises. These studies have uniformly shown evidence for immediate integration for a variety of phonetic distinctions. We attempted to extend this to fricatives, a class of speech sounds which requires not only temporal integration of asynchronous cues (the frication, followed by the formant transitions 150-350 ms later), but also integration across different frequency bands and compensation for contextual factors like coarticulation. Eye movements in the visual world paradigm showed clear evidence for a memory buffer. Results were replicated in five experiments, ruling out methodological factors and tying the release of the buffer to the onset of the vowel. These findings support a general auditory account for speech by suggesting that the acoustic nature of particular speech sounds may have large effects on how they are processed. It also has major implications for theories of auditory and speech perception by raising the possibility of an encapsulated memory buffer in early auditory processing.

RevDate: 2019-01-15

Naderifar E, Ghorbani A, Moradi N, et al (2019)

Use of formant centralization ratio for vowel impairment detection in normal hearing and different degrees of hearing impairment.

Logopedics, phoniatrics, vocology [Epub ahead of print].

PURPOSE: Hearing-impaired (HI) speakers show changes in vowel production and formant frequencies, as well as more overlap between vowels and a more restricted formant space, than hearing speakers. This study explored whether different acoustic parameters (Formant Centralization Ratio (FCR), Vowel Space Area (VSA), and the F2i/F2u ratio (second formants of /i/ and /u/)) were suitable for characterizing impaired vowel articulation in the speech of HI speakers. These correlated acoustic parameters are used to estimate the limits of tongue movement in vowel production across different severities of hearing impairment.

METHODS: Speech recordings of 40 speakers with HL and 40 healthy controls were acoustically analyzed. The vowels (/a/,/i/,/u/) were extracted from the word context and, then, the first and second formants were calculated. The same vowel-formant elements were used to construct the FCR, expressed as (F2u + F2a + F1i + F1u)/(F2i + F1a), the F2i/F2u ratio, and the vowel space area (VSA), expressed as ABS((F1i*(F2a-F2u)+F1a*(F2u-F2i)+F1u*(F2i-F2a))/2).
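Since the abstract spells out both formulas, they can be computed directly from the six corner-formant values. The sketch below is illustrative only; the formant values are invented, typical-adult numbers, not data from the study:

```python
def fcr(F1i, F1a, F1u, F2i, F2a, F2u):
    """Formant Centralization Ratio: (F2u+F2a+F1i+F1u)/(F2i+F1a).
    Rises above ~1 as the corner vowels centralize."""
    return (F2u + F2a + F1i + F1u) / (F2i + F1a)

def vsa(F1i, F1a, F1u, F2i, F2a, F2u):
    """Triangular vowel space area spanned by /i/, /a/, /u/ in the
    F1-F2 plane (the shoelace formula from the abstract)."""
    return abs(F1i * (F2a - F2u) + F1a * (F2u - F2i) + F1u * (F2i - F2a)) / 2

# Invented, roughly adult-male-like formant values in Hz
f = dict(F1i=300, F1a=750, F1u=350, F2i=2300, F2a=1300, F2u=900)
print(round(fcr(**f), 3))   # ~0.934 for this vowel triangle
print(vsa(**f))             # 290000.0 Hz^2; shrinks as the triangle collapses
```

FCR and VSA move in opposite directions under centralization, which is why the study can compare their sensitivity on the same data.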

RESULTS: The FCR differentiated the HI groups from the control group, and the discrimination was not gender-sensitive. All parameters were strongly correlated with each other.

CONCLUSIONS: The findings of this study showed that FCR was a more sensitive acoustic parameter than the F2i/F2u ratio and VSA for distinguishing the speech of the HI groups from that of the normal group. Thus, FCR may be applicable as an early objective measure of impaired vowel articulation in HI speakers.

RevDate: 2019-01-09

Boelen C, Chabot JM, Diot P, et al (2017)

Challenges to health system, medical profession and accreditation of medical schools.

La Revue du praticien, 67(3):250-254.

RevDate: 2019-01-08

Ballard KJ, Halaki M, Sowman P, et al (2018)

An Investigation of Compensation and Adaptation to Auditory Perturbations in Individuals With Acquired Apraxia of Speech.

Frontiers in human neuroscience, 12:510.

Two auditory perturbation experiments were used to investigate the integrity of neural circuits responsible for speech sensorimotor adaptation in acquired apraxia of speech (AOS). This has implications for understanding the nature of AOS as well as normal speech motor control. Two experiments were conducted. In Experiment 1, compensatory responses to unpredictable fundamental frequency (F0) perturbations during vocalization were investigated in healthy older adults and adults with acquired AOS plus aphasia. F0 perturbation involved upward and downward 100-cent shifts versus no shift, in equal proportion, during 2 s vocalizations of the vowel /a/. In Experiment 2, adaptive responses to sustained first formant (F1) perturbations during speech were investigated in healthy older adults, adults with AOS and adults with aphasia only (APH). The F1 protocol involved production of the vowel /ε/ in four consonant-vowel words of Australian English (pear, bear, care, dare), and one control word with a different vowel (paw). An unperturbed Baseline phase was followed by a gradual Ramp to a 30% upward F1 shift stimulating a compensatory response, a Hold phase where the perturbation was repeatedly presented with alternating blocks of masking trials to probe adaptation, and an End phase with masking trials only to measure persistence of any adaptation. AOS participants showed normal compensation to unexpected F0 perturbations, indicating that auditory feedback control of low-level, non-segmental parameters is intact. Furthermore, individuals with AOS displayed an adaptive response to sustained F1 perturbations, but age-matched controls and APH participants did not. These findings suggest that older healthy adults may have less plastic motor programs that resist modification based on sensory feedback, whereas individuals with AOS have less well-established and more malleable motor programs due to damage from stroke.

RevDate: 2019-01-02

Caldwell MT, Jiradejvong P, CJ Limb (2018)

Effects of Phantom Electrode Stimulation on Vocal Production in Cochlear Implant Users.

Ear and hearing [Epub ahead of print].

OBJECTIVES: Cochlear implant (CI) users suffer from a range of speech impairments, such as stuttering and vocal control of pitch and intensity. Though little research has focused on the role of auditory feedback in the speech of CI users, these speech impairments could be due in part to limited access to low-frequency cues inherent in CI-mediated listening. Phantom electrode stimulation (PES) represents a novel application of current steering that extends access to low frequencies for CI recipients. It is important to note that PES transmits frequencies below 300 Hz, whereas Baseline does not. The objective of this study was to explore the effects of PES on multiple frequency-related characteristics of voice production.

DESIGN: Eight postlingually deafened, adult Advanced Bionics CI users underwent a series of vocal production tests including Tone Repetition, Vowel Sound Production, Passage Reading, and Picture Description. Participants completed all of these tests twice: once with PES and once using their program used for everyday listening (Baseline). An additional test, Automatic Modulation, was included to measure acute effects of PES and was completed only once. This test involved switching between PES and Baseline at specific time intervals in real time as participants read a series of short sentences. Finally, a subjective Vocal Effort measurement was also included.

RESULTS: In Tone Repetition, the fundamental frequencies (F0) of tones produced using PES and the size of musical intervals produced using PES were significantly more accurate (closer to the target) compared with Baseline in specific gender, target tone range, and target tone type testing conditions. In the Vowel Sound Production task, vowel formant profiles produced using PES were closer to that of the general population compared with those produced using Baseline. The Passage Reading and Picture Description task results suggest that PES reduces measures of pitch variability (F0 standard deviation and range) in natural speech production. No significant results were found in comparisons of PES and Baseline in the Automatic Modulation task nor in the Vocal Effort task.

CONCLUSIONS: The findings of this study suggest that usage of PES increases accuracy of pitch matching in repeated sung tones and frequency intervals, possibly due to more accurate F0 representation. The results also suggest that PES partially normalizes the vowel formant profiles of select vowel sounds. PES seems to decrease pitch variability of natural speech and appears to have limited acute effects on natural speech production, though this finding may be due in part to paradigm limitations. On average, subjective ratings of vocal effort were unaffected by the usage of PES versus Baseline.

RevDate: 2019-01-02

Saba JN, Ali H, JHL Hansen (2018)

Formant priority channel selection for an "n-of-m" sound processing strategy for cochlear implants.

The Journal of the Acoustical Society of America, 144(6):3371.

The Advanced Combination Encoder (ACE) signal processing strategy is used in the majority of cochlear implant (CI) sound processors manufactured by Cochlear Corporation. This "n-of-m" strategy selects "n" out of "m" available frequency channels with the highest spectral energy in each stimulation cycle. It is hypothesized that at low signal-to-noise ratio (SNR) conditions, noise-dominant frequency channels are susceptible to selection, neglecting channels containing target speech cues. In order to improve speech segregation in noise, explicit encoding of formant frequency locations within the standard channel selection framework of ACE is suggested. Two strategies using direct formant estimation algorithms are developed within this study, FACE (formant-ACE) and VFACE (voiced-activated-formant-ACE). Speech intelligibility from eight CI users is compared across 11 acoustic conditions, including mixtures of noise and reverberation at multiple SNRs. Significant intelligibility gains were observed with VFACE over ACE in 5 dB babble noise; however, results with FACE/VFACE in all other conditions were comparable to standard ACE. An increased selection of channels associated with the second formant frequency is observed for FACE and VFACE. Both proposed methods may serve as potential supplementary channel selection techniques for the ACE sound processing strategy for cochlear implants.
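The core "n-of-m" maxima-selection step described above is simple to state in code. The sketch below is a generic illustration only: the channel energies are invented, and a real ACE processor has filterbank, envelope-extraction, and electrode-mapping stages not shown here:

```python
import numpy as np

def n_of_m_select(channel_energies, n):
    """Pick the n channels with the highest spectral energy in one
    stimulation cycle (the 'n-of-m' step), returned in channel order."""
    idx = np.argsort(channel_energies)[-n:]   # indices of the n largest
    return np.sort(idx)

# Invented per-channel energies for one stimulation cycle (m = 8 channels)
energies = np.array([0.1, 0.9, 0.3, 0.8, 0.05, 0.7, 0.2, 0.6])
print(n_of_m_select(energies, 4))  # → [1 3 5 7]
```

The paper's FACE/VFACE variants modify this step by prioritizing channels near estimated formant locations rather than relying on raw energy alone.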

RevDate: 2019-01-02

Kochetov A, Tabain M, Sreedevi N, et al (2018)

Manner and place differences in Kannada coronal consonants: Articulatory and acoustic results.

The Journal of the Acoustical Society of America, 144(6):3221.

This study investigated articulatory differences in the realization of Kannada coronal consonants of the same place but different manner of articulation. This was done by examining tongue positions and acoustic formant transitions for dentals and retroflexes of three manners of articulation: stops, nasals, and laterals. Ultrasound imaging data collected from ten speakers of the language revealed that the tongue body/root was more forward for the nasal manner of articulation compared to stop and lateral consonants of the same place of articulation. The dental nasal and lateral were also produced with a higher front part of the tongue compared to the dental stop. As a result, the place contrast was greater in magnitude for the stops (being the prototypical dental vs retroflex) than for the nasals and laterals (being apparently alveolar vs retroflex). Acoustic formant transition differences were found to reflect some of the articulatory differences, while also providing evidence for the more dynamic articulation of nasal and lateral retroflexes. Overall, the results of the study shed light on factors underlying manner requirements (aerodynamic or physiological) and how the factors interact with principles of gestural economy/symmetry, providing an empirical baseline for further cross-language investigations and articulation-to-acoustics modeling.

RevDate: 2018-12-31

Mekyska J, Galaz Z, Kiska T, et al (2018)

Quantitative Analysis of Relationship Between Hypokinetic Dysarthria and the Freezing of Gait in Parkinson's Disease.

Cognitive computation, 10(6):1006-1018.

Hypokinetic dysarthria (HD) and freezing of gait (FOG) are both axial symptoms that occur in patients with Parkinson's disease (PD). It is assumed that they share some common pathophysiological mechanisms and therefore that speech disorders in PD can predict FOG deficits within a horizon of some years. The aim of this study is to employ a complex quantitative analysis of phonation, articulation and prosody in PD patients in order to identify the relationship between HD and FOG, and to establish a mathematical model that would predict FOG deficits using acoustic analysis at baseline. We enrolled 75 PD patients who were assessed by 6 clinical scales including the Freezing of Gait Questionnaire (FOG-Q). We subsequently extracted 19 acoustic measures quantifying speech disorders in the fields of phonation, articulation and prosody. To identify the relationship between HD and FOG, we performed a partial correlation analysis. Finally, based on the selected acoustic measures, we trained regression models to predict the change in FOG during a 2-year follow-up. We identified significant correlations between FOG-Q scores and the acoustic measures based on formant frequencies (quantifying the movement of the tongue and jaw) and speech rate. Using the regression models, we were able to predict a change in particular FOG-Q scores with an error of between 7.4% and 17.0%. This study suggests that FOG in patients with PD is mainly linked to improper articulation, a disturbed speech rate and reduced intelligibility. We have also shown that acoustic analysis of HD at baseline can be used as a predictor of the FOG deficit over 2 years of follow-up. This knowledge enables researchers to introduce new cognitive systems that predict gait difficulties in PD patients.

RevDate: 2018-12-20

Masapollo M, Zhao TC, Franklin L, et al (2018)

Asymmetric discrimination of nonspeech tonal analogues of vowels.

Journal of experimental psychology. Human perception and performance pii:2018-64940-001 [Epub ahead of print].

Directional asymmetries reveal a universal bias in vowel perception favoring extreme vocalic articulations, which lead to acoustic vowel signals with dynamic formant trajectories and well-defined spectral prominences because of the convergence of adjacent formants. The present experiments investigated whether this bias reflects speech-specific processes or general properties of spectral processing in the auditory system. Toward this end, we examined whether analogous asymmetries in perception arise with nonspeech tonal analogues that approximate some of the dynamic and static spectral characteristics of naturally produced /u/ vowels executed with more versus less extreme lip gestures. We found a qualitatively similar but weaker directional effect with 2-component tones varying in both the dynamic changes and proximity of their spectral energies. In subsequent experiments, we pinned down the phenomenon using tones that varied in 1 or both of these 2 acoustic characteristics. We found comparable asymmetries with tones that differed exclusively in their spectral dynamics, and no asymmetries with tones that differed exclusively in their spectral proximity or both spectral features. We interpret these findings as evidence that dynamic spectral changes are a critical cue for eliciting asymmetries in nonspeech tone perception, but that the potential contribution of general auditory processes to asymmetries in vowel perception is limited.

RevDate: 2018-12-19

Carney LH, JM McDonough (2018)

Nonlinear auditory models yield new insights into representations of vowels.

Attention, perception & psychophysics pii:10.3758/s13414-018-01644-w [Epub ahead of print].

Studies of vowel systems regularly appeal to the need to understand how the auditory system encodes and processes the information in the acoustic signal. The goal of this study is to present computational models to address this need, and to use the models to illustrate responses to vowels at two levels of the auditory pathway. Many of the models previously used to study auditory representations of speech are based on linear filter banks simulating the tuning of the inner ear. These models do not incorporate key nonlinear response properties of the inner ear that influence responses at conversational-speech sound levels. These nonlinear properties shape neural representations in ways that are important for understanding responses in the central nervous system. The model for auditory-nerve (AN) fibers used here incorporates realistic nonlinear properties associated with the basilar membrane, inner hair cells (IHCs), and the IHC-AN synapse. These nonlinearities set up profiles of f0-related fluctuations that vary in amplitude across the population of frequency-tuned AN fibers. Amplitude fluctuations in AN responses are smallest near formant peaks and largest at frequencies between formants. These f0-related fluctuations strongly excite or suppress neurons in the auditory midbrain, the first level of the auditory pathway where tuning for low-frequency fluctuations in sounds occurs. Formant-related amplitude fluctuations provide representations of the vowel spectrum in discharge rates of midbrain neurons. These representations in the midbrain are robust across a wide range of sound levels, including the entire range of conversational-speech levels, and in the presence of realistic background noise levels.

RevDate: 2018-12-14

Anikin A, N Johansson (2018)

Implicit associations between individual properties of color and sound.

Attention, perception & psychophysics pii:10.3758/s13414-018-01639-7 [Epub ahead of print].

We report a series of 22 experiments in which the implicit associations test (IAT) was used to investigate cross-modal correspondences between visual (luminance, hue [R-G, B-Y], saturation) and acoustic (loudness, pitch, formants [F1, F2], spectral centroid, trill) dimensions. Colors were sampled from the perceptually accurate CIE-Lab space, and the complex, vowel-like sounds were created with a formant synthesizer capable of separately manipulating individual acoustic properties. In line with previous reports, the loudness and pitch of acoustic stimuli were associated with both luminance and saturation of the presented colors. However, pitch was associated specifically with color lightness, whereas loudness mapped onto greater visual saliency. Manipulating the spectrum of sounds without modifying their pitch showed that an upward shift of spectral energy was associated with the same visual features (higher luminance and saturation) as higher pitch. In contrast, changing formant frequencies of synthetic vowels while minimizing the accompanying shifts in spectral centroid failed to reveal cross-modal correspondences with color. This may indicate that the commonly reported associations between vowels and colors are mediated by differences in the overall balance of low- and high-frequency energy in the spectrum rather than by vowel identity as such. Surprisingly, the hue of colors with the same luminance and saturation was not associated with any of the tested acoustic features, except for a weak preference to match higher pitch with blue (vs. yellow). We discuss these findings in the context of previous research and consider their implications for sound symbolism in world languages.

RevDate: 2018-12-12

Paltura C, K Yelken (2018)

An Examination of Vocal Tract Acoustics following Wendler's Glottoplasty.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP), 71(1):24-28 pii:000494970 [Epub ahead of print].

PURPOSE: To investigate the formant frequency (FF) features of transgender females' (TFs) voice after Wendler's glottoplasty surgery and compare these levels with age-matched healthy males and females.

STUDY DESIGN: Controlled prospective.

METHODS: 20 TFs and 20 genetically male and female age-matched healthy controls were enrolled in the study. The fundamental frequency (F0) and FFs F1-F4 were obtained from TF speakers 6 months after surgery. These levels were compared with those of healthy controls.

RESULTS: Statistical analysis showed that the median F0 values were similar between TFs and females. The median F1 levels of TFs were different from females but similar to males. The F2 levels of TFs were similar to females but different from males. The F3 and F4 levels were significantly different from both male and female controls.

CONCLUSION: Wendler's glottoplasty technique is an effective method to increase F0 levels among TF patients; however, these individuals report that their voice does not sufficiently project femininity. The results obtained with regard to FF levels may be the reason for this problem. Voice therapy is recommended as a possible approach to help TF patients achieve a satisfactory feminine voice.

RevDate: 2018-12-03

Hardy TLD, Rieger JM, Wells K, et al (2018)

Acoustic Predictors of Gender Attribution, Masculinity-Femininity, and Vocal Naturalness Ratings Amongst Transgender and Cisgender Speakers.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30355-2 [Epub ahead of print].

PURPOSE: This study aimed to identify the most salient set of acoustic predictors of (1) gender attribution; (2) perceived masculinity-femininity; and (3) perceived vocal naturalness amongst a group of transgender and cisgender speakers to inform voice and communication feminization training programs. This study used a unique set of acoustic variables and included a third, androgynous, choice for gender attribution ratings.

METHOD: Data were collected across two phases and involved two separate groups of participants: communicators and raters. In the first phase, audio recordings were captured of communicators (n = 40) during cartoon retell, sustained vowel, and carrier phrase tasks. Acoustic measures were obtained from these recordings. In the second phase, raters (n = 20) provided ratings of gender attribution, perceived masculinity-femininity, and vocal naturalness based on a sample of the cartoon description recording.

RESULTS: Results of a multinomial logistic regression analysis identified mean fundamental frequency (fo) as the sole acoustic measure that changed the odds of being attributed as a woman or ambiguous in gender rather than as a man. Multiple linear regression analyses identified mean fo, average formant frequency of /i/, and mean sound pressure level as predictors of masculinity-femininity ratings and mean fo, average formant frequency, and rate of speech as predictors of vocal naturalness ratings.

CONCLUSION: The results of this study support the continued targeting of fo and vocal tract resonance in voice and communication feminization/masculinization training programs and provide preliminary evidence for more emphasis being placed on vocal intensity and rate of speech. Modification of these voice parameters may help clients to achieve a natural-sounding voice that satisfactorily represents their affirmed gender.

RevDate: 2018-11-28

Fujimura S, Kojima T, Okanoue Y, et al (2018)

Discrimination of "hot potato voice" caused by upper airway obstruction utilizing a support vector machine.

The Laryngoscope [Epub ahead of print].

OBJECTIVES/HYPOTHESIS: "Hot potato voice" (HPV) is a thick, muffled voice caused by pharyngeal or laryngeal diseases characterized by severe upper airway obstruction, including acute epiglottitis and peritonsillitis. To develop a method for determining upper-airway emergency based on this important vocal feature, we investigated the acoustic characteristics of HPV using a physical, articulatory speech synthesis model. The results of the simulation were then applied to design a computerized recognition framework using a mel-frequency cepstral coefficient domain support vector machine (SVM).

STUDY DESIGN: Quasi-experimental research design.

METHODS: Changes in the voice spectral envelope caused by upper airway obstructions were analyzed using a hybrid time-frequency model of articulatory speech synthesis. We evaluated variations in the formant structure and thresholds of critical vocal tract area functions that triggered HPV. The SVMs were trained using a dataset of 2,200 synthetic voice samples generated by an articulatory synthesizer. Voice classification experiments on test datasets of real patient voices were then performed.

RESULTS: On phonation of the Japanese vowel /e/, the frequency of the second formant fell and coalesced with that of the first formant as the area function of the oropharynx decreased. Changes in higher-order formants varied according to constriction location. The highest accuracy afforded by the SVM classifier trained with synthetic data was 88.3%.

CONCLUSIONS: HPV caused by upper airway obstruction has a highly characteristic spectral envelope. Based on this distinctive voice feature, our SVM classifier, which was trained using synthetic data, was able to diagnose upper-airway obstructions with a high degree of accuracy.

LEVEL OF EVIDENCE: 2c Laryngoscope, 2018.

RevDate: 2018-11-26
CmpDate: 2018-11-26

Chen Q, Liu J, Yang HM, et al (2018)

Research on tunable distributed SPR sensor based on bimetal film.

Applied optics, 57(26):7591-7599.

In order to overcome the limitations in range of traditional prism structure surface plasmon resonance (SPR) single-point sensor measurement, a symmetric bimetallic film SPR multi-sensor structure is proposed. Based on this, the dual-channel sensing attenuation mechanism of SPR in gold and silver composite film and the improvement of sensing characteristics were studied. By optimizing the characteristics such as material and thickness, a wider range of dual-channel distributed sensing is realized. Using a He-Ne laser (632.8 nm) as the reference light source, prism-excited symmetric SPR sensing was studied theoretically for a symmetrical metal-clad dielectric waveguide using thin-film optics theory. The influence of the angle of incidence of the light source and the thickness of the dielectric layer on the performance of SPR dual formant sensing is explained. The finite-difference time-domain method was used for the simulation calculation for various thicknesses and compositions of the symmetric combined layer, resulting in the choice of silver (30 nm) and gold (10 nm). When the incident angle was 78 deg, the quality factor reached 5960, showing an excellent resonance sensing effect. The sensitivity reached a maximum of 5.25×10-5 RIU when testing the water content of an aqueous solution of honey, which proves the feasibility and practicality of the structure design. The structure improves the theoretical basis for designing an SPR multi-channel distributed sensing system, which can greatly reduce the cost of biochemical detection and significantly increase the detection efficiency.

RevDate: 2018-11-18

Graf S, Schwiebacher J, Richter L, et al (2018)

Adjustment of Vocal Tract Shape via Biofeedback: Influence on Vowels.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30326-6 [Epub ahead of print].

The study assessed 30 nonprofessional singers to evaluate the effects of vocal tract shape adjustment via increased resonance toward an externally applied sinusoidal frequency of 900 Hz without phonation. The amplification of the sound wave was used as a biofeedback signal, and the intensity and the formant positions of the basic vowels /a/, /e/, /i/, /o/, and /u/ were compared before and after a vocal tract adjustment period. After the adjustment period, the intensities for all vowels increased and the measured changes correlated with the participants' self-perception. The differences between the second formant positions of the vowels and the applied frequency influenced the changes in amplitude and in formant frequencies. The most significant changes in formant frequency occurred with vowels that did not include a formant frequency of 900 Hz, while the increase in amplitude was strongest for vowels with a formant frequency of about 900 Hz.

RevDate: 2018-11-16

Bhat GS, Reddy CKA, Shankar N, et al (2018)

Smartphone based real-time super Gaussian single microphone Speech Enhancement to improve intelligibility for hearing aid users using formant information.

Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference, 2018:5503-5506.

In this paper, we present a Speech Enhancement (SE) technique to improve the intelligibility of speech perceived by Hearing Aid (HA) users, using a smartphone as an assistive device. We use formant frequency information to improve the overall quality and intelligibility of the speech. The proposed SE method is based on a new super Gaussian joint maximum a posteriori (SGJMAP) estimator. Using a priori information about formant frequency locations, the derived gain function has "tradeoff" factors that allow the smartphone user to customize perceptual preference by controlling the amount of noise suppression and speech distortion in real time. The formant frequency information lets the hearing aid user control the gains over the non-formant frequency bands, allowing HA users to attain more noise suppression while maintaining speech intelligibility using a smartphone application. Objective intelligibility measures and subjective results reflect the usability of the developed SE application in noisy real-world acoustic environments.
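The idea of steering suppression away from formant bands can be illustrated with a simple frequency-dependent gain. The bandwidth and suppression factor below are hypothetical placeholders for illustration only, not the SGJMAP estimator derived in the paper:

```python
import numpy as np

def formant_aware_gain(freqs_hz, formants_hz, bandwidth_hz=150.0,
                       suppression=0.3):
    """Illustrative gain curve: unity near formant frequencies, reduced
    gain (a user-adjustable 'tradeoff'-style factor) elsewhere."""
    freqs = np.asarray(freqs_hz, dtype=float)
    gain = np.full_like(freqs, suppression)
    for f in formants_hz:
        gain[np.abs(freqs - f) <= bandwidth_hz] = 1.0
    return gain

freqs = np.arange(0, 4000, 100.0)  # hypothetical analysis bin centres
g = formant_aware_gain(freqs, formants_hz=[500.0, 1500.0, 2500.0])
```

In a real enhancer this gain would multiply the noisy short-time spectrum before resynthesis; raising `suppression` trades noise reduction for less speech distortion.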

RevDate: 2018-11-14

Williams D, Escudero P, A Gafos (2018)

Spectral change and duration as cues in Australian English listeners' front vowel categorization.

The Journal of the Acoustical Society of America, 144(3):EL215.

Australian English /iː/, /ɪ/, and /ɪə/ exhibit almost identical average first (F1) and second (F2) formant frequencies and differ in duration and vowel inherent spectral change (VISC). The cues of duration, F1 × F2 trajectory direction (TD) and trajectory length (TL) were assessed in listeners' categorization of /iː/ and /ɪə/ compared to /ɪ/. Duration was important for distinguishing both /iː/ and /ɪə/ from /ɪ/. TD and TL were important for categorizing /iː/ versus /ɪ/, whereas only TL was important for /ɪə/ versus /ɪ/. Finally, listeners' use of duration and VISC was not mutually affected for either vowel compared to /ɪ/.
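Trajectory length (TL) and trajectory direction (TD) in the F1 × F2 plane are commonly computed from formant values at two time points in the vowel; a minimal sketch with hypothetical onset and offset formants (not values from the study):

```python
import numpy as np

def trajectory_length(f1_onset, f2_onset, f1_offset, f2_offset):
    """Euclidean F1 x F2 trajectory length (Hz), a common VISC measure."""
    return float(np.hypot(f1_offset - f1_onset, f2_offset - f2_onset))

def trajectory_direction(f1_onset, f2_onset, f1_offset, f2_offset):
    """Angle (degrees) of spectral change in the F1 x F2 plane."""
    return float(np.degrees(np.arctan2(f2_offset - f2_onset,
                                       f1_offset - f1_onset)))

# hypothetical onset/offset formants for a closing, fronting vowel gesture
tl = trajectory_length(400.0, 2000.0, 300.0, 2400.0)
td = trajectory_direction(400.0, 2000.0, 300.0, 2400.0)
```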

RevDate: 2018-11-09

Gómez-Vilda P, Gómez-Rodellar A, Vicente JMF, et al (2018)

Neuromechanical Modelling of Articulatory Movements from Surface Electromyography and Speech Formants.

International journal of neural systems [Epub ahead of print].

Speech articulation is produced by the movements of muscles in the larynx, pharynx, mouth and face. Speech therefore exhibits acoustic features, such as formants, that are directly related to the neuromotor actions of these muscles. The first two formants are strongly related to jaw and tongue muscular activity. Speech can be used as a simple and ubiquitous signal, easy to record and process, either locally or on e-Health platforms. This fact may open a wide set of applications in the study of functional grading and monitoring of neurodegenerative diseases. A relevant question, in this sense, is how closely speech correlates are related to neuromotor actions. This preliminary study is intended to find answers to this question by using surface electromyographic recordings on the masseter and the acoustic kinematics related to the first formant. It is shown in the study that relevant correlations can be found between the surface electromyographic activity (dynamic muscle behavior) and the positions and first derivatives of the first formant (kinematic variables related to the vertical velocity and acceleration of the joint jaw and tongue biomechanical system). As an application example, it is shown that the probability density function associated with these kinematic variables is more sensitive than classical features such as Vowel Space Area (VSA) or Formant Centralization Ratio (FCR) in characterizing neuromotor degeneration in Parkinson's Disease.
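The kinematic correlates described above (position and first derivative of the first formant) can be approximated from a sampled F1 track by numerical differentiation; a sketch using a synthetic contour, not the study's EMG-aligned data:

```python
import numpy as np

def formant_kinematics(f1_track_hz, frame_rate_hz):
    """First and second time derivatives of an F1 contour (Hz/s, Hz/s^2),
    estimated with central differences."""
    dt = 1.0 / frame_rate_hz
    velocity = np.gradient(f1_track_hz, dt)
    acceleration = np.gradient(velocity, dt)
    return velocity, acceleration

# synthetic F1 contour sampled at 100 frames/s: a 2 Hz oscillation
# of +/-50 Hz around 500 Hz, loosely mimicking a jaw open-close cycle
t = np.arange(0, 1, 0.01)
f1 = 500 + 50 * np.sin(2 * np.pi * 2 * t)
vel, acc = formant_kinematics(f1, frame_rate_hz=100.0)
```

The peak velocity of this contour is analytically 50 · 2π · 2 ≈ 628 Hz/s, which the discrete estimate approaches.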

RevDate: 2018-10-26

Lopes LW, Alves JDN, Evangelista DDS, et al (2018)

Accuracy of traditional and formant acoustic measurements in the evaluation of vocal quality.

CoDAS, 30(5):e20170282 pii:S2317-17822018000500310.

PURPOSE: Investigate the accuracy of isolated and combined acoustic measurements in the discrimination of voice deviation intensity (GD) and predominant voice quality (PVQ) in patients with dysphonia.

METHODS: A total of 302 female patients with voice complaints participated in the study. The sustained /ɛ/ vowel was used to extract the following acoustic measures: mean and standard deviation (SD) of fundamental frequency (F0), jitter, shimmer, glottal to noise excitation (GNE) ratio and the mean of the first three formants (F1, F2, and F3). Auditory-perceptual evaluation of GD and PVQ was conducted by three speech-language pathologists who were voice specialists.

RESULTS: In isolation, only GNE provided satisfactory performance when discriminating between GD and PVQ. Improvement in the classification of GD and PVQ was observed when the acoustic measures were combined. Mean F0, F2, and GNE (healthy × mild-to-moderate deviation), the SDs of F0, F1, and F3 (mild-to-moderate × moderate deviation), and mean jitter and GNE (moderate × intense deviation) were the best combinations for discriminating GD. The best combinations for discriminating PVQ were mean F0, shimmer, and GNE (healthy × rough), F3 and GNE (healthy × breathy), mean F 0, F3, and GNE (rough × tense), and mean F0 , F1, and GNE (breathy × tense).

CONCLUSION: In isolation, GNE proved to be the only acoustic parameter capable of discriminating both GD and PVQ. There was a gain in classification performance for discrimination of both GD and PVQ when traditional and formant acoustic measurements were combined.

RevDate: 2018-10-23

Grawunder S, Crockford C, Clay Z, et al (2018)

Higher fundamental frequency in bonobos is explained by larynx morphology.

Current biology : CB, 28(20):R1188-R1189.

Acoustic signals, shaped by natural and sexual selection, reveal ecological and social selection pressures [1]. Examining acoustic signals together with morphology can be particularly revealing. But this approach has rarely been applied to primates, where clues to the evolutionary trajectory of human communication may be found. Across vertebrate species, there is a close relationship between body size and acoustic parameters, such as formant dispersion and fundamental frequency (f0). Deviations from this acoustic allometry usually produce calls with a lower f0 than expected for a given body size, often due to morphological adaptations in the larynx or vocal tract [2]. An unusual example of an obvious mismatch between fundamental frequency and body size is found in the two closest living relatives of humans, bonobos (Pan paniscus) and chimpanzees (Pan troglodytes). Although these two ape species overlap in body size [3], bonobo calls have a strikingly higher f0 than corresponding calls from chimpanzees [4]. Here, we compare acoustic structures of calls from bonobos and chimpanzees in relation to their larynx morphology. We found that shorter vocal fold length in bonobos compared to chimpanzees accounted for species differences in f0, showing a rare case of positive selection for signal diminution in both bonobo sexes.
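The acoustic allometry discussed above is often quantified via formant dispersion, which under a uniform-tube approximation is inversely proportional to vocal tract length. A sketch of that standard relationship with hypothetical formant values (not data from the bonobo/chimpanzee study):

```python
import numpy as np

SPEED_OF_SOUND = 35000.0  # cm/s in warm, humid air (approximate)

def formant_dispersion(formants_hz):
    """Mean spacing between successive formants (Hz)."""
    f = np.sort(np.asarray(formants_hz, dtype=float))
    return float(np.mean(np.diff(f)))

def apparent_vocal_tract_length(formants_hz):
    """Uniform-tube estimate: VTL ~ c / (2 * formant dispersion), in cm."""
    return SPEED_OF_SOUND / (2.0 * formant_dispersion(formants_hz))

# hypothetical formants of a neutral uniform tube: F_n = (2n - 1) c / 4L
vtl = apparent_vocal_tract_length([500.0, 1500.0, 2500.0, 3500.0])
```

A call with a lower dispersion than expected for the caller's body size is the kind of "diminution/exaggeration" deviation the abstract refers to; f0, by contrast, depends on vocal fold length rather than tract length.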

RevDate: 2018-10-27

Niziolek CA, S Kiran (2018)

Assessing speech correction abilities with acoustic analyses: Evidence of preserved online correction in persons with aphasia.

International journal of speech-language pathology [Epub ahead of print].

PURPOSE: Disorders of speech production may be accompanied by abnormal processing of speech sensory feedback. Here, we introduce a semi-automated analysis designed to assess the degree to which speakers use natural online feedback to decrease acoustic variability in spoken words. Because production deficits in aphasia have been hypothesised to stem from problems with sensorimotor integration, we investigated whether persons with aphasia (PWA) can correct their speech acoustics online.

METHOD: Eight PWA in the chronic stage produced 200 repetitions each of three monosyllabic words. Formant variability was measured for each vowel in multiple time windows within the syllable, and the reduction in formant variability from vowel onset to midpoint was quantified.

RESULT: PWA significantly decreased acoustic variability over the course of the syllable, providing evidence of online feedback correction mechanisms. The magnitude of this corrective formant movement exceeded past measurements in control participants.

CONCLUSION: Vowel centring behaviour suggests that error correction abilities are at least partially spared in speakers with aphasia, and may be relied upon to compensate for feedforward deficits by bringing utterances back on track. These proof-of-concept data show the potential of this analysis technique to elucidate the mechanisms underlying disorders of speech production.
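The variability-reduction measure described in the method can be sketched as comparing across-repetition formant dispersion at vowel onset versus midpoint; the data below are synthetic, not the study's:

```python
import numpy as np

def formant_variability(f_values):
    """Across-repetition standard deviation of a formant (Hz)."""
    return float(np.std(np.asarray(f_values), ddof=1))

def centering_ratio(onset_values, midpoint_values):
    """Relative reduction in formant variability from vowel onset to
    midpoint; positive values indicate online corrective 'centring'."""
    v_on = formant_variability(onset_values)
    v_mid = formant_variability(midpoint_values)
    return (v_on - v_mid) / v_on

rng = np.random.default_rng(0)
onset = 500 + rng.normal(0, 40, size=200)     # noisy F1 at vowel onset
midpoint = 500 + rng.normal(0, 15, size=200)  # corrected toward target
ratio = centering_ratio(onset, midpoint)
```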

RevDate: 2018-10-21

Fazeli M, Moradi N, Soltani M, et al (2018)

Dysphonia Characteristics and Vowel Impairment in Relation to Neurological Status in Patients with Multiple Sclerosis.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30351-5 [Epub ahead of print].

PURPOSE: In this study, we attempted to assess the phonation and articulation subsystem changes in patients with multiple sclerosis compared to healthy individuals using Dysphonia Severity Index and Formant Centralization Ratio with the aim of evaluating the correlation between these two indexes with neurological status.

MATERIALS AND METHODS: A sample of 47 patients with multiple sclerosis and 20 healthy speakers were evaluated. Patients' disease duration and disability were monitored by a neurologist. Dysphonia Severity Index and Formant Centralization Ratio scores were computed for each individual. Acoustic analysis was performed with Praat software; the statistical analysis was run using SPSS 21. To compare multiple sclerosis patients with the control group, the Mann-Whitney U test was used for non-normal data and the independent-samples t test for normal data. A logistic regression was also used to compare the data. Correlation between acoustic characteristics and neurological status was assessed using the Spearman correlation coefficient, and linear regression was performed to evaluate the simultaneous effects of the neurological data.
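The Formant Centralization Ratio used here is commonly defined from corner-vowel formants as (F2/u/ + F2/a/ + F1/i/ + F1/u/) / (F2/i/ + F1/a/), with higher values indicating more centralized vowels; a sketch with hypothetical formant values:

```python
def formant_centralization_ratio(f1_i, f2_i, f1_u, f2_u, f1_a, f2_a):
    """FCR from corner-vowel formants (Hz): numerator collects formants
    that rise with centralization, denominator those that fall."""
    return (f2_u + f2_a + f1_i + f1_u) / (f2_i + f1_a)

# hypothetical corner-vowel formants for a speaker with well-separated vowels
fcr = formant_centralization_ratio(f1_i=300, f2_i=2300,
                                   f1_u=350, f2_u=800,
                                   f1_a=750, f2_a=1300)
```

As vowels centralize, the numerator formants drift up and the denominator formants drift down, pushing the ratio toward and above 1.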

RESULTS: Statistical analysis revealed that a significant difference existed between multiple sclerosis and healthy participants. Formant Centralization Ratio had a significant correlation with disease severity.

CONCLUSION: Multiple sclerosis patients would be differentiated from healthy individuals by their phonation and articulatory features. Scores of these two indexes can be considered as appropriate criteria for onset of the speech problems in multiple sclerosis. Also, articulation subsystem changes might be useful signs for the progression of the disease.

RevDate: 2018-10-19

Brabenec L, Klobusiakova P, Barton M, et al (2018)

Non-invasive stimulation of the auditory feedback area for improved articulation in Parkinson's disease.

Parkinsonism & related disorders pii:S1353-8020(18)30439-5 [Epub ahead of print].

INTRODUCTION: Hypokinetic dysarthria (HD) is a common symptom of Parkinson's disease (PD) which does not respond well to PD treatments. We investigated acute effects of repetitive transcranial magnetic stimulation (rTMS) of the motor and auditory feedback area on HD in PD using acoustic analysis of speech.

METHODS: We used 10 Hz and 1 Hz stimulation protocols and applied rTMS over the left orofacial primary motor area, the right superior temporal gyrus (STG), and over the vertex (a control stimulation site) in 16 PD patients with HD. A cross-over design was used. Stimulation sites and protocols were randomised across subjects and sessions. Acoustic analysis of a sentence reading task performed inside the MR scanner was used to evaluate rTMS-induced effects on motor speech. Acute fMRI changes due to rTMS were also analysed.

RESULTS: The 1 Hz STG stimulation produced significant increases of the relative standard deviation of the 2nd formant (p = 0.019), i.e. an acoustic parameter describing the tongue and jaw movements. The effects were superior to the control site stimulation and were accompanied by increased resting state functional connectivity between the stimulated region and the right parahippocampal gyrus. The rTMS-induced acoustic changes were correlated with the reading task-related BOLD signal increases of the stimulated area (R = 0.654, p = 0.029).

CONCLUSION: Our results demonstrate for the first time that low-frequency stimulation of the temporal auditory feedback area may improve articulation in PD and enhance functional connectivity between the STG and the cortical region involved in an overt speech control.

RevDate: 2018-10-19

Gómez-Vilda P, Galaz Z, Mekyska J, et al (2018)

Vowel Articulation Dynamic Stability Related to Parkinson's Disease Rating Features: Male Dataset.

International journal of neural systems [Epub ahead of print].

Neurodegenerative pathologies such as Parkinson's Disease (PD) show important distortions in speech, affecting fluency, prosody, articulation and phonation. Classically, measurements based on articulation gestures altering formant positions, such as the Vowel Space Area (VSA) or the Formant Centralization Ratio (FCR), have been proposed to measure speech distortion, but these markers are based mainly on static positions of sustained vowels. The present study introduces a measurement based on the mutual information distance among probability density functions of kinematic correlates derived from formant dynamics. An absolute kinematic velocity associated with the position of the jaw and tongue articulation gestures is estimated and modeled statistically. The distribution of this feature may differentiate PD patients from normative speakers during sustained vowel emission. The study is based on a limited database of 53 male PD patients, contrasted with a very select and stable set of eight normative speakers. In this sense, distances based on Kullback-Leibler divergence seem to be sensitive to PD articulation instability. Correlation studies show statistically relevant relationships between information contents based on articulation instability and certain motor and nonmotor clinical scores, such as freezing of gait or sleep disorders. Remarkably, one of the statistically relevant correlations points to the time interval elapsed since first diagnosis. These results stress the need for defining scoring scales specifically designed for speech disability estimation and monitoring methodologies in degenerative diseases of neuromotor origin.
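A distance between probability density functions of kinematic features, of the kind the abstract describes, can be illustrated with a symmetrized Kullback-Leibler divergence over histogram estimates; the distributions below are synthetic, not the study's data:

```python
import numpy as np

def symmetric_kl(p_samples, q_samples, bins=30):
    """Symmetrized KL divergence between two sample sets, estimated from
    histograms over a shared range (small epsilon avoids log(0))."""
    lo = min(np.min(p_samples), np.min(q_samples))
    hi = max(np.max(p_samples), np.max(q_samples))
    p, _ = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q, _ = np.histogram(q_samples, bins=bins, range=(lo, hi))
    eps = 1e-10
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

rng = np.random.default_rng(1)
normative = rng.normal(0.0, 1.0, 1000)  # stable articulation kinematics
patient = rng.normal(0.5, 1.6, 1000)    # broader, shifted distribution
d = symmetric_kl(normative, patient)
```

A larger divergence from the normative distribution would, in this framing, indicate greater articulation instability.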

RevDate: 2018-11-14

den Ouden DB, Galkina E, Basilakos A, et al (2018)

Vowel Formant Dispersion Reflects Severity of Apraxia of Speech.

Aphasiology, 32(8):902-921.

Background: Apraxia of Speech (AOS) has been associated with deviations in consonantal voice-onset-time (VOT), but studies of vowel acoustics have yielded conflicting results. However, a speech motor planning disorder that is not bound by phonological categories is expected to affect vowel as well as consonant articulations.

Aims: We measured consonant VOTs and vowel formants produced by a large sample of stroke survivors, and assessed to what extent these variables and their dispersion are predictive of AOS presence and severity, based on a scale that uses clinical observations to rate gradient presence of AOS, aphasia, and dysarthria.

Methods & Procedures: Picture-description samples were collected from 53 stroke survivors, including unimpaired speakers (12) and speakers with primarily aphasia (19), aphasia with AOS (12), primarily AOS (2), aphasia with dysarthria (2), and aphasia with AOS and dysarthria (6). The first three formants were extracted from vowel tokens bearing main stress in open-class words, as well as VOTs for voiced and voiceless stops. Vowel space was estimated as reflected in the formant centralization ratio. Stepwise Linear Discriminant Analyses were used to predict group membership, and ordinal regression to predict AOS severity, based on the absolute values of these variables, as well as the standard deviations of formants and VOTs within speakers.

Outcomes and Results: Presence and severity of AOS were most consistently predicted by the dispersion of F1, F2, and voiced-stop VOT. These phonetic-acoustic measures do not correlate with aphasia severity.

Conclusions: These results confirm that AOS affects articulation across the board and does not selectively spare vowel production.

RevDate: 2018-11-14

Baotic A, Garcia M, Boeckle M, et al (2018)

Field Propagation Experiments of Male African Savanna Elephant Rumbles: A Focus on the Transmission of Formant Frequencies.

Animals : an open access journal from MDPI, 8(10): pii:ani8100167.

African savanna elephants live in dynamic fission–fusion societies and exhibit a sophisticated vocal communication system. Their most frequent call-type is the 'rumble', with a fundamental frequency (which refers to the lowest vocal fold vibration rate when producing a vocalization) near or in the infrasonic range. Rumbles are used in a wide variety of behavioral contexts, for short- and long-distance communication, and convey contextual and physical information. For example, maturity (age and size) is encoded in male rumbles by formant frequencies (the resonance frequencies of the vocal tract), which carry the most informative power. As sound propagates, however, its spectral and temporal structures degrade progressively. Our study used manipulated and resynthesized male social rumbles to simulate large and small individuals (based on different formant values) to quantify whether this phenotypic information efficiently transmits over long distances. To examine transmission efficiency and the potential influences of ecological factors, we broadcasted and re-recorded rumbles at distances of up to 1.5 km in two different habitats at the Addo Elephant National Park, South Africa. Our results show that rumbles were affected by spectral–temporal degradation over distance. Interestingly and unlike previous findings, the transmission of formants was better than that of the fundamental frequency. Our findings demonstrate the importance of formant frequencies for the efficiency of rumble propagation and the transmission of information content in a savanna elephant's natural habitat.

RevDate: 2018-10-01

Pabon P, S Ternström (2018)

Feature Maps of the Acoustic Spectrum of the Voice.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30185-1 [Epub ahead of print].

The change in the spectrum of sustained /a/ vowels was mapped over the voice range from low to high fundamental frequency and low to high sound pressure level (SPL), in the form of the so-called voice range profile (VRP). In each interval of one semitone and one decibel, narrowband spectra were averaged both within and across subjects. The subjects were groups of 7 male and 12 female singing students, as well as a group of 16 untrained female voices. For each individual and also for each group, pairs of VRP recordings were made, with stringent separation of the modal/chest and falsetto/head registers. Maps are presented of eight scalar metrics, each of which was chosen to quantify a particular feature of the voice spectrum, over fundamental frequency and SPL. Metrics 1 and 2 chart the role of the fundamental in relation to the rest of the spectrum. Metrics 3 and 4 are used to explore the role of resonances in relation to SPL. Metrics 5 and 6 address the distribution of high frequency energy, while metrics 7 and 8 seek to describe the distribution of energy at the low end of the voice spectrum. Several examples are observed of phenomena that are difficult to predict from linear source-filter theory, and of the voice source being less uniform over the voice range than is conventionally assumed. These include a high-frequency band-limiting at high SPL and an unexpected persistence of the second harmonic at low SPL. The two voice registers give rise to clearly different maps. Only a few effects of training were observed, in the low-frequency end below 2 kHz. The results are of potential interest in voice analysis, voice synthesis and for new insights into the voice production mechanism.

RevDate: 2018-09-30

Kraus MS, Walker TM, Jarskog LF, et al (2018)

Basic auditory processing deficits and their association with auditory emotion recognition in schizophrenia.

Schizophrenia research pii:S0920-9964(18)30542-5 [Epub ahead of print].

BACKGROUND: Individuals with schizophrenia are impaired in their ability to recognize emotions based on vocal cues and these impairments are associated with poor global outcome. Basic perceptual processes, such as auditory pitch processing, are impaired in schizophrenia and contribute to difficulty identifying emotions. However, previous work has focused on a relatively narrow assessment of auditory deficits and their relation to emotion recognition impairment in schizophrenia.

METHODS: We have assessed 87 patients with schizophrenia and 73 healthy controls on a comprehensive battery of tasks spanning the five empirically derived domains of auditory function. We also explored the relationship between basic auditory processing and auditory emotion recognition within the patient group using correlational analysis.

RESULTS: Patients exhibited widespread auditory impairments across multiple domains of auditory function, with mostly medium effect sizes. Performance on all of the basic auditory tests correlated with auditory emotion recognition at the p < .01 level in the patient group, with 9 out of 13 tests correlating with emotion recognition at r = 0.40 or greater. After controlling for cognition, many of the largest correlations involved spectral processing within the phase-locking range and discrimination of vocally based stimuli.

CONCLUSIONS: While many auditory skills contribute to this impairment, deficient formant discrimination appears to be a key skill contributing to impaired emotion recognition: it was the only basic auditory skill to enter a stepwise multiple regression after a measure of cognitive impairment was entered first, and it accounted for significant unique variance in emotion recognition performance after accounting for deficits in pitch processing.

RevDate: 2018-09-25

Han C, Wang H, Fasolt V, et al (2018)

No clear evidence for correlations between handgrip strength and sexually dimorphic acoustic properties of voices.

American journal of human biology : the official journal of the Human Biology Council [Epub ahead of print].

OBJECTIVES: Recent research on the signal value of masculine physical characteristics in men has focused on the possibility that such characteristics are valid cues of physical strength. However, evidence that sexually dimorphic vocal characteristics are correlated with physical strength is equivocal. Consequently, we undertook a further test for possible relationships between physical strength and masculine vocal characteristics.

METHODS: We tested the putative relationships between White UK (N = 115) and Chinese (N = 106) participants' handgrip strength (a widely used proxy for general upper-body strength) and five sexually dimorphic acoustic properties of voices: fundamental frequency (F0), fundamental frequency's SD (F0-SD), formant dispersion (Df), formant position (Pf), and estimated vocal-tract length (VTL).
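Two of the acoustic properties listed above have simple, widely used formulations. A sketch under common assumptions from the voice literature (mean adjacent-formant spacing for Df, following Fitch 1997, and a uniform quarter-wave tube for VTL; not necessarily this study's exact estimators):

```python
SPEED_OF_SOUND = 35000.0  # cm/s

def formant_dispersion(formants):
    """Df: mean spacing (Hz) between adjacent formants, given in ascending order."""
    return (formants[-1] - formants[0]) / (len(formants) - 1)

def apparent_vtl(formants):
    """Estimated vocal-tract length (cm), averaging the length implied by each
    formant under the uniform-tube model Fk = (2k - 1) * c / (4 * L)."""
    lengths = [(2 * k - 1) * SPEED_OF_SOUND / (4 * f)
               for k, f in enumerate(formants, start=1)]
    return sum(lengths) / len(lengths)
```

For the textbook neutral-tube formants 500/1500/2500/3500 Hz, both measures recover the expected values (1000 Hz spacing, a 17.5 cm tract).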

RESULTS: Analyses revealed no clear evidence that stronger individuals had more masculine voices.

CONCLUSIONS: Our results do not support the hypothesis that masculine vocal characteristics are a valid cue of physical strength.

RevDate: 2018-11-13

Easwar V, Banyard A, Aiken SJ, et al (2018)

Phase-locked responses to the vowel envelope vary in scalp-recorded amplitude due to across-frequency response interactions.

The European journal of neuroscience, 48(10):3126-3145.

Neural encoding of the envelope of sounds like vowels is essential to access temporal information useful for speech recognition. Subcortical responses to envelope periodicity of vowels can be assessed using scalp-recorded envelope following responses (EFRs); however, the amplitude of EFRs vary by vowel spectra and the causal relationship is not well understood. One cause for spectral dependency could be interactions between responses with different phases, initiated by multiple stimulus frequencies. Phase differences can arise from earlier initiation of processing high frequencies relative to low frequencies in the cochlea. This study investigated the presence of such phase interactions by measuring EFRs to two naturally spoken vowels (/ε/ and /u/), while delaying the envelope phase of the second formant band (F2+) relative to the first formant (F1) band in 45° increments. At 0° F2+ phase delay, EFRs elicited by the vowel /ε/ were lower in amplitude than the EFRs elicited by /u/. Using vector computations, we found that the lower amplitude of /ε/-EFRs was caused by linear superposition of F1- and F2+-contributions with larger F1-F2+ phase differences (166°) compared to /u/ (19°). While the variation in amplitude across F2+ phase delays could be modeled with two dominant EFR sources for both vowels, the degree of variation was dependent on F1 and F2+ EFR characteristics. Together, we demonstrate that (a) broadband sounds like vowels elicit independent responses from different stimulus frequencies that may be out-of-phase and affect scalp-based measurements, and (b) delaying higher frequency formants can maximize EFR amplitudes for some vowels.
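The vector computation invoked in this abstract amounts to summing phasors: two out-of-phase contributions partially cancel. A toy sketch (equal amplitudes are an assumption; the 166° and 19° phase differences are the abstract's values):

```python
import cmath
import math

def summed_amplitude(a1, phi1_deg, a2, phi2_deg):
    """Amplitude of the linear superposition of two phasor responses."""
    p1 = cmath.rect(a1, math.radians(phi1_deg))
    p2 = cmath.rect(a2, math.radians(phi2_deg))
    return abs(p1 + p2)

# Equal-amplitude F1 and F2+ contributions: the large phase difference
# reported for /e/ yields a smaller summed EFR than the small one for /u/.
efr_eh = summed_amplitude(1.0, 0.0, 1.0, 166.0)
efr_u = summed_amplitude(1.0, 0.0, 1.0, 19.0)
```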

RevDate: 2018-11-19

Omidvar S, Mahmoudian S, Khabazkhoob M, et al (2018)

Tinnitus Impacts on Speech and Non-speech Stimuli.

Otology & neurotology : official publication of the American Otological Society, American Neurotology Society [and] European Academy of Otology and Neurotology, 39(10):e921-e928.

OBJECTIVE: To investigate how tinnitus affects the processing of speech and non-speech stimuli at the subcortical level.

STUDY DESIGN: Cross-sectional analytical study.

SETTING: Academic, tertiary referral center.

PATIENTS: Eighteen individuals with tinnitus and 20 controls without tinnitus matched based on their age and sex. All subjects had normal hearing sensitivity.


MAIN OUTCOME MEASURES: The effect of tinnitus on the parameters of auditory brainstem responses (ABR) to non-speech (click-ABR), and speech (sABR) stimuli was investigated.

RESULTS: Latencies of click ABR in waves III, V, and Vn, as well as inter-peak latency (IPL) of I to V were significantly longer in individuals with tinnitus compared with the controls. Individuals with tinnitus demonstrated significantly longer latencies of all sABR waves than the control group. The tinnitus patients also exhibited a significant decrease in the slope of the V-A complex and reduced encoding of the first and higher formants. A significant difference was observed between the two groups in the spectral magnitudes, the first formant frequency range (F1) and a higher frequency region (HF).

CONCLUSIONS: Our findings suggest that maladaptive neural plasticity resulting from tinnitus can be subcortically measured and affects timing processing of both speech and non-speech stimuli. The findings have been discussed based on models of maladaptive plasticity and the interference of tinnitus as an internal noise in synthesizing speech auditory stimuli.

RevDate: 2018-09-21

Charlton BD, Owen MA, Keating JL, et al (2018)

Sound transmission in a bamboo forest and its implications for information transfer in giant panda (Ailuropoda melanoleuca) bleats.

Scientific reports, 8(1):12754 pii:10.1038/s41598-018-31155-5.

Although mammal vocalisations signal attributes about the caller that are important in a range of contexts, relatively few studies have investigated the transmission of specific types of information encoded in mammal calls. In this study we broadcast and re-recorded giant panda bleats in a bamboo plantation, to assess the stability of individuality and sex differences in these calls over distance, and determine how the acoustic structure of giant panda bleats degrades in this species' typical environment. Our results indicate that vocal recognition of the caller's identity and sex is not likely to be possible when the distance between the vocaliser and receiver exceeds 20 m and 10 m, respectively. Further analysis revealed that the F0 contour of bleats was subject to high structural degradation as it propagated through the bamboo canopy, making the measurement of mean F0 and F0 modulation characteristics highly unreliable at distances exceeding 10 m. The most stable acoustic features of bleats in the bamboo forest environment (lowest % variation) were the upper formants and overall formant spacing. The analysis of amplitude attenuation revealed that the fifth and sixth formant are more prone to decay than the other frequency components of bleats, however, the fifth formant still remained the most prominent and persistent frequency component over distance. Paired with previous studies, these results show that giant panda bleats have the potential to signal the caller's identity at distances of up to 20 m and reliably transmit sex differences up to 10 m from the caller, and suggest that information encoded by F0 modulation in bleats could only be functionally relevant during close-range interactions in this species' natural environment.

RevDate: 2018-11-14

Ward RM, DG Kelty-Stephen (2018)

Bringing the Nonlinearity of the Movement System to Gestural Theories of Language Use: Multifractal Structure of Spoken English Supports the Compensation for Coarticulation in Human Speech Perception.

Frontiers in physiology, 9:1152.

Coarticulation is the tendency for speech vocalization and articulation, even at the phonemic level, to change with context, and compensation for coarticulation (CfC) reflects the striking human ability to perceive phonemic stability despite this variability. A current controversy centers on whether CfC depends on contrast between formants of a speech-signal spectrogram (specifically, contrast between offset formants concluding context stimuli and onset formants opening the target sound), or on speech-sound variability specific to the coordinative movement of speech articulators (e.g., vocal folds, postural muscles, lips, tongues). This manuscript aims to encode that coordinative-movement context in terms of speech-signal multifractal structure and to determine whether speech's multifractal structure might explain the crucial gestural support for any proposed spectral contrast. We asked human participants to categorize individual target stimuli drawn from an 11-step [ga]-to-[da] continuum as either the phoneme "GA" or "DA." Each of three groups heard a specific type of context stimulus preceding the target stimuli: real-speech [al] or [aɹ], sine-wave tones at the third-formant offset frequency of [al] or [aɹ], or simulated-speech contexts [al] or [aɹ]. Here, simulating speech contexts involved randomizing the sequence of relatively homogeneous pitch periods within the vowel sound [a] of each [al] and [aɹ]. Crucially, simulated-speech contexts had the same offset formants and extremely similar vowel formants as the real-speech contexts and, to additional naïve participants, sounded identical to them. However, randomization distorted the original speech-context multifractality, and effects of spectral contrast following speech only appeared after regression modeling of trial-by-trial "GA" judgments controlled for context-stimulus multifractality. Furthermore, simulated-speech contexts elicited faster responses (as tone contexts do) and weakened known biases in CfC, suggesting that spectral contrast depends on the nonlinear interactions across multiple scales that articulatory gestures express through the speech signal. Traditional mouse-tracking behaviors, measured as participants moved their computer-mouse cursor to register their "GA"-or-"DA" decisions with mouse-clicks, suggest that listening to speech leads the movement system to resonate with the multifractality of context stimuli. We interpret these results as shedding light on a new multifractal terrain upon which to build a better understanding of the role movement systems play in shaping how speech perception makes use of acoustic information.

RevDate: 2018-09-19

Hu XJ, Li FF, CC Lau (2018)

Development of the Mandarin speech banana.

International journal of speech-language pathology [Epub ahead of print].

PURPOSE: For Indo-European languages, "speech banana" is widely used to verify the benefits of hearing aids and cochlear implants. As a standardised "Mandarin speech banana" is not available, clinicians in China typically use a non-Mandarin speech banana. However, as Chinese is logographic and tonal, using a non-Mandarin speech banana is inappropriate. This paper was designed to develop the Mandarin speech banana according to the Mandarin phonetic properties.

METHOD: In the first experiment, 14 participants read aloud the standard Mandarin initials and finals. For each pronounced sound, its formants were measured. The boundary of all formants formed the formant graph (intensity versus frequency). In the second experiment, 20 participants listened to a list of pre-recorded initials and finals that had been filtered with different bandwidths. The minimum bandwidth to recognise a target sound defined its location on the formant graph.

RESULT: The Mandarin speech banana was generated with recognisable initials and finals on the formant graph. Tone affected the shape of the formant graph, especially at low frequencies.

CONCLUSION: Clinicians can use the new Mandarin speech banana to counsel patients about which sounds are inaudible to them. Speech training can be implemented based on the unheard sounds in the speech banana.

RevDate: 2018-11-01

Sfakianaki A, Nicolaidis K, Okalidou A, et al (2018)

Coarticulatory dynamics in Greek disyllables produced by young adults with and without hearing loss.

Clinical linguistics & phonetics, 32(12):1162-1184.

Hearing loss affects both speech perception and production with detrimental effects on various speech characteristics including coarticulatory dynamics. The aim of the present study is to explore consonant-to-vowel (C-to-V) and vowel-to-vowel (V-to-V) coarticulation in magnitude, direction and temporal extent in the speech of young adult male and female speakers of Greek with normal hearing (NH) and hearing impairment (HI). Nine intelligible speakers with profound HI, using conventional hearing aids, and five speakers with NH produced /pV1CV2/ disyllables, with the point vowels /i, a, u/ and the consonants /p, t, s/, stressed either on the first or the second syllable. Formant frequencies F1 and F2 were measured in order to examine C-to-V effects at vowel midpoint and V-to-V effects at vowel onset, midpoint and offset. The acoustic and statistical analyses revealed similarities but also significant differences regarding coarticulatory patterns of the two groups. Interestingly, prevalence of anticipatory coarticulation effects in alveolar contexts was observed for speakers with HI. Findings are interpreted on account of possible differences in articulation strategies between the two groups and with reference to current coarticulatory models.

RevDate: 2018-09-03

Kawitzky D, T McAllister (2018)

The Effect of Formant Biofeedback on the Feminization of Voice in Transgender Women.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30190-5 [Epub ahead of print].

Differences in formant frequencies between men and women contribute to the perception of voices as masculine or feminine. This study investigated whether visual-acoustic biofeedback can be used to help transgender women achieve formant targets typical of cisgender women, and whether such a shift influences the perceived femininity of speech. Transgender women and a comparison group of cisgender males were trained to produce vowels in a word context while also attempting to make a visual representation of their second formant (F2) line up with a target that was shifted up relative to their baseline F2 (feminized target) or an unshifted or shifted-down target (control conditions). Despite the short-term nature of the training, both groups showed significant differences in F2 frequency in shifted-up, shifted-down, and unshifted conditions. Gender typicality ratings from blinded listeners indicated that higher F2 values were associated with an increase in the perceived femininity of speech. Consistent with previous literature, we found that fundamental frequency and F2 make a joint contribution to the perception of gender. The results suggest that biofeedback might be a useful tool in voice modification therapy for transgender women; however, larger studies and information about generalization will be essential before strong conclusions can be drawn.

RevDate: 2018-08-08

Núñez-Batalla F, Vasile G, Cartón-Corona N, et al (2018)

Vowel production in hearing impaired children: A comparison between normal-hearing, hearing-aided and cochlear-implanted children.

Acta otorrinolaringologica espanola pii:S0001-6519(18)30117-1 [Epub ahead of print].

INTRODUCTION AND OBJECTIVES: Inadequate auditory feedback in prelingually deaf children alters the articulation of consonants and vowels. The purpose of this investigation was to compare vowel production in Spanish-speaking deaf children with cochlear implantation, and with hearing-aids with normal-hearing children by means of acoustic analysis of formant frequencies and vowel space.

METHODS: A total of 56 prelingually deaf children (25 with cochlear implants and 31 wearing hearing-aids) and 47 normal-hearing children participated. The first 2 formants (F1 and F2) of the five Spanish vowels were measured using Praat software. One-way analysis of variance (ANOVA) and post hoc Scheffé test were applied to analyze the differences between the 3 groups. The surface area of the vowel space was also calculated.

RESULTS: The mean value of F1 in all vowels was not significantly different between the 3 groups. For vowels /i/, /o/ and /u/, the mean value of F2 was significantly different between the 2 groups of deaf children and their normal-hearing peers.

CONCLUSION: Both prelingually hearing-impaired groups tended toward subtle deviations in the articulation of vowels that could be analyzed using an objective acoustic analysis programme.

RevDate: 2018-08-07

Bucci J, Perrier P, Gerber S, et al (2018)

Vowel Reduction in Coratino (South Italy): Phonological and Phonetic Perspectives.

Phonetica pii:000490947 [Epub ahead of print].

Vowel reduction may involve phonetic reduction processes, with nonreached targets, and/or phonological processes in which a vowel target is changed for another target, possibly schwa. Coratino, a dialect of southern Italy, displays complex vowel reduction processes assumed to be phonological. We analyzed a corpus representative of vowel reduction in Coratino, based on a set of a hundred pairs of words contrasting a stressed and an unstressed version of a given vowel in a given consonant environment, produced by 10 speakers. We report vowel formants together with consonant-to-vowel formant trajectories and durations, and show that these data are rather in agreement with a change in vowel target from /i e ɛ ɔ u/ to schwa when the vowel is in a non-word-initial unstressed position, unless the vowel shares a place-of-articulation feature with the preceding or following consonant. Interestingly, it also appears that there are 2 targets for phonological reduction, differing in F1 values. A "higher schwa" - which could be considered as /ɨ/ - corresponds to reduction for the high vowels /i u/, while a "lower schwa" - which could be considered as /ə/ - corresponds to reduction for the mid-high vowels.

RevDate: 2018-08-04

Adriaans F (2018)

Effects of consonantal context on the learnability of vowel categories from infant-directed speech.

The Journal of the Acoustical Society of America, 144(1):EL20.

Recent studies have shown that vowels in infant-directed speech (IDS) are characterized by highly variable formant distributions. The current study investigates whether vowel variability is partially due to consonantal context, and explores whether consonantal context could support the learning of vowel categories from IDS. A computational model is presented which selects contexts based on frequency in the input and generalizes across contextual categories. Improved categorization performance was found on a vowel contrast in American-English IDS. The findings support a view in which the infant's learning mechanism is anchored in context, in order to cope with acoustic variability in the input.

RevDate: 2018-08-04

Barreda S, TM Nearey (2018)

A regression approach to vowel normalization for missing and unbalanced data.

The Journal of the Acoustical Society of America, 144(1):500.

Researchers investigating the vowel systems of languages or dialects frequently employ normalization methods to minimize between-speaker variability in formant patterns while preserving between-phoneme separation and (socio-)dialectal variation. Here two methods are considered: log-mean and Lobanov normalization. Although both of these methods express formants in a speaker-dependent space, the methods differ in their complexity and in their implied models of human vowel-perception. Typical implementations of these methods rely on balanced data across speakers so that researchers may have to reduce the data available in the analyses in missing-data situations. Here, an alternative method is proposed for the normalization of vowels using the log-mean method in a linear-regression framework. The performance of the traditional approaches to log-mean and Lobanov normalization against the regression approach to the log-mean method using naturalistic, simulated vowel-data was investigated. The results indicate that the Lobanov method likely removes legitimate linguistic variation from vowel data and often provides very noisy estimates of the actual vowel quality associated with individual tokens. The authors further argue that the Lobanov method is too complex to represent a plausible model of human vowel perception, and so is unlikely to provide results that reflect the true perceptual organization of linguistic data.
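For readers unfamiliar with the two normalization schemes compared here, a minimal per-speaker sketch, one formant dimension at a time (a simplification of both methods as actually implemented: Lobanov z-scores the values, the log-mean method centers the speaker's log-formants):

```python
import math
import statistics

def lobanov(values):
    """Lobanov normalization: z-score one speaker's formant values."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def log_mean(values):
    """Log-mean normalization: subtract the speaker's mean log-formant."""
    logs = [math.log(v) for v in values]
    g = statistics.mean(logs)
    return [lv - g for lv in logs]
```

The regression framing in the paper re-expresses the log-mean computation so that speaker means can be estimated jointly even when some vowels are missing for some speakers.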

RevDate: 2018-08-04

Brajot FX, D Lawrence (2018)

Delay-induced low-frequency modulation of the voice during sustained phonation.

The Journal of the Acoustical Society of America, 144(1):282.

An important property of negative feedback systems is the tendency to oscillate when feedback is delayed. This paper evaluated this phenomenon in a sustained phonation task, where subjects prolonged a vowel with 0-600 ms delays in auditory feedback. This resulted in a delay-dependent vocal wow: from 0.4 to 1 Hz fluctuations in fundamental frequency and intensity that increased in period and amplitude as the delay increased. A similar modulation in low-frequency oscillations was not observed in the first two formant frequencies, although some subjects did display increased variability. Results suggest that delayed auditory feedback enhances an existing periodic fluctuation in the voice, with a more complex, possibly indirect, influence on supraglottal articulation. These findings have important implications for understanding how speech may be affected by artificially applied or disease-based delays in sensory feedback.

RevDate: 2018-11-14

Souza P, Wright R, Gallun F, et al (2018)

Reliability and Repeatability of the Speech Cue Profile.

Journal of speech, language, and hearing research : JSLHR, 61(8):2126-2137.

Purpose: Researchers have long noted speech recognition variability that is not explained by the pure-tone audiogram. Previous work (Souza, Wright, Blackburn, Tatman, & Gallun, 2015) demonstrated that a small number of listeners with sensorineural hearing loss utilized different types of acoustic cues to identify speechlike stimuli, specifically the extent to which the participant relied upon spectral (or temporal) information for identification. Consistent with recent calls for data rigor and reproducibility, the primary aims of this study were to replicate the pattern of cue use in a larger cohort and to verify stability of the cue profiles over time.

Method: Cue-use profiles were measured for adults with sensorineural hearing loss using a syllable identification task consisting of synthetic speechlike stimuli in which spectral and temporal dimensions were manipulated along continua. For the first set, a static spectral shape varied from alveolar to palatal, and a temporal envelope rise time varied from affricate to fricative. For the second set, formant transitions varied from labial to alveolar and a temporal envelope rise time varied from approximant to stop. A discriminant feature analysis was used to determine to what degree spectral and temporal information contributed to stimulus identification. A subset of participants completed a 2nd visit using the same stimuli and procedures.

Results: When spectral information was static, most participants were more influenced by spectral than by temporal information. When spectral information was dynamic, participants demonstrated a balanced distribution of cue-use patterns, with nearly equal numbers of individuals influenced by spectral or temporal cues. Individual cue profile was repeatable over a period of several months.

Conclusion: In combination with previously published data, these results indicate that listeners with sensorineural hearing loss are influenced by different cues to identify speechlike sounds and that those patterns are stable over time.

RevDate: 2018-07-28

Anikin A (2018)

Soundgen: An open-source tool for synthesizing nonverbal vocalizations.

Behavior research methods pii:10.3758/s13428-018-1095-7 [Epub ahead of print].

Voice synthesis is a useful method for investigating the communicative role of different acoustic features. Although many text-to-speech systems are available, researchers of human nonverbal vocalizations and bioacousticians may profit from a dedicated simple tool for synthesizing and manipulating natural-sounding vocalizations. Soundgen (https://CRAN.R-project.org/package=soundgen) is an open-source R package that synthesizes nonverbal vocalizations based on meaningful acoustic parameters, which can be specified from the command line or in an interactive app. This tool was validated by comparing the perceived emotion, valence, arousal, and authenticity of 60 recorded human nonverbal vocalizations (screams, moans, laughs, and so on) and their approximate synthetic reproductions. Each synthetic sound was created by manually specifying only a small number of high-level control parameters, such as syllable length and a few anchors for the intonation contour. Nevertheless, the valence and arousal ratings of synthetic sounds were similar to those of the original recordings, and the authenticity ratings were comparable, maintaining parity with the originals for less complex vocalizations. Manipulating the precise acoustic characteristics of synthetic sounds may shed light on the salient predictors of emotion in the human voice. More generally, soundgen may prove useful for any studies that require precise control over the acoustic features of nonspeech sounds, including research on animal vocalizations and auditory perception.
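Soundgen itself is an R package, but the source-filter idea it builds on can be sketched in a few lines of Python: an impulse-train source passed through one two-pole resonator per formant. This is a toy illustration, not soundgen's algorithm, and the fixed 80 Hz bandwidth is arbitrary:

```python
import math

def synth_vowel(f0=120.0, formants=(500.0, 1500.0, 2500.0), sr=16000, dur=0.3):
    """Toy source-filter synthesis: impulse-train glottal source, cascaded
    two-pole resonators (one per formant), peak-normalized output."""
    n = int(sr * dur)
    period = int(sr / f0)
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    bandwidth = 80.0  # Hz, illustrative
    for fc in formants:
        r = math.exp(-math.pi * bandwidth / sr)          # pole radius
        a1 = 2.0 * r * math.cos(2.0 * math.pi * fc / sr)  # feedback coefficients
        a2 = -r * r
        y1 = y2 = 0.0
        filtered = []
        for x in signal:
            y = x + a1 * y1 + a2 * y2
            filtered.append(y)
            y2, y1 = y1, y
        signal = filtered
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal]
```

Dedicated tools add the high-level controls this sketch lacks (intonation contours, syllable structure, nonlinear phenomena), which is precisely their value for synthesizing nonverbal vocalizations.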

RevDate: 2018-09-24

Hînganu MV, Hînganu D, Cozma SR, et al (2018)

Morphofunctional evaluation of buccopharyngeal space using three-dimensional cone-beam computed tomography (3D-CBCT).

Annals of anatomy = Anatomischer Anzeiger : official organ of the Anatomische Gesellschaft, 220:1-8.

The present study aims to identify the anatomical functional changes of the buccopharyngeal space in case of singers with canto voice. The interest in this field is particularly important in view of the relation between the artistic performance level, phoniatry and functional anatomy, as the voice formation mechanism is not completely known yet. We conducted a morphometric study on three soprano voices that differ in type and training level. The anatomical soft structures from the superior vocal formant of each soprano were measured on images captured using the Cone-beam Computed Tomography (CBCT) technique. The results obtained, as well as the 3D reconstructions emphasize the particularities of the individual morphological features, especially in case of the experienced soprano soloist, which are found to be different for each anatomical soft structure, as well as for their integrity. The experimental results are encouraging and suggest further development of this study on soprano voices and also on other types of opera voices.

RevDate: 2018-11-14

Whalen DH, Chen WR, Tiede MK, et al (2018)

Variability of articulator positions and formants across nine English vowels.

Journal of phonetics, 68:1-14.

Speech, though communicative, is quite variable both in articulation and acoustics, and it has often been claimed that articulation is more variable. Here we compared variability in articulation and acoustics for 32 speakers in the x-ray microbeam database (XRMB; Westbury, 1994). Variability in tongue, lip and jaw positions for nine English vowels (/u, ʊ, æ, ɑ, ʌ, ɔ, ε, ɪ, i/) was compared to that of the corresponding formant values. The domains were made comparable by creating three-dimensional spaces for each: the first three principal components from an analysis of a 14-dimensional space for articulation, and an F1xF2xF3 space for acoustics. More variability occurred in the articulation than the acoustics for half of the speakers, while the reverse was true for the other half. Individual tokens were further from the articulatory median than the acoustic median for 40-60% of tokens across speakers. A separate analysis of three non-low front vowels (/ε, ɪ, i/, for which the XRMB system provides the most direct articulatory evidence) did not differ from the omnibus analysis. Speakers tended to be either more or less variable consistently across vowels. Across speakers, there was a positive correlation between articulatory and acoustic variability, both for all vowels and for just the three non-low front vowels. Although the XRMB is an incomplete representation of articulation, it nonetheless provides data for direct comparisons between articulatory and acoustic variability that have not been reported previously. The results indicate that articulation is not more variable than acoustics, that speakers had relatively consistent variability across vowels, and that articulatory and acoustic variability were related for the vowels themselves.

RevDate: 2018-08-09

Barakzai SZ, Wells J, Parkin TDH, et al (2018)

Overground endoscopic findings and respiratory sound analysis in horses with recurrent laryngeal neuropathy after unilateral laser ventriculocordectomy.

Equine veterinary journal [Epub ahead of print].

BACKGROUND: Unilateral ventriculocordectomy (VeC) is frequently performed, yet objective studies in horses with naturally occurring recurrent laryngeal neuropathy (RLN) are few.

OBJECTIVES: To evaluate respiratory noise and exercising overground endoscopy in horses with grade B and C laryngeal function, before and after unilateral laser VeC.

STUDY DESIGN: Prospective study in clinically affected client-owned horses.

METHODS: Exercising endoscopy was performed and concurrent respiratory noise was recorded. A left-sided laser VeC was performed under standing sedation. Owners were asked to present the horse for re-examination 6-8 weeks post-operatively when exercising endoscopy and sound recordings were repeated. Exercising endoscopic findings were recorded, including the degree of arytenoid stability. Quantitative measurement of left-to-right quotient angle ratio (LRQ) and rima glottidis area ratio (RGA) were performed pre- and post-operatively. Sound analysis was performed, and measurements of the energy change in F1, F2 and F3 formants between pre- and post-operative recordings were made and statistically analysed.

RESULTS: Three grade B and seven grade C horses were included; 6/7 grade C horses preoperatively had bilateral vocal fold collapse (VFC) and 5/7 had mild right-sided medial deviation of the ary-epiglottic fold (MDAF). Right VFC and MDAF were still present in these horses post-operatively; grade B horses had no other endoscopic dynamic abnormalities post-operatively. Sound analysis showed a significant reduction in energy in formant F2 (P = 0.05) after surgery.

MAIN LIMITATIONS: The study sample size was small and multiple dynamic abnormalities made sound analysis challenging.

CONCLUSIONS: RLN-affected horses show a reduction in sound levels in F2 after unilateral laser VeC. Continuing noise may be caused by other ongoing forms of dynamic obstruction in grade C horses. Unilateral VeC is useful for grade B horses based on endoscopic images. In grade C horses, bilateral VeC, right ary-epiglottic fold resection ± laryngoplasty might be a better option than unilateral VeC alone. The Summary is available in Portuguese - see Supporting Information.

RevDate: 2018-11-14

Buzaneli ECP, Zenari MS, Kulcsar MAV, et al (2018)

Supracricoid Laryngectomy: The Function of the Remaining Arytenoid in Voice and Swallowing.

International archives of otorhinolaryngology, 22(3):303-312.

Introduction: Supracricoid laryngectomy still has selected indications; there are few studies in the literature, and the case series are limited, a fact that stimulates the development of new studies to further elucidate the structural and functional aspects of the procedure.

Objective: To assess voice and deglutition parameters according to the number of preserved arytenoids.

Methods: Eleven patients who underwent subtotal laryngectomy with cricohyoidoepiglottopexy were evaluated by laryngeal nasofibroscopy, videofluoroscopy, and auditory-perceptual, acoustic, and voice pleasantness analyses, after resuming oral feeding.

Results: Functional abnormalities were detected in two out of the three patients who underwent arytenoidectomy, and in six patients from the remainder of the sample. Almost half of the sample presented silent laryngeal penetration and/or vallecular/hypopharyngeal stasis on the videofluoroscopy. The mean voice analysis scores indicated moderate vocal deviation, roughness and breathiness; severe strain and loudness deviation; shorter maximum phonation time; the presence of noise; and high third and fourth formant values. The voices were rated as unpleasant. There was no difference in the number and functionality of the remaining arytenoids as prognostic factors for deglutition; however, in the qualitative analysis, favorable voice and deglutition outcomes were more common among patients who did not undergo arytenoidectomy and had normal functional conditions.

Conclusion: The number and functionality of the preserved arytenoids were not found to be prognostic factors for favorable deglutition efficiency outcomes. However, the qualitative analysis showed that the preservation of both arytenoids and the absence of functional abnormalities were associated with more satisfactory voice and deglutition patterns.

RevDate: 2018-07-01

El Boghdady N, Başkent D, E Gaudrain (2018)

Effect of frequency mismatch and band partitioning on vocal tract length perception in vocoder simulations of cochlear implant processing.

The Journal of the Acoustical Society of America, 143(6):3505.

The vocal tract length (VTL) of a speaker is an important voice cue that aids speech intelligibility in multi-talker situations. However, cochlear implant (CI) users demonstrate poor VTL sensitivity. This may be partially caused by the mismatch between frequencies received by the implant and those corresponding to places of stimulation along the cochlea. This mismatch can distort formant spacing, where VTL cues are encoded. In this study, the effects of frequency mismatch and band partitioning on VTL sensitivity were investigated in normal hearing listeners with vocoder simulations of CI processing. The hypotheses were that VTL sensitivity may be reduced by increased frequency mismatch and insufficient spectral resolution in how the frequency range is partitioned, specifically where formants lie. Moreover, optimal band partitioning might mitigate the detrimental effects of frequency mismatch on VTL sensitivity. Results showed that VTL sensitivity decreased with increased frequency mismatch and reduced spectral resolution near the low frequencies of the band partitioning map. Band partitioning was independent of mismatch, indicating that if a given partitioning is suboptimal, a better partitioning might improve VTL sensitivity despite the degree of mismatch. These findings suggest that customizing the frequency partitioning map may enhance VTL perception in individual CI users.
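The place-frequency mismatch described in this abstract is commonly modeled with Greenwood's frequency-position function. The sketch below uses the standard human constants from Greenwood (1990); the 3 mm shift and 35 mm cochlear length are illustrative assumptions, not values from the study.

```python
import math

# Greenwood (1990) frequency-position constants for the human cochlea
A, ALPHA, K = 165.4, 2.1, 0.88

def greenwood_freq(x):
    """Characteristic frequency (Hz) at relative cochlear position x in [0, 1],
    measured from the apex."""
    return A * (10.0 ** (ALPHA * x) - K)

def place_of_freq(f):
    """Inverse map: relative cochlear position for a frequency in Hz."""
    return math.log10(f / A + K) / ALPHA

# Illustrative mismatch (not the study's actual insertion depths): a 500 Hz
# analysis band delivered ~3 mm basal of its natural place on a 35 mm cochlea.
natural_place = place_of_freq(500.0)
shifted_freq = greenwood_freq(natural_place + 3.0 / 35.0)
```

Because the delivered place corresponds to a higher characteristic frequency than the analysis band, formant patterns arrive compressed and shifted upward, which is one way such a mismatch can distort the formant spacing that carries VTL cues.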

RevDate: 2018-07-01

Vikram CM, Macha SK, Kalita S, et al (2018)

Acoustic analysis of misarticulated trills in cleft lip and palate children.

The Journal of the Acoustical Society of America, 143(6):EL474.

In this paper, acoustic analysis of misarticulated trills in cleft lip and palate speakers is carried out using excitation-source features (strength of excitation and fundamental frequency, derived from the zero-frequency filtered signal) and vocal tract system features (first formant frequency (F1) and trill frequency, derived from linear prediction analysis and the autocorrelation approach, respectively). These features are found to be statistically significant in discriminating normal from misarticulated trills. Using these acoustic features, a dynamic time warping based trill misarticulation detection system is demonstrated. The performance of the proposed system in terms of F1-score is 73.44%, whereas that for conventional Mel-frequency cepstral coefficients is 66.11%.
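Deriving F1 by linear prediction, as described here, is a standard technique: fit an all-pole model with the autocorrelation method (Levinson-Durbin recursion) and read formant candidates off the pole angles. The sketch below is a generic textbook implementation applied to a synthetic single-resonance signal, not the authors' code.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC via Levinson-Durbin recursion."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def formant_freqs(a, fs):
    """Formant candidates: angles of complex LPC poles in the upper half-plane."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 1e-6]
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))

# Synthetic test signal: impulse response of a single resonator at 500 Hz
fs = 8000
r_pole, theta = 0.98, 2.0 * np.pi * 500.0 / fs
a_true = [1.0, -2.0 * r_pole * np.cos(theta), r_pole ** 2]
h = np.zeros(2000)
for n in range(2000):
    x = 1.0 if n == 0 else 0.0
    h[n] = x - a_true[1] * (h[n - 1] if n >= 1 else 0.0) \
             - a_true[2] * (h[n - 2] if n >= 2 else 0.0)

f1_est = formant_freqs(lpc(h, 2), fs)[0]  # should land near 500 Hz
```

For real speech one would use a higher model order (roughly fs in kHz plus 2) and pre-emphasis, but the pole-angle reading is the same.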

RevDate: 2018-07-17

Ng ML, Yan N, Chan V, et al (2018)

A Volumetric Analysis of the Vocal Tract Associated with Laryngectomees Using Acoustic Reflection Technology.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP), 70(1):44-49.

OBJECTIVE: Previous studies of the laryngectomized vocal tract using formant frequencies reported contradictory findings. Imaging studies of the vocal tract in alaryngeal speakers are limited due to possible radiation effects as well as the cost and time associated with such studies. The present study examined the vocal tract configuration of laryngectomized individuals using acoustic reflection technology.

SUBJECTS AND METHODS: Thirty alaryngeal and 30 laryngeal male speakers of Cantonese participated in the study. A pharyngometer was used to obtain volumetric information of the vocal tract. All speakers were instructed to imitate the production of /a/ when the length and volume information of the oral cavity, pharyngeal cavity, and the entire vocal tract were obtained. The data of alaryngeal and laryngeal speakers were compared.

RESULTS: Pharyngometric measurements revealed no significant difference in the vocal tract dimensions between laryngeal and alaryngeal speakers.

CONCLUSION: Despite the removal of the larynx and a possible alteration in the pharyngeal cavity during total laryngectomy, the vocal tract configuration (length and volume) in laryngectomized individuals was not significantly different from laryngeal speakers. It is suggested that other factors might have affected formant measures in previous studies.

RevDate: 2018-09-12

Reby D, Wyman MT, Frey R, et al (2018)

Vocal tract modelling in fallow deer: are male groans nasalized?.

The Journal of experimental biology, 221(Pt 17): pii:jeb.179416.

Males of several species of deer have a descended and mobile larynx, resulting in an unusually long vocal tract, which can be further extended by lowering the larynx during call production. Formant frequencies are lowered as the vocal tract is extended, as predicted when approximating the vocal tract as a uniform quarter wavelength resonator. However, formant frequencies in polygynous deer follow uneven distribution patterns, indicating that the vocal tract configuration may in fact be rather complex. We CT-scanned the head and neck region of two adult male fallow deer specimens with artificially extended vocal tracts and measured the cross-sectional areas of the supra-laryngeal vocal tract along the oral and nasal tracts. The CT data were then used to predict the resonances produced by three possible configurations, including the oral vocal tract only, the nasal vocal tract only, or combining the two. We found that the area functions from the combined oral and nasal vocal tracts produced resonances more closely matching the formant pattern and scaling observed in fallow deer groans than those predicted by the area functions of the oral vocal tract only or of the nasal vocal tract only. This indicates that the nasal and oral vocal tracts are both simultaneously involved in the production of a non-human mammal vocalization, and suggests that the potential for nasalization in putative oral loud calls should be carefully considered.
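The uniform quarter-wavelength approximation invoked in this abstract predicts resonances at odd multiples of c/4L, which is why extending the vocal tract both lowers the formants and compresses their spacing. A minimal sketch (the tract lengths are illustrative assumptions, not measurements from this study):

```python
def tube_formants(length_m, n_formants=4, c=350.0):
    """Resonances of a uniform tube closed at the glottis and open at the lips:
    F_n = (2n - 1) * c / (4 * L), with c the speed of sound in warm moist air."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]

human_f = tube_formants(0.17)  # ~17 cm adult human tract (typical textbook value)
deer_f = tube_formants(0.55)   # hypothetical extended deer tract length
```

Under this model the formants sit in an exact 1:3:5:7 ratio; the uneven formant distributions observed in polygynous deer are precisely the deviation from this pattern that motivates the more complex oral-plus-nasal tract modeling in the paper.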

RevDate: 2018-11-14

Yilmaz A, Sarac ET, Aydinli FE, et al (2018)

Investigating the effect of STN-DBS stimulation and different frequency settings on the acoustic-articulatory features of vowels.

Neurological sciences : official journal of the Italian Neurological Society and of the Italian Society of Clinical Neurophysiology pii:10.1007/s10072-018-3479-y [Epub ahead of print].

INTRODUCTION: Parkinson's disease (PD) is the second most frequent progressive neuro-degenerative disorder. In addition to motor symptoms, nonmotor symptoms and voice and speech disorders can also develop in 90% of PD patients. The aim of our study was to investigate the effects of DBS and different DBS frequencies on speech acoustics of vowels in PD patients.

METHODS: The study included 16 patients who underwent STN-DBS surgery due to PD. The voice recordings for the vowels including [a], [e], [i], and [o] were performed at frequencies including 230, 130, 90, and 60 Hz and off-stimulation. The voice recordings were gathered and evaluated by the Praat software, and the effects on the first (F1), second (F2), and third formant (F3) frequencies were analyzed.

RESULTS: A significant difference was found for the F1 value of the vowel [a] at 130 Hz compared to off-stimulation. However, no significant difference was found between the three formant frequencies with regard to the stimulation frequencies and off-stimulation. In addition, though not statistically significant, stimulation at 60 and 230 Hz led to several differences in the formant frequencies of other three vowels.

CONCLUSION: Our results indicated that STN-DBS stimulation at 130 Hz had a significant positive effect on the articulation of [a] compared to off-stimulation. Although not statistically significant, stimulation at 60 and 230 Hz may also have an effect on the articulation of [e], [i], and [o]; this effect needs to be investigated in future studies with larger numbers of participants.

RevDate: 2018-11-14

Dietrich S, Hertrich I, Müller-Dahlhaus F, et al (2018)

Reduced Performance During a Sentence Repetition Task by Continuous Theta-Burst Magnetic Stimulation of the Pre-supplementary Motor Area.

Frontiers in neuroscience, 12:361.

The pre-supplementary motor area (pre-SMA) is engaged in speech comprehension under difficult circumstances such as poor acoustic signal quality or time-critical conditions. Previous studies found that left pre-SMA is activated when subjects listen to accelerated speech. Here, the functional role of pre-SMA was tested for accelerated speech comprehension by inducing a transient "virtual lesion" using continuous theta-burst stimulation (cTBS). Participants were tested (1) prior to (pre-baseline), (2) 10 min after (test condition for the cTBS effect), and (3) 60 min after stimulation (post-baseline) using a sentence repetition task (formant-synthesized at rates of 8, 10, 12, 14, and 16 syllables/s). Speech comprehension was quantified by the percentage of correctly reproduced speech material. For high speech rates, subjects showed decreased performance after cTBS of pre-SMA. Regarding the error pattern, the number of incorrect words without any semantic or phonological similarity to the target context increased, while related words decreased. Thus, the transient impairment of pre-SMA seems to affect its inhibitory function that normally eliminates erroneous speech material prior to speaking or, in case of perception, prior to encoding into a semantically/pragmatically meaningful message.

RevDate: 2018-11-14

Kent RD, HK Vorperian (2018)

Static measurements of vowel formant frequencies and bandwidths: A review.

Journal of communication disorders, 74:74-97.

PURPOSE: Data on vowel formants have been derived primarily from static measures representing an assumed steady state. This review summarizes data on formant frequencies and bandwidths for American English and also addresses (a) sources of variability (focusing on speech sample and time sampling point), and (b) methods of data reduction such as vowel area and dispersion.

METHOD: Searches were conducted with CINAHL, Google Scholar, MEDLINE/PubMed, SCOPUS, and other online sources including legacy articles and references. The primary search items were vowels, vowel space area, vowel dispersion, formants, formant frequency, and formant bandwidth.

RESULTS: Data on formant frequencies and bandwidths are available for both sexes over the lifespan, but considerable variability in results across studies affects even features of the basic vowel quadrilateral. Origins of variability likely include differences in speech sample and time sampling point. The data reveal the emergence of sex differences by 4 years of age, maturational reductions in formant bandwidth, and decreased formant frequencies with advancing age in some persons. It appears that a combination of methods of data reduction provide for optimal data interpretation.

CONCLUSION: The lifespan database on vowel formants shows considerable variability within specific age-sex groups, pointing to the need for standardized procedures.

RevDate: 2018-06-09

Horáček J, Radolf V, AM Laukkanen (2018)

Impact Stress in Water Resistance Voice Therapy: A Physical Modeling Study.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(17)30463-0 [Epub ahead of print].

OBJECTIVES: Phonation through a tube in water is used in voice therapy. This study investigates whether this exercise may increase mechanical loading on the vocal folds.

STUDY DESIGN: This is an experimental modeling study.

METHODS: A model with a three-layer silicone vocal fold replica and a plexiglass vocal tract (MK Plexi, Prague) set for the articulation of the vowel [u:] was used. Impact stress (IS) was measured in three conditions: for [u:] (1) without a tube, (2) with a silicone Lax Vox tube (35 cm in length, 1 cm in inner diameter) immersed 2 cm in water, and (3) with the tube immersed 10 cm in water. Subglottic pressure and airflow ranges were selected to correspond to those reported in normal human phonation.

RESULTS: Phonation threshold pressure was lower for phonation into water compared with [u:] without a tube. IS increased with the airflow rate. IS measured in the range of subglottic pressure, which corresponds to measurements in humans, was highest for vowel [u:] without a tube and lower with the tube in water.

CONCLUSIONS: Even though the model and humans cannot be directly compared, for instance due to differences in vocal tract wall properties, the results suggest that IS is not likely to increase harmfully in water resistance therapy. However, there may be other effects related to it, possibly causing symptoms of vocal fatigue (eg, increased activity in the adductors or high amplitudes of oral pressure variation probably capable of increasing stress in the vocal fold). These need to be studied further, especially for cases where the water bubbling frequency is close to the acoustical-mechanical resonance and at the same time the fundamental phonation frequency is near the first formant frequency of the system.

RevDate: 2018-07-17

Bauerly KR (2018)

The Effects of Emotion on Second Formant Frequency Fluctuations in Adults Who Stutter.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP), 70(1):13-23.

OBJECTIVE: Changes in second formant frequency fluctuations (FFF2) were examined in adults who stutter (AWS) and adults who do not stutter (ANS) when producing nonwords under varying emotional conditions.

METHODS: Ten AWS and 10 ANS viewed images selected from the International Affective Picture System representing dimensions of arousal (e.g., excited versus bored) and hedonic valence (e.g., happy versus sad). Immediately following picture presentation, participants produced a consonant-vowel + final /t/ (CVt) nonword consisting of the initial sounds /p/, /b/, /s/, or /z/, followed by a vowel (/i/, /u/, /ε/) and a final /t/. CVt tokens were assessed for word duration and FFF2.

RESULTS: Significantly slower word durations were shown in the AWS compared to the ANS across conditions. Although these differences appeared to increase under arousing conditions, no interaction was found. Results for FFF2 revealed a significant group-condition interaction. Post hoc analysis indicated that this was due to the AWS showing significantly greater FFF2 when speaking under conditions eliciting increases in arousal and unpleasantness. ANS showed little change in FFF2 across conditions.

CONCLUSIONS: The results suggest that AWS' articulatory stability is more susceptible to breakdown under negative emotional influences.

RevDate: 2018-11-14

Fisher JM, Dick FK, Levy DF, et al (2018)

Neural representation of vowel formants in tonotopic auditory cortex.

NeuroImage, 178:574-582.

Speech sounds are encoded by distributed patterns of activity in bilateral superior temporal cortex. However, it is unclear whether speech sounds are topographically represented in cortex, or which acoustic or phonetic dimensions might be spatially mapped. Here, using functional MRI, we investigated the potential spatial representation of vowels, which are largely distinguished from one another by the frequencies of their first and second formants, i.e. peaks in their frequency spectra. This allowed us to generate clear hypotheses about the representation of specific vowels in tonotopic regions of auditory cortex. We scanned participants as they listened to multiple natural tokens of the vowels [ɑ] and [i], which we selected because their first and second formants overlap minimally. Formant-based regions of interest were defined for each vowel based on spectral analysis of the vowel stimuli and independently acquired tonotopic maps for each participant. We found that perception of [ɑ] and [i] yielded differential activation of tonotopic regions corresponding to formants of [ɑ] and [i], such that each vowel was associated with increased signal in tonotopic regions corresponding to its own formants. This pattern was observed in Heschl's gyrus and the superior temporal gyrus, in both hemispheres, and for both the first and second formants. Using linear discriminant analysis of mean signal change in formant-based regions of interest, the identity of untrained vowels was predicted with ∼73% accuracy. Our findings show that cortical encoding of vowels is scaffolded on tonotopy, a fundamental organizing principle of auditory cortex that is not language-specific.

RevDate: 2018-06-02

Dubey AK, Tripathi A, Prasanna SRM, et al (2018)

Detection of hypernasality based on vowel space area.

The Journal of the Acoustical Society of America, 143(5):EL412.

This study proposes a method for differentiating hypernasal speech from normal speech using the vowel space area (VSA). Hypernasality introduces extra formant and anti-formant pairs into the vowel spectrum, which shifts the formants; this shifting affects the size of the VSA. The results show that the VSA is reduced in hypernasal speech compared to normal speech. Combining the VSA feature with Mel-frequency cepstral coefficients in support vector machine based hypernasality detection leads to an accuracy of 86.89% for sustained vowels and 89.47%, 90.57%, and 91.70% for vowels in the contexts of the high-pressure consonants /k/, /p/, and /t/, respectively.
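Vowel space area is typically computed with the shoelace formula over corner-vowel (F1, F2) coordinates. A minimal sketch with made-up formant values (the hypernasal shifts below are illustrative, not this paper's data):

```python
def vowel_space_area(points):
    """Shoelace formula over (F1, F2) corner-vowel coordinates in Hz,
    listed in order around the polygon. Returns area in Hz^2."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Illustrative (not measured) /i, a, u/ corner formants for a normal speaker
normal = [(280.0, 2250.0), (700.0, 1200.0), (300.0, 900.0)]
# Hypothetical formant shifts under hypernasality compress the triangle
hyper = [(350.0, 2000.0), (650.0, 1250.0), (360.0, 1000.0)]
```

A classifier can then use the scalar VSA (optionally alongside MFCCs, as in the paper) as a feature: smaller areas point toward hypernasal speech.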

RevDate: 2018-11-14

Story BH, Vorperian HK, Bunton K, et al (2018)

An age-dependent vocal tract model for males and females based on anatomic measurements.

The Journal of the Acoustical Society of America, 143(5):3079.

The purpose of this study was to take a first step toward constructing a developmental and sex-specific version of a parametric vocal tract area function model representative of male and female vocal tracts ranging in age from infancy to 12 yrs, as well as adults. Anatomic measurements collected from a large imaging database of male and female children and adults provided the dataset from which length warping and cross-dimension scaling functions were derived, and applied to the adult-based vocal tract model to project it backward along an age continuum. The resulting model was assessed qualitatively by projecting hypothetical vocal tract shapes onto midsagittal images from the cohort of children, and quantitatively by comparison of formant frequencies produced by the model to those reported in the literature. An additional validation of modeled vocal tract shapes was made possible by comparison to cross-sectional area measurements obtained for children and adults using acoustic pharyngometry. This initial attempt to generate a sex-specific developmental vocal tract model paves a path to study the relation of vocal tract dimensions to documented prepubertal acoustic differences.

RevDate: 2018-06-02

Carignan C (2018)

Using ultrasound and nasalance to separate oral and nasal contributions to formant frequencies of nasalized vowels.

The Journal of the Acoustical Society of America, 143(5):2588.

The experimental method described in this manuscript offers a possible means to address a well known issue in research on the independent effects of nasalization on vowel acoustics: given that the separate transfer functions associated with the oral and nasal cavities are merged in the acoustic signal, the task of teasing apart the respective effects of the two cavities seems to be an intractable problem. The proposed method uses ultrasound and nasalance to predict the effect of lingual configuration on formant frequencies of nasalized vowels, thus accounting for acoustic variation due to changing lingual posture and excluding its contribution to the acoustic signal. The results reveal that the independent effect of nasalization on the acoustic vowel quadrilateral resembles a counter-clockwise chain shift of nasal compared to non-nasal vowels. The results from the productions of 11 vowels by six speakers of different language backgrounds are compared to predictions presented in previous modeling studies, as well as discussed in the light of sound change of nasal vowel systems.

RevDate: 2018-11-26

Romanelli S, Menegotto A, R Smyth (2018)

Stress-Induced Acoustic Variation in L2 and L1 Spanish Vowels.

Phonetica, 75(3):190-218.

AIM: We assessed the effect of lexical stress on the duration and quality of Spanish word-final vowels /a, e, o/ produced by American English late intermediate learners of L2 Spanish, as compared to those of native L1 Argentine Spanish speakers.

METHODS: Participants read 54 real words ending in /a, e, o/, with either final or penultimate lexical stress, embedded in a text and a word list. We measured vowel duration and both F1 and F2 frequencies at 3 temporal points.

RESULTS: Stressed vowels were longer than unstressed vowels, in both L1 and L2 Spanish. L1 and L2 Spanish stressed /a/ and /e/ had higher F1 values than their unstressed counterparts. Only the L2 speakers showed evidence of rising offglides for /e/ and /o/. The L2 and L1 Spanish vowel space was compressed in the absence of stress.

CONCLUSION: Lexical stress affected the vowel quality of L1 and L2 Spanish vowels. We provide an up-to-date account of the formant trajectories of Argentine River Plate Spanish word-final /a, e, o/ and offer experimental support to the claim that stress affects the quality of Spanish vowels in word-final contexts.

RevDate: 2018-05-25

Peter V, Kalashnikova M, D Burnham (2018)

Weighting of Amplitude and Formant Rise Time Cues by School-Aged Children: A Mismatch Negativity Study.

Journal of speech, language, and hearing research : JSLHR, 61(5):1322-1333.

Purpose: An important skill in the development of speech perception is to apply optimal weights to acoustic cues so that phonemic information is recovered from speech with minimum effort. Here, we investigated the development of acoustic cue weighting of amplitude rise time (ART) and formant rise time (FRT) cues in children as measured by mismatch negativity (MMN).

Method: Twelve adults and 36 children aged 6-12 years listened to a /ba/-/wa/ contrast in an oddball paradigm in which the standard stimulus had the ART and FRT cues of /ba/. In different blocks, the deviant stimulus had either the ART or FRT cues of /wa/.

Results: The results revealed that children younger than 10 years were sensitive to both ART and FRT cues whereas 10- to 12-year-old children and adults were sensitive only to FRT cues. Moreover, children younger than 10 years generated a positive mismatch response, whereas older children and adults generated MMN.

Conclusion: These results suggest that preattentive adultlike weighting of ART and FRT cues is attained only by 10 years of age and accompanies the change from mismatch response to the more mature MMN response.

Supplemental Material: https://doi.org/10.23641/asha.6207608.

RevDate: 2018-11-14

Redford MA (2018)

Grammatical Word Production Across Metrical Contexts in School-Aged Children's and Adults' Speech.

Journal of speech, language, and hearing research : JSLHR, 61(6):1339-1354.

Purpose: The purpose of this study is to test whether age-related differences in grammatical word production are due to differences in how children and adults chunk speech for output or to immature articulatory timing control in children.

Method: Two groups of 12 children, 5 and 8 years old, and 1 group of 12 adults produced sentences with phrase-medial determiners. Preceding verbs were varied to create different metrical contexts for chunking the determiner with an adjacent content word. Following noun onsets were varied to assess the coherence of determiner-noun sequences. Determiner vowel duration, amplitude, and formant frequencies were measured.

Results: Children produced significantly longer and louder determiners than adults regardless of metrical context. The effect of noun onset on F1 was stronger in children's speech than in adults' speech; the effect of noun onset on F2 was stronger in adults' speech than in children's. Effects of metrical context on anticipatory formant patterns were more evident in children's speech than in adults' speech.

Conclusion: The results suggest that both immature articulatory timing control and age-related differences in how chunks are accessed or planned influence grammatical word production in school-aged children's speech. Future work will focus on the development of long-distance coarticulation to reveal the evolution of speech plan structure over time.

RevDate: 2018-05-24

Dugan SH, Silbert N, McAllister T, et al (2018)

Modelling category goodness judgments in children with residual sound errors.

Clinical linguistics & phonetics [Epub ahead of print].

This study investigates category goodness judgments of /r/ in adults and children with and without residual speech errors (RSEs) using natural speech stimuli. Thirty adults, 38 children with RSE (ages 7-16) and 35 age-matched typically developing (TD) children provided category goodness judgments on whole words, recorded from 27 child speakers, with /r/ in various phonetic environments. The salient acoustic property of /r/ - the lowered third formant (F3) - was normalized in two ways. A logistic mixed-effect model quantified the relationships between listeners' responses and the third formant frequency, vowel context and clinical group status. Goodness judgments from the adult group showed a statistically significant interaction with the F3 parameter when compared to both child groups (p < 0.001) using both normalization methods. The RSE group did not differ significantly from the TD group in judgments of /r/. All listeners were significantly more likely to judge /r/ as correct in a front-vowel context. Our results suggest that normalized /r/ F3 is a statistically significant predictor of category goodness judgments for both adults and children, but children do not appear to make adult-like judgments. Category goodness judgments do not have a clear relationship with /r/ production abilities in children with RSE. These findings may have implications for clinical activities that include category goodness judgments in natural speech, especially for recorded productions.

RevDate: 2018-11-14
CmpDate: 2018-08-20

Tai HC, Shen YP, Lin JH, et al (2018)

Acoustic evolution of old Italian violins from Amati to Stradivari.

Proceedings of the National Academy of Sciences of the United States of America, 115(23):5926-5931.

The shape and design of the modern violin are largely influenced by two makers from Cremona, Italy: The instrument was invented by Andrea Amati and then improved by Antonio Stradivari. Although the construction methods of Amati and Stradivari have been carefully examined, the underlying acoustic qualities which contribute to their popularity are little understood. According to Geminiani, a Baroque violinist, the ideal violin tone should "rival the most perfect human voice." To investigate whether Amati and Stradivari violins produce voice-like features, we recorded the scales of 15 antique Italian violins as well as male and female singers. The frequency response curves are similar between the Andrea Amati violin and human singers, up to ∼4.2 kHz. By linear predictive coding analyses, the first two formants of the Amati exhibit vowel-like qualities (F1/F2 = 503/1,583 Hz), mapping to the central region on the vowel diagram. Its third and fourth formants (F3/F4 = 2,602/3,731 Hz) resemble those produced by male singers. Using F1 to F4 values to estimate the corresponding vocal tract length, we observed that antique Italian violins generally resemble basses/baritones, but Stradivari violins are closer to tenors/altos. Furthermore, the vowel qualities of Stradivari violins show reduced backness and height. The unique formant properties displayed by Stradivari violins may represent the acoustic correlate of their distinctive brilliance perceived by musicians. Our data demonstrate that the pioneering designs of Cremonese violins exhibit voice-like qualities in their acoustic output.
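The vocal-tract-length estimate from F1 to F4 mentioned above can be approximated with the standard uniform quarter-wave tube model (closed at one end), F_k = (2k - 1)c / (4L). The abstract does not give the authors' exact estimation formula, so this is a common textbook approximation applied to the quoted Amati formant values:

```python
# Rough "vocal tract" length estimate from formants, assuming the uniform
# quarter-wave tube model: F_k = (2k - 1) * c / (4 * L), so
# L = (2k - 1) * c / (4 * F_k), averaged over the available formants.

C = 35000.0  # approximate speed of sound in warm, humid air, cm/s

def tube_length_cm(formants_hz):
    """Average quarter-wave length estimate over formants F1..Fn."""
    lengths = [(2 * k - 1) * C / (4.0 * f)
               for k, f in enumerate(formants_hz, start=1)]
    return sum(lengths) / len(lengths)

amati = [503.0, 1583.0, 2602.0, 3731.0]  # F1-F4 quoted in the abstract, Hz
print(round(tube_length_cm(amati), 1))   # about 16.8 cm
```

A length in the 16-17 cm range is comparable to an adult male vocal tract, consistent with the abstract's observation that antique Italian violins generally resemble basses and baritones.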

RevDate: 2018-05-21

Niemczak CE, KR Vander Werff (2018)

Informational Masking Effects on Neural Encoding of Stimulus Onset and Acoustic Change.

Ear and hearing [Epub ahead of print].

OBJECTIVE: Recent investigations using cortical auditory evoked potentials have shown masker-dependent effects on sensory cortical processing of speech information. Background noise maskers consisting of other people talking are particularly difficult for speech recognition. Behavioral studies have related this to perceptual masking, or informational masking, beyond just the overlap of the masker and target at the auditory periphery. The aim of the present study was to use cortical auditory evoked potentials to examine how maskers (i.e., continuous speech-shaped noise [SSN] and multi-talker babble) affect the cortical sensory encoding of speech information at an obligatory level of processing. Specifically, cortical responses to vowel onset and formant change were recorded under different background noise conditions presumed to represent varying amounts of energetic or informational masking. The hypothesis was that even at this obligatory cortical level of sensory processing, we would observe larger effects on the amplitude and latency of the onset and change components as the amount of informational masking increased across background noise conditions.

DESIGN: Onset and change responses were recorded to a vowel change from /u-i/ in young adults under four conditions: quiet, continuous SSN, eight-talker (8T) babble, and two-talker (2T) babble. Repeated measures analyses by noise condition were conducted on amplitude, latency, and response area measurements to determine the differential effects of these noise conditions, designed to represent increasing and varying levels of informational and energetic masking, on cortical neural representation of a vowel onset and acoustic change response waveforms.

RESULTS: All noise conditions significantly reduced onset N1 and P2 amplitudes, onset N1-P2 peak to peak amplitudes, as well as both onset and change response area compared with quiet conditions. Further, all amplitude and area measures were significantly reduced for the two babble conditions compared with continuous SSN. However, there were no significant differences in peak amplitude or area for either onset or change responses between the two different babble conditions (eight versus two talkers). Mean latencies for all onset peaks were delayed for noise conditions compared with quiet. However, in contrast to the amplitude and area results, differences in peak latency between SSN and the babble conditions did not reach statistical significance.

CONCLUSIONS: These results support the idea that while background noise maskers generally reduce amplitude and increase latency of speech-sound evoked cortical responses, the type of masking has a significant influence. Speech babble maskers (eight talkers and two talkers) have a larger effect on the obligatory cortical response to speech sound onset and change compared with purely energetic continuous SSN maskers, which may be attributed to informational masking effects. Neither the neural responses to the onset nor the vowel change, however, were sensitive to the hypothesized increase in the amount of informational masking between speech babble maskers with two talkers compared with eight talkers.
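The onset N1-P2 peak-to-peak amplitude reported above is conventionally computed by taking the N1 trough and P2 peak within fixed latency windows of the averaged response. A minimal sketch on a synthetic waveform (the windows and waveform shape are illustrative assumptions, not the study's values):

```python
import numpy as np

fs = 1000  # assumed sampling rate, Hz
t = np.arange(0, 0.4, 1 / fs)  # 0-400 ms epoch

# Toy cortical onset response: N1 trough near 100 ms, P2 peak near 200 ms,
# modelled as two Gaussians of opposite polarity (microvolts).
erp = (-3.0 * np.exp(-((t - 0.10) ** 2) / (2 * 0.015 ** 2))
       + 2.0 * np.exp(-((t - 0.20) ** 2) / (2 * 0.025 ** 2)))

def peak_to_peak(erp, t, n1_win=(0.08, 0.15), p2_win=(0.15, 0.25)):
    """N1-P2 amplitude: P2 maximum minus N1 minimum in their search windows."""
    n1 = erp[(t >= n1_win[0]) & (t <= n1_win[1])].min()
    p2 = erp[(t >= p2_win[0]) & (t <= p2_win[1])].max()
    return p2 - n1

print(round(peak_to_peak(erp, t), 2))  # about 5.0 uV for this toy waveform
```

Masking effects like those reported in the study would appear as a smaller peak-to-peak value (reduced amplitude) and later trough/peak latencies in the noise conditions.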

RevDate: 2018-11-02

Sóskuthy M, Foulkes P, Hughes V, et al (2018)

Changing Words and Sounds: The Roles of Different Cognitive Units in Sound Change.

Topics in cognitive science, 10(4):787-802.

This study considers the role of different cognitive units in sound change: phonemes, contextual variants and words. We examine /u/-fronting and /j/-dropping in data from three generations of Derby English speakers. We analyze dynamic formant data and auditory judgments, using mixed effects regression methods, including generalized additive mixed models (GAMMs). /u/-fronting is reaching its end-point, showing complex conditioning by context and a frequency effect that weakens over time. /j/-dropping is declining, with low-frequency words showing more innovative variants with /j/ than high-frequency words. The two processes interact: words with variable /j/-dropping (new) exhibit more fronting than words that never have /j/ (noodle) even when the /j/ is deleted. These results support models of change that rely on phonetically detailed representations for both word- and sound-level cognitive units.

RevDate: 2018-05-16

Sanfins MD, Hatzopoulos S, Donadon C, et al (2018)

An Analysis of The Parameters Used In Speech ABR Assessment Protocols.

The journal of international advanced otology, 14(1):100-105.

The aim of this study was to assess the parameters of choice, such as duration, intensity, rate, polarity, number of sweeps, window length, stimulated ear, fundamental frequency, first formant, and second formant, from previously published speech ABR studies. To identify candidate articles, five databases were assessed using the following keyword descriptors: speech ABR, ABR-speech, speech auditory brainstem response, auditory evoked potential to speech, speech-evoked brainstem response, and complex sounds. The search identified 1288 articles published between 2005 and 2015. After filtering the total number of papers according to the inclusion and exclusion criteria, 21 studies were selected. Analyzing the protocol details used in 21 studies suggested that there is no consensus to date on a speech-ABR protocol and that the parameters of analysis used are quite variable between studies. This inhibits the wider generalization and extrapolation of data across languages and studies.

RevDate: 2018-05-15

Chen Y, Wang J, Chen W, et al (2017)

[Research on spectrum feature of speech processing strategy for cochlear implant].

Sheng wu yi xue gong cheng xue za zhi = Journal of biomedical engineering = Shengwu yixue gongchengxue zazhi, 34(5):760-766.

Cochlear implants (CIs) used in the Chinese-language environment lose pitch information, resulting in low speech recognition. To study speech processing strategies tailored to the features of Chinese and to improve speech recognition for CI recipients, we improved the CI front-end signal acquisition platform and investigated the signal features: waveform, spectrogram, energy intensity, pitch, and formant parameters under different CI speech processing strategies. Features produced by two kinds of speech processing strategies were analyzed and extracted to characterize their parameters. The aim of this paper is thus to extend research on Chinese-based CI speech processing strategies.

RevDate: 2018-05-10

Wang Q, Bai J, Xue P, et al (2018)

[An acoustic-articulatory study of the nasal finals in students with and without hearing loss].

Sheng wu yi xue gong cheng xue za zhi = Journal of biomedical engineering = Shengwu yixue gongchengxue zazhi, 35(2):198-205.

The central aim of this experiment was to compare the articulatory and acoustic characteristics of students with normal hearing (NH) and school-aged children with hearing loss (HL), and to explore articulatory-acoustic relations in the nasal finals. Fourteen HL students and 10 controls were enrolled; data from 4 HL students were excluded because of their high pronunciation error rate. Data were collected using an electromagnetic articulograph. Acoustic and kinematic data for the nasal finals were extracted with phonetics and data-processing software, and all data were analyzed by t test and correlation analysis. Differences between the HL and NH groups were statistically significant (P<0.05 or P<0.01) across vowels for the first two formant frequencies (F1, F2), tongue position, and articulatory-acoustic relations. The HL group's relations between vertical tongue movement and F1 in /en/ and /eng/ were the same as those of the NH group. These findings on participants with HL can support speech rehabilitation training aimed at increasing pronunciation accuracy.

RevDate: 2018-05-07

Lee Y, Kim G, Wang S, et al (2018)

Acoustic Characteristics in Epiglottic Cyst.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(17)30529-5 [Epub ahead of print].

OBJECTIVE: The purpose of this study was to analyze the acoustic characteristics associated with deformation of the vocal tract caused by a large epiglottic cyst, and to confirm the relation between this anatomical change and the resonant function of the vocal tract.

METHODS: Eight men with epiglottic cyst were enrolled in this study. The jitter, shimmer, noise-to-harmonic ratio, and first two formants were analyzed in vowels /a:/, /e:/, /i:/, /o:/, and /u:/. These values were analyzed before and after laryngeal microsurgery.

RESULTS: The F1 value of /a:/ was significantly raised after surgery. No significant differences were found in the formant frequencies of the other vowels, or in jitter, shimmer, or noise-to-harmonic ratio.

CONCLUSION: The results of this study could be used to analyze changes in the resonance of vocal tracts due to the epiglottic cysts.

RevDate: 2018-05-25

Whitfield JA, Dromey C, P Palmer (2018)

Examining Acoustic and Kinematic Measures of Articulatory Working Space: Effects of Speech Intensity.

Journal of speech, language, and hearing research : JSLHR, 61(5):1104-1117.

Purpose: The purpose of this study was to examine the effect of speech intensity on acoustic and kinematic vowel space measures and conduct a preliminary examination of the relationship between kinematic and acoustic vowel space metrics calculated from continuously sampled lingual marker and formant traces.

Method: Young adult speakers produced 3 repetitions of 2 different sentences at 3 different loudness levels. Lingual kinematic and acoustic signals were collected and analyzed. Acoustic and kinematic variants of several vowel space metrics were calculated from the formant frequencies and the position of 2 lingual markers. Traditional metrics included triangular vowel space area and the vowel articulation index. Acoustic and kinematic variants of sentence-level metrics based on the articulatory-acoustic vowel space and the vowel space hull area were also calculated.

Results: Both acoustic and kinematic variants of the sentence-level metrics significantly increased with an increase in loudness, whereas no statistically significant differences in traditional vowel-point metrics were observed for either the kinematic or acoustic variants across the 3 loudness conditions. In addition, moderate-to-strong relationships between the acoustic and kinematic variants of the sentence-level vowel space metrics were observed for the majority of participants.

Conclusions: These data suggest that both kinematic and acoustic vowel space metrics that reflect the dynamic contributions of both consonant and vowel segments are sensitive to within-speaker changes in articulation associated with manipulations of speech intensity.
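One of the traditional vowel-point metrics named above, the triangular vowel space area, is simply the area of the triangle spanned by the (F1, F2) points of the corner vowels /i/, /a/, /u/. A minimal sketch using classic textbook adult-male formant values (illustrative, not data from this study):

```python
def triangle_vsa(corners):
    """Triangular vowel space area from (F1, F2) corner-vowel points,
    computed with the shoelace formula. Result is in Hz^2."""
    (x1, y1), (x2, y2), (x3, y3) = corners
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0

# Classic Peterson & Barney-style adult-male averages, Hz (illustrative).
corners = {"i": (270, 2290), "a": (730, 1090), "u": (300, 870)}
area = triangle_vsa([corners["i"], corners["a"], corners["u"]])
print(area)  # 308600.0 Hz^2
```

The sentence-level metrics in the study (articulatory-acoustic vowel space, hull area) generalize this idea from three vowel points to the full continuously sampled formant or lingual-marker trace.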

RevDate: 2018-11-14

DiNino M, JG Arenberg (2018)

Age-Related Performance on Vowel Identification and the Spectral-temporally Modulated Ripple Test in Children With Normal Hearing and With Cochlear Implants.

Trends in hearing, 22:2331216518770959.

Children's performance on psychoacoustic tasks improves with age, but inadequate auditory input may delay this maturation. Cochlear implant (CI) users receive a degraded auditory signal with reduced frequency resolution compared with normal, acoustic hearing; thus, immature auditory abilities may contribute to the variation among pediatric CI users' speech recognition scores. This study investigated relationships between age-related variables, spectral resolution, and vowel identification scores in prelingually deafened, early-implanted children with CIs compared with normal hearing (NH) children. All participants performed vowel identification and the Spectral-temporally Modulated Ripple Test (SMRT). Vowel stimuli for NH children were vocoded to simulate the reduced spectral resolution of CI hearing. Age positively predicted NH children's vocoded vowel identification scores, but time with the CI was a stronger predictor of vowel recognition and SMRT performance of children with CIs. For both groups, SMRT thresholds were related to vowel identification performance, analogous to previous findings in adults. Sequential information analysis of vowel feature perception indicated greater transmission of duration-related information compared with formant features in both groups of children. In addition, the amount of F2 information transmitted predicted SMRT thresholds in children with NH and with CIs. Comparisons between the two CIs of bilaterally implanted children revealed disparate task performance levels and information transmission values within the same child. These findings indicate that adequate auditory experience contributes to auditory perceptual abilities of pediatric CI users. Further, factors related to individual CIs may be more relevant to psychoacoustic task performance than are the overall capabilities of the child.

RevDate: 2018-04-27

Chiaramonte R, Di Luciano C, Chiaramonte I, et al (2018)

Multi-disciplinary clinical protocol for the diagnosis of bulbar amyotrophic lateral sclerosis.

Acta otorrinolaringologica espanola pii:S0001-6519(18)30056-6 [Epub ahead of print].

INTRODUCTION AND OBJECTIVES: The objective of this study was to examine the role of different specialists in the diagnosis of amyotrophic lateral sclerosis (ALS), to understand changes in verbal expression and phonation, respiratory dynamics and swallowing that occurred rapidly over a short period of time.

MATERIALS AND METHODS: 22 patients with bulbar ALS were submitted for voice assessment, ENT evaluation, Multi-Dimensional Voice Program (MDVP), spectrogram, electroglottography, fiberoptic endoscopic evaluation of swallowing.

RESULTS: In the early stage of the disease, the oral tract and velopharyngeal port were involved. Three months after the initial symptoms, most of the patients presented hoarseness, breathy voice, dysarthria, pitch modulation problems and difficulties in pronunciation of explosive, velar and lingual consonants. Values of MDVP were altered. Spectrogram showed an additional formant, due to nasal resonance. Electroglottography showed periodic oscillation of the vocal folds only during short vocal cycle. Swallowing was characterized by weakness and incoordination of oro-pharyngeal muscles with penetration or aspiration.

CONCLUSIONS: A specific multidisciplinary clinical protocol was designed to report the vocal parameters and swallowing disorders that changed most quickly in bulbar ALS patients. Furthermore, the patients were stratified according to involvement of pharyngeal structures and severity index.

RevDate: 2018-10-01

Prévost F, A Lehmann (2018)

Saliency of Vowel Features in Neural Responses of Cochlear Implant Users.

Clinical EEG and neuroscience, 49(6):388-397.

Cochlear implants restore hearing in deaf individuals, but speech perception remains challenging. Poor discrimination of spectral components is thought to account for limitations of speech recognition in cochlear implant users. We investigated how combined variations of spectral components along two orthogonal dimensions can maximize neural discrimination between two vowels, as measured by mismatch negativity. Adult cochlear implant users and matched normal-hearing listeners underwent electroencephalographic event-related potentials recordings in an optimum-1 oddball paradigm. A standard /a/ vowel was delivered in an acoustic free field along with stimuli having a deviant fundamental frequency (+3 and +6 semitones), a deviant first formant making it a /i/ vowel or combined deviant fundamental frequency and first formant (+3 and +6 semitones /i/ vowels). Speech recognition was assessed with a word repetition task. An analysis of variance between both amplitude and latency of mismatch negativity elicited by each deviant vowel was performed. The strength of correlations between these parameters of mismatch negativity and speech recognition as well as participants' age was assessed. Amplitude of mismatch negativity was weaker in cochlear implant users but was maximized by variations of vowels' first formant. Latency of mismatch negativity was later in cochlear implant users and was particularly extended by variations of the fundamental frequency. Speech recognition correlated with parameters of mismatch negativity elicited by the specific variation of the first formant. This nonlinear effect of acoustic parameters on neural discrimination of vowels has implications for implant processor programming and aural rehabilitation.

RevDate: 2018-06-20

Hamdan AL, Khandakji M, AT Macari (2018)

Maxillary arch dimensions associated with acoustic parameters in prepubertal children.

The Angle orthodontist, 88(4):410-415.

OBJECTIVES: To evaluate the association between maxillary arch dimensions and fundamental frequency and formants of voice in prepubertal subjects.

MATERIALS AND METHODS: Thirty-five consecutive prepubertal patients seeking orthodontic treatment were recruited (mean age = 11.41 ± 1.46 years; range, 8 to 13.7 years). Participants with a history of respiratory infection, laryngeal manipulation, dysphonia, congenital facial malformations, or history of orthodontic treatment were excluded. Dental measurements included maxillary arch length, perimeter, depth, and width. Voice parameters comprising fundamental frequency (f0_sustained), Habitual pitch (f0_count), Jitter, Shimmer, and different formant frequencies (F1, F2, F3, and F4) were measured using acoustic analysis prior to initiation of any orthodontic treatment. Pearson's correlation coefficients were used to measure the strength of associations between different dental and voice parameters. Multiple linear regressions were computed for the predictions of different dental measurements.

RESULTS: Arch width and arch depth had moderate significant negative correlations with f0 (r = -0.52; P = .001 and r = -0.39; P = .022, respectively) and with habitual frequency (r = -0.51; P = .0014 and r = -0.34; P = .04, respectively). Arch depth and arch length were significantly correlated with formant F3 and formant F4, respectively. Predictors of arch depth included frequencies of F3 vowels, with a significant regression equation (P-value < .001; R2 = 0.49). Similarly, fundamental frequency f0 and frequencies of formant F3 vowels were predictors of arch width, with a significant regression equation (P-value < .001; R2 = 0.37).
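The statistics reported above (Pearson correlations and linear regressions between dental and voice measures) can be sketched on synthetic data. The variable names, units, and effect size below are illustrative assumptions, not the study's values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins (n = 35, matching the study's sample size):
# arch width (mm) negatively related to fundamental frequency f0 (Hz).
arch_width = rng.normal(35.0, 2.0, 35)
f0 = 380.0 - 4.0 * arch_width + rng.normal(0.0, 5.0, 35)

r = np.corrcoef(arch_width, f0)[0, 1]             # Pearson correlation
slope, intercept = np.polyfit(arch_width, f0, 1)  # simple linear regression

print(round(r, 2), round(slope, 2))  # both should be negative
```

A negative r here parallels the reported moderate negative correlations between arch dimensions and f0; the study's actual analysis used multiple linear regression with several dental predictors.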

CONCLUSIONS: There is a significant association between arch dimensions, particularly arch length and depth, and voice parameters. The formant most predictive of arch depth and width is the third formant, along with fundamental frequency of voice.

RevDate: 2018-11-14

Elgendi M, Bobhate P, Jain S, et al (2018)

The Voice of the Heart: Vowel-Like Sound in Pulmonary Artery Hypertension.

Diseases (Basel, Switzerland), 6(2): pii:diseases6020026.

Increased blood pressure in the pulmonary artery is referred to as pulmonary hypertension and often is linked to loud pulmonic valve closures. For the purpose of this paper, it was hypothesized that pulmonary circulation vibrations will create sounds similar to sounds created by vocal cords during speech and that subjects with pulmonary artery hypertension (PAH) could have unique sound signatures across four auscultatory sites. Using a digital stethoscope, heart sounds were recorded at the cardiac apex, 2nd left intercostal space (2LICS), 2nd right intercostal space (2RICS), and 4th left intercostal space (4LICS) undergoing simultaneous cardiac catheterization. From the collected heart sounds, relative power of the frequency band, energy of the sinusoid formants, and entropy were extracted. PAH subjects were differentiated by applying the linear discriminant analysis with leave-one-out cross-validation. The entropy of the first sinusoid formant decreased significantly in subjects with a mean pulmonary artery pressure (mPAp) ≥ 25 mmHg versus subjects with a mPAp < 25 mmHg with a sensitivity of 84% and specificity of 88.57%, within a 10-s optimized window length for heart sounds recorded at the 2LICS. First sinusoid formant entropy reduction of heart sounds in PAH subjects suggests the existence of a vowel-like pattern. Pattern analysis revealed a unique sound signature, which could be used in non-invasive screening tools.
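The classification step described above, linear discriminant analysis with leave-one-out cross-validation, can be sketched as follows. The feature values and group sizes are synthetic stand-ins, not the study's data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in feature: one "first-formant entropy" value per subject,
# lower on average in the PAH group, as the abstract reports.
pah = rng.normal(0.4, 0.1, (20, 1))      # label 1: mPAP >= 25 mmHg
non_pah = rng.normal(0.7, 0.1, (20, 1))  # label 0: mPAP < 25 mmHg
X = np.vstack([pah, non_pah])
y = np.array([1] * 20 + [0] * 20)

# Leave-one-out: each subject is held out once, classified by an LDA
# trained on all remaining subjects.
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
print(scores.mean())  # leave-one-out classification accuracy
```

Leave-one-out is the natural choice at the study's scale, where subjects are few and each catheterized recording is expensive to obtain.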

RevDate: 2018-11-14

Brumberg JS, Pitt KM, JD Burnison (2018)

A Noninvasive Brain-Computer Interface for Real-Time Speech Synthesis: The Importance of Multimodal Feedback.

IEEE transactions on neural systems and rehabilitation engineering : a publication of the IEEE Engineering in Medicine and Biology Society, 26(4):874-881.

We conducted a study of a motor imagery brain-computer interface (BCI) using electroencephalography to continuously control a formant frequency speech synthesizer with instantaneous auditory and visual feedback. Over a three-session training period, sixteen participants learned to control the BCI for production of three vowel sounds (/i/ [heed], /ɑ/ [hot], and /u/ [who'd]) and were split into three groups: those receiving unimodal auditory feedback of synthesized speech, those receiving unimodal visual feedback of formant frequencies, and those receiving multimodal, audio-visual (AV) feedback. Audio feedback was provided by a formant frequency artificial speech synthesizer, and visual feedback was given as a 2-D cursor on a graphical representation of the plane defined by the first two formant frequencies. We found that combined AV feedback led to the greatest performance in terms of percent accuracy, distance to target, and movement time to target compared with either unimodal feedback of auditory or visual information. These results indicate that performance is enhanced when multimodal feedback is meaningful for the BCI task goals, rather than as a generic biofeedback signal of BCI progress.
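The visual feedback described above places a cursor in the plane of the first two formants, so a task metric such as distance to target reduces to Euclidean geometry in (F1, F2) space. A minimal sketch with hypothetical vowel targets (typical textbook values, not the study's synthesizer settings):

```python
import math

# Hypothetical vowel targets in the (F1, F2) plane, Hz. These are typical
# textbook figures, not the targets used in the study.
TARGETS = {"i": (300.0, 2300.0), "a": (750.0, 1200.0), "u": (320.0, 800.0)}

def distance_to_target(decoded, vowel):
    """Euclidean distance between a decoded (F1, F2) cursor position
    and the target for the given vowel."""
    f1, f2 = decoded
    t1, t2 = TARGETS[vowel]
    return math.hypot(f1 - t1, f2 - t2)

# A decoded state of (350, 2100) Hz is still about 206 Hz from the /i/ target.
print(round(distance_to_target((350.0, 2100.0), "i")))
```

Averaging this distance over a trial, and timing how long the cursor takes to enter a target region, gives the distance-to-target and movement-time measures the study compared across feedback groups.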

RevDate: 2018-04-10

Li G, Li H, Hou Q, et al (2018)

Distinct Acoustic Features and Glottal Changes Define Two Modes of Singing in Peking Opera.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(17)30355-7 [Epub ahead of print].

OBJECTIVE: We aimed to delineate the acoustic characteristics of the Laodan and Qingyi role in Peking Opera and define glottis closure states and mucosal wave changes during singing in the two roles.

METHODS: The range of singing in A4 (440 Hz) pitch in seven female Peking Opera singers was determined using two classic pieces of Peking Opera. Glottal changes during singing were examined by stroboscopic laryngoscope. The fundamental frequency of /i/ in the first 15 seconds of the two pieces and the /i/ pitch range were determined. The relative length of the glottis fissure and the relative maximum mucosal amplitude were calculated.

RESULTS: Qingyi had significantly higher mean fundamental frequency than Laodan. The long-term average spectrum showed an obvious formant cluster near 3000 Hz in Laodan versus Qingyi. No formant cluster was observed in singing in the regular mode. Strobe laryngoscopy showed complete glottal closure in Laodan and incomplete glottal closure in Qingyi in the maximal glottis closure phase. The relative length of the glottis fissure of Laodan was significantly lower than that of Qingyi in the singing mode. The relative maximum mucosal amplitude of Qingyi was significantly lower than that of Laodan.

CONCLUSION: The Laodan role and the Qingyi role in Peking Opera sing in a fundamental frequency range compatible with the respective use of da sang (big voice) and xiao sang (small voice). The morphological patterns of glottal changes also indicate that the Laodan role and the Qingyi role sing with da sang and xiao sang, respectively.

RevDate: 2018-11-14

Brajot FX, Nguyen D, DiGiovanni J, et al (2018)

The impact of perilaryngeal vibration on the self-perception of loudness and the Lombard effect.

Experimental brain research, 236(6):1713-1723.

The role of somatosensory feedback in speech and the perception of loudness was assessed in adults without speech or hearing disorders. Participants completed two tasks: loudness magnitude estimation of a short vowel and oral reading of a standard passage. Both tasks were carried out in each of three conditions: no-masking, auditory masking alone, and mixed auditory masking plus vibration of the perilaryngeal area. A Lombard effect was elicited in both masking conditions: speakers unconsciously increased vocal intensity. Perilaryngeal vibration further increased vocal intensity above what was observed for auditory masking alone. Both masking conditions affected fundamental frequency and the first formant frequency as well, but only vibration was associated with a significant change in the second formant frequency. An additional analysis of pure-tone thresholds found no difference in auditory thresholds between masking conditions. Taken together, these findings indicate that perilaryngeal vibration effectively masked somatosensory feedback, resulting in an enhanced Lombard effect (increased vocal intensity) that did not alter speakers' self-perception of loudness. This implies that the Lombard effect results from a general sensorimotor process, rather than from a specific audio-vocal mechanism, and that the conscious self-monitoring of speech intensity is not directly based on either auditory or somatosensory feedback.

RevDate: 2018-04-01

Lawson E, Stuart-Smith J, JM Scobbie (2018)

The role of gesture delay in coda /r/ weakening: An articulatory, auditory and acoustic study.

The Journal of the Acoustical Society of America, 143(3):1646.

The cross-linguistic tendency of coda consonants to weaken, vocalize, or be deleted is shown to have a phonetic basis, resulting from gesture reduction, or variation in gesture timing. This study investigates the effects of the timing of the anterior tongue gesture for coda /r/ on acoustics and perceived strength of rhoticity, making use of two sociolects of Central Scotland (working- and middle-class) where coda /r/ is weakening and strengthening, respectively. Previous articulatory analysis revealed a strong tendency for these sociolects to use different coda /r/ tongue configurations-working- and middle-class speakers tend to use tip/front raised and bunched variants, respectively; however, this finding does not explain working-class /r/ weakening. A correlational analysis in the current study showed a robust relationship between anterior lingual gesture timing, F3, and percept of rhoticity. A linear mixed effects regression analysis showed that both speaker social class and linguistic factors (word structure and the checked/unchecked status of the prerhotic vowel) had significant effects on tongue gesture timing and formant values. This study provides further evidence that gesture delay can be a phonetic mechanism for coda rhotic weakening and apparent loss, but social class emerges as the dominant factor driving lingual gesture timing variation.

RevDate: 2018-05-01

Waaramaa T, Kukkonen T, Mykkänen S, et al (2018)

Vocal Emotion Identification by Children Using Cochlear Implants, Relations to Voice Quality, and Musical Interests.

Journal of speech, language, and hearing research : JSLHR, 61(4):973-985.

Purpose: Listening tests for emotion identification were conducted with 8-17-year-old children with hearing impairment (HI; N = 25) using cochlear implants, and their 12-year-old peers with normal hearing (N = 18). The study examined the impact of musical interests and acoustics of the stimuli on correct emotion identification.

Method: The children completed a questionnaire covering their background information and musical interests. They then listened to vocal stimuli produced by actors (N = 5) and consisting of nonsense sentences and prolonged vowels ([a:], [i:], and [u:]; N = 32) expressing excitement, anger, contentment, and fear. The children's task was to identify the emotions they heard in the sample by choosing from the provided options. Acoustics of the samples were studied using Praat software, and statistics were examined using SPSS 24 software.

Results: The children with HI identified the emotions with 57% accuracy and the normal hearing children with 75% accuracy. Female listeners were more accurate than male listeners in both groups. Those implanted before the age of 3 years identified emotions more accurately than the others (p < .05). No connection between the child's audiogram and correct identification was observed. Musical interests and voice quality parameters were found to be related to correct identification.

Conclusions: Implantation age, musical interests, and voice quality tended to have an impact on correct emotion identification. Thus, in developing the cochlear implants, it may be worth paying attention to the acoustic structures of vocal emotional expressions, especially the formant frequency of F3. Supporting the musical interests of children with HI may help their emotional development and improve their social lives.

RevDate: 2018-03-23

de Andrade BMR, Valença EHO, Salvatori R, et al (2018)

Effects of Therapy With Semi-occluded Vocal Tract and Choir Training on Voice in Adult Individuals With Congenital, Isolated, Untreated Growth Hormone Deficiency.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30006-7 [Epub ahead of print].

OBJECTIVES: Voice is produced by the vibration of the vocal folds expressed by its fundamental frequency (Hz), whereas the formants (F) are fundamental frequency multiples, indicating amplification zones of the vowels in the vocal tract. We have shown that lifetime isolated growth hormone deficiency (IGHD) causes high pitch voice, with higher values of most formant frequencies, maintaining a prepuberal acoustic prediction. The objectives of this work were to verify the effects of the therapy with a semi-occluded vocal tract (SOVTT) and choir training on voice in these subjects with IGHD. We speculated that acoustic vocal parameters can be improved by SOVTT or choir training.

STUDY DESIGN: This is a prospective longitudinal study without a control group.

METHODS: Acoustic analysis of isolated vowels was performed in 17 adults with IGHD before and after SOVTT (pre-SOVTT and post-SOVTT) and after choir training (post training), in a 30-day period.

RESULTS: The first formant was higher in post training compared with the pre-SOVTT (P = 0.009). The second formant was higher in post-SOVTT than in pre-SOVTT (P = 0.045). There was a trend of reduction in shimmer in post-choir training in comparison with pre-SOVTT (P = 0.051), and a reduction in post-choir training in comparison with post-SOVTT (P = 0.047).

CONCLUSIONS: SOVTT was relevant to the second formant, whereas choir training improved first formant and shimmer. Therefore, this speech therapy approach was able to improve acoustic parameters of the voice of individuals with congenital, untreated IGHD. This seems particularly important in a scenario in which few patients are submitted to growth hormone replacement therapy.

RevDate: 2018-11-14

Masapollo M, Polka L, Ménard L, et al (2018)

Asymmetries in unimodal visual vowel perception: The roles of oral-facial kinematics, orientation, and configuration.

Journal of experimental psychology. Human perception and performance, 44(7):1103-1118.

Masapollo, Polka, and Ménard (2017) recently reported a robust directional asymmetry in unimodal visual vowel perception: Adult perceivers discriminate a change from an English /u/ viseme to a French /u/ viseme significantly better than a change in the reverse direction. This asymmetry replicates a frequent pattern found in unimodal auditory vowel perception that points to a universal bias favoring more extreme vocalic articulations, which lead to acoustic signals with increased formant convergence. In the present article, the authors report 5 experiments designed to investigate whether this asymmetry in the visual realm reflects a speech-specific or general processing bias. They successfully replicated the directional effect using Masapollo et al.'s dynamically articulating faces but failed to replicate the effect when the faces were shown under static conditions. Asymmetries also emerged during discrimination of canonically oriented point-light stimuli that retained the kinematics and configuration of the articulating mouth. In contrast, no asymmetries emerged during discrimination of rotated point-light stimuli or Lissajous patterns that retained the kinematics, but not the canonical orientation or spatial configuration, of the labial gestures. These findings suggest that the perceptual processes underlying asymmetries in unimodal visual vowel discrimination are sensitive to speech-specific motion and configural properties and raise foundational questions concerning the role of specialized and general processes in vowel perception.

RevDate: 2018-03-16

Tamura S, Ito K, Hirose N, et al (2018)

Psychophysical Boundary for Categorization of Voiced-Voiceless Stop Consonants in Native Japanese Speakers.

Journal of speech, language, and hearing research : JSLHR, 61(3):789-796.

Purpose: The purpose of this study was to investigate the psychophysical boundary used for categorization of voiced-voiceless stop consonants in native Japanese speakers.

Method: Twelve native Japanese speakers participated in the experiment. The stimuli were synthetic stop consonant-vowel stimuli varying in voice onset time (VOT) with manipulation of the amplitude of the initial noise portion and the first formant (F1) frequency of the periodic portion. There were 3 tasks: identification of the stimulus as /d/ or /t/, detection of the noise portion, and judgment of the simultaneity of the onsets of the noise and periodic portions.
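The VOT manipulation described here can be illustrated with a toy continuum: an aperiodic noise portion whose duration is the VOT, followed by a periodic (voiced) portion. All parameter values below (sampling rate, f0, gain) are illustrative assumptions, not the authors' actual stimuli.

```python
import numpy as np

def make_vot_stimulus(vot_ms, fs=16000, total_ms=300, f0=120, noise_gain=0.1):
    """Crude CV-like stimulus: noise lasting `vot_ms` followed by a periodic
    portion -- a toy analogue of a synthetic VOT continuum."""
    rng = np.random.default_rng(1)
    n_total = int(total_ms * fs / 1000)
    n_noise = int(vot_ms * fs / 1000)
    t = np.arange(n_total - n_noise) / fs
    voiced = np.sign(np.sin(2 * np.pi * f0 * t))   # square wave at f0
    noise = noise_gain * rng.standard_normal(n_noise)
    return np.concatenate([noise, voiced])

# a 7-step VOT continuum from 0 to 60 ms in 10-ms steps
continuum = [make_vot_stimulus(v) for v in range(0, 61, 10)]
```

Lengthening `vot_ms` shifts the percept along the voiced-voiceless dimension in real continua of this kind; here it simply lengthens the noise portion at the expense of the periodic portion.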

Results: The VOT boundaries of /d/-/t/ were close to the shortest VOT values that allowed for detection of the noise portion but not to those for perceived nonsimultaneity of the noise and periodic portions. The slopes of noise detection functions along VOT were as sharp as those of voiced-voiceless identification functions. In addition, the effects of manipulating the amplitude of the noise portion and the F1 frequency of the periodic portion on the detection of the noise portion were similar to those on voiced-voiceless identification.

Conclusion: The psychophysical boundary of perception of the initial noise portion masked by the following periodic portion may be used for voiced-voiceless categorization by Japanese speakers.

RevDate: 2018-03-02

Roberts B, RJ Summers (2018)

Informational masking of speech by time-varying competitors: Effects of frequency region and number of interfering formants.

The Journal of the Acoustical Society of America, 143(2):891.

This study explored the extent to which informational masking of speech depends on the frequency region and number of extraneous formants in an interferer. Target formants-monotonized three-formant (F1+F2+F3) analogues of natural sentences-were presented monaurally, with target ear assigned randomly on each trial. Interferers were presented contralaterally. In experiment 1, single-formant interferers were created using the time-reversed F2 frequency contour and constant amplitude, root-mean-square (RMS)-matched to F2. Interferer center frequency was matched to that of F1, F2, or F3, while maintaining the extent of formant-frequency variation (depth) on a log scale. Adding an interferer lowered intelligibility; the effect of frequency region was small and broadly tuned around F2. In experiment 2, interferers comprised either one formant (F1, the most intense) or all three, created using the time-reversed frequency contours of the corresponding targets and RMS-matched constant amplitudes. Interferer formant-frequency variation was scaled to 0%, 50%, or 100% of the original depth. Increasing the depth of formant-frequency variation and number of formants in the interferer had independent and additive effects. These findings suggest that the impact on intelligibility depends primarily on the overall extent of frequency variation in each interfering formant (up to ∼100% depth) and the number of extraneous formants.
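The RMS-matching step mentioned in this abstract (scaling an interferer so its level equals that of a target formant) is a standard level-equalization operation. A minimal sketch with hypothetical signals:

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(np.asarray(x, float) ** 2))

def rms_match(interferer, target):
    """Scale `interferer` so its RMS equals that of `target` --
    the level-matching described in the abstract, in sketch form."""
    return interferer * (rms(target) / rms(interferer))

# hypothetical signals: a sinusoidal interferer matched to a constant target
target = np.full(100, 2.0)
interferer = np.sin(np.linspace(0.0, 10.0, 1000))
matched = rms_match(interferer, target)
```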

RevDate: 2018-03-02

Barreda S, ZY Liu (2018)

Apparent-talker height is influenced by Mandarin lexical tone.

The Journal of the Acoustical Society of America, 143(2):EL61.

Apparent-talker height is determined by a talker's fundamental frequency (f0) and spectral information, typically indexed using formant frequencies (FFs). Barreda [(2017b). J. Acoust. Soc. Am. 141, 4781-4792] reports that the apparent height of a talker can be influenced by vowel-specific variation in the f0 or FFs of a sound. In this experiment, native speakers of Mandarin were presented with a series of syllables produced by talkers of different apparent heights. Results indicate that there is substantial variability in the estimated height of a single talker based on lexical tone, as well as the inherent f0 and FFs of vowel phonemes.

RevDate: 2018-03-16

Croake DJ, Andreatta RD, JC Stemple (2018)

Vocalization Subsystem Responses to a Temporarily Induced Unilateral Vocal Fold Paralysis.

Journal of speech, language, and hearing research : JSLHR, 61(3):479-495.

Purpose: The purpose of this study is to quantify the interactions of the 3 vocalization subsystems of respiration, phonation, and resonance before, during, and after a perturbation to the larynx (temporarily induced unilateral vocal fold paralysis) in 10 vocally healthy participants. Using dynamic systems theory as a guide, we hypothesized that data groupings would emerge revealing context-dependent patterns in the relationships of variables representing the 3 vocalization subsystems. We also hypothesized that group data would mask individual variability important to understanding the relationships among the vocalization subsystems.

Method: A perturbation paradigm was used to obtain respiratory kinematic, aerodynamic, and acoustic formant measures from 10 healthy participants (8 women, 2 men) with normal voices. Group and individual data were analyzed to provide a multilevel analysis of the data. A 3-dimensional state space model was constructed to demonstrate the interactive relationships among the 3 subsystems before, during, and after perturbation.

Results: During perturbation, group data revealed that lung volume initiations and terminations were lower, with longer respiratory excursions; airflow rates increased while subglottic pressures were maintained. Acoustic formant measures indicated that the spacing between the upper formants decreased (F3-F5), whereas the spacing between F1 and F2 increased. State space modeling revealed the changing directionality and interactions among the 3 subsystems.

Conclusions: Group data alone masked important variability necessary to understand the unique relationships among the 3 subsystems. Multilevel analysis permitted a richer understanding of the individual differences in phonatory regulation and permitted subgroup analysis. Dynamic systems theory may be a useful heuristic to model the interactive relationships among vocalization subsystems.

Supplemental Material: https://doi.org/10.23641/asha.5913532.

RevDate: 2018-11-13

Compton MT, Lunden A, Cleary SD, et al (2018)

The aprosody of schizophrenia: Computationally derived acoustic phonetic underpinnings of monotone speech.

Schizophrenia research pii:S0920-9964(18)30027-6 [Epub ahead of print].

OBJECTIVE: Acoustic phonetic methods are useful in examining some symptoms of schizophrenia; we used such methods to understand the underpinnings of aprosody. We hypothesized that, compared to controls and patients without clinically rated aprosody, patients with aprosody would exhibit reduced variability in: pitch (F0), jaw/mouth opening and tongue height (formant F1), tongue front/back position and/or lip rounding (formant F2), and intensity/loudness.

METHODS: Audiorecorded speech was obtained from 98 patients (including 25 with clinically rated aprosody and 29 without) and 102 unaffected controls using five tasks: one describing a drawing, two based on spontaneous speech elicited through a question (Tasks 2 and 3), and two based on reading prose excerpts (Tasks 4 and 5). We compared groups on variation in pitch (F0), formant F1 and F2, and intensity/loudness.

RESULTS: Regarding pitch variation, patients with aprosody differed significantly from controls in Task 5 in both unadjusted tests and those adjusted for sociodemographics. For the standard deviation (SD) of F1, no significant differences were found in adjusted tests. Regarding the SD of F2, patients with aprosody had lower values than controls in Tasks 3, 4, and 5. For variation in intensity/loudness, patients with aprosody had lower values than patients without aprosody and controls across the five tasks.
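The variability measures compared here are standard deviations of acoustic tracks computed per task and per group. A toy illustration with hypothetical F0 values (the task names and numbers are invented for the example, not data from the study):

```python
import numpy as np

# hypothetical per-task F0 tracks (Hz); a flat track models monotone speech
f0_tracks = {
    "task5_aprosody": [110, 112, 111, 113, 110],
    "task5_control":  [100, 130, 95, 140, 105],
}

# sample standard deviation (ddof=1) as the variability index
variability = {k: float(np.std(v, ddof=1)) for k, v in f0_tracks.items()}
```

A lower F0 standard deviation for the monotone track is the kind of reduced pitch variability the study associates with clinically rated aprosody.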

CONCLUSIONS: Findings could represent a step toward developing new methods for measuring and tracking the severity of this specific negative symptom using acoustic phonetic parameters; such work is relevant to other psychiatric and neurological disorders.


RJR Experience and Expertise


Robbins holds BS, MS, and PhD degrees in the life sciences. He served as a tenured faculty member in the Zoology and Biological Science departments at Michigan State University. He is currently exploring the intersection between genomics, microbial ecology, and biodiversity — an area that promises to transform our understanding of the biosphere.


Robbins has extensive experience in college-level education: At MSU he taught introductory biology, genetics, and population genetics. At JHU, he was an instructor for a special course on biological database design. At FHCRC, he team-taught a graduate-level course on the history of genetics. At Bellevue College he taught medical informatics.


Robbins has been involved in science administration at both the federal and the institutional levels. At NSF he was a program officer for database activities in the life sciences, and at DOE he was a program officer for information infrastructure in the human genome project. At the Fred Hutchinson Cancer Research Center, he served as a vice president for fifteen years.


Robbins has been involved with information technology since writing his first Fortran program as a college student. At NSF he was the first program officer for database activities in the life sciences. At JHU he held an appointment in the CS department and served as director of the informatics core for the Genome Data Base. At the FHCRC he was VP for Information Technology.


While still at Michigan State, Robbins started his first publishing venture, founding a small company that addressed the short-run publishing needs of instructors in very large undergraduate classes. For more than 20 years, Robbins has been operating The Electronic Scholarly Publishing Project, a web site dedicated to the digital publishing of critical works in science, especially classical genetics.


Robbins is well-known for his speaking abilities and is often called upon to provide keynote or plenary addresses at international meetings. For example, in July, 2012, he gave a well-received keynote address at the Global Biodiversity Informatics Congress, sponsored by GBIF and held in Copenhagen. The slides from that talk can be seen HERE.


Robbins is a skilled meeting facilitator. He prefers a participatory approach, with part of the meeting involving dynamic breakout groups, created by the participants in real time: (1) individuals propose breakout groups; (2) everyone signs up for one (or more) groups; (3) the groups with the most interested parties then meet, with reports from each group presented and discussed in a subsequent plenary session.


Robbins has been engaged with photography and design since the 1960s, when he worked for a professional photography laboratory. He now prefers digital photography and tools for their precision and reproducibility. He designed his first web site more than 20 years ago and he personally designed and implemented this web site. He engages in graphic design as a hobby.

Order from Amazon

This is a must read book for anyone with an interest in invasion biology. The full title of the book lays out the author's premise — The New Wild: Why Invasive Species Will Be Nature's Salvation. Not only is species movement not bad for ecosystems, it is the way that ecosystems respond to perturbation — it is the way ecosystems heal. Even if you are one of those who is absolutely convinced that invasive species are actually "a blight, pollution, an epidemic, or a cancer on nature", you should read this book to clarify your own thinking. True scientific understanding never comes from just interacting with those with whom you already agree. R. Robbins

21454 NE 143rd Street
Woodinville, WA 98077


E-mail: RJR8222@gmail.com

Collection of publications by R J Robbins

Reprints and preprints of publications, slide presentations, instructional materials, and data compilations written or prepared by Robert Robbins. Most papers deal with computational biology, genome informatics, using information technology to support biomedical research, and related matters.

Research Gate page for R J Robbins

ResearchGate is a social networking site for scientists and researchers to share papers, ask and answer questions, and find collaborators. According to a study by Nature and an article in Times Higher Education, it is the largest academic social network in terms of active users.

Curriculum Vitae for R J Robbins

short personal version

Curriculum Vitae for R J Robbins

long standard version

RJR Picks from Around the Web (updated 11 MAY 2018)