

Bibliography on: Formants: Modulators of Communication


Robert J. Robbins is a biologist, educator, science administrator, publisher, information technologist, and IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology.

RJR: Recommended Bibliography, created 16 Nov 2019 at 01:40

Formants: Modulators of Communication

Wikipedia: A formant, as defined by James Jeans, is a harmonic of a note that is augmented by a resonance. In speech science and phonetics, however, the term is also used to mean an acoustic resonance of the human vocal tract. Thus, in phonetics, formant can mean either a resonance or the spectral maximum that the resonance produces. Formants are often measured as amplitude peaks in the frequency spectrum of the sound, using a spectrogram or a spectrum analyzer; in the case of the voice, this gives an estimate of the vocal tract resonances. In vowels spoken with a high fundamental frequency, as in a female or child voice, however, the frequency of the resonance may lie between the widely spaced harmonics, and hence no corresponding peak is visible. Because formants are a product of resonance, because resonance is affected by the shape and material of the resonating structure, and because all animals (humans included) have unique morphologies, formants can add generic (sounds big) and specific (that's Towser barking) information to animal vocalizations.
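As a minimal illustration of the "spectral maximum" sense of a formant described above, the numpy-only sketch below locates the largest peak in a magnitude spectrum. The synthesized damped resonance is a stand-in for real speech, and the function name is invented for illustration; practical formant tracking usually applies LPC analysis to windowed speech frames instead.

```python
import numpy as np

def spectral_peak_hz(signal, sample_rate):
    """Return the frequency (Hz) of the largest peak in the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

# Synthesize a decaying resonance at 700 Hz (roughly an adult F1 for [a])
sr = 16000
t = np.arange(0, 0.1, 1.0 / sr)
resonance = np.exp(-40 * t) * np.sin(2 * np.pi * 700 * t)

print(spectral_peak_hz(resonance, sr))  # close to 700 Hz
```

For a voice with a high fundamental, the harmonics sampling this spectral envelope may be so widely spaced that no bin lands near the resonance peak, which is exactly the measurement problem the paragraph above notes.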

Created with PubMed® Query: formant NOT pmcbook NOT ispreviousversion

Citations: The Papers (from PubMed®)

RevDate: 2019-11-12

Rosenthal MA (2019)

A systematic review of the voice-tagging hypothesis of speech-in-noise perception.

Neuropsychologia pii:S0028-3932(19)30299-4 [Epub ahead of print].

The voice-tagging hypothesis claims that individuals who better represent pitch information in a speaker's voice, as measured with the frequency following response (FFR), will be better at speech-in-noise perception. The hypothesis has been provided to explain how music training might improve speech-in-noise perception. This paper reviews studies that are relevant to the voice-tagging hypothesis, including studies on musicians and nonmusicians. Most studies on musicians show greater f0 amplitude compared to controls. Most studies on nonmusicians do not show group differences in f0 amplitude. Across all studies reviewed, f0 amplitude does not consistently predict accuracy in speech-in-noise perception. The evidence suggests that music training does not improve speech-in-noise perception via enhanced subcortical representation of the f0.

RevDate: 2019-11-11

Hakanpää T, Waaramaa T, AM Laukkanen (2019)

Comparing Contemporary Commercial and Classical Styles: Emotion Expression in Singing.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30209-7 [Epub ahead of print].

OBJECTIVE: This study examines the acoustic correlates of the vocal expression of emotions in contemporary commercial music (CCM) and classical styles of singing. This information may be useful in improving the training of interpretation in singing.

STUDY DESIGN: This is an experimental comparative study.

METHODS: Eleven female singers with a minimum of 3 years of professional-level singing training in CCM, classical, or both styles participated. They sang the vowel [ɑ:] at three pitches (A3 220 Hz, E4 330 Hz, and A4 440 Hz) expressing anger, sadness, joy, tenderness, and a neutral voice. Vowel samples were analyzed for fundamental frequency (fo), formant frequencies (F1-F5), sound pressure level (SPL), spectral structure (alpha ratio = SPL in the 1500-5000 Hz band minus SPL in the 50-1500 Hz band), harmonics-to-noise ratio (HNR), perturbation (jitter, shimmer), onset and offset duration, sustain time, rate and extent of fo variation in vibrato, and rate and extent of amplitude vibrato.
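The alpha ratio defined above is a level difference between two spectral bands. A rough numpy sketch of that computation, applied to a synthetic two-partial signal rather than the study's recordings (the function name and band energies are illustrative only):

```python
import numpy as np

def alpha_ratio_db(signal, sample_rate):
    """Alpha ratio: level of the 1500-5000 Hz band minus the 50-1500 Hz band, in dB."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    low = power[(freqs >= 50) & (freqs < 1500)].sum()
    high = power[(freqs >= 1500) & (freqs <= 5000)].sum()
    return 10 * np.log10(high / low)

# A spectrally tilted test signal: strong 300 Hz partial, weak 3000 Hz partial
sr = 16000
t = np.arange(0, 0.2, 1.0 / sr)
voice_like = np.sin(2 * np.pi * 300 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)
print(alpha_ratio_db(voice_like, sr))  # negative: more energy below 1500 Hz
```

A flatter (less negative) alpha ratio indicates relatively more high-frequency energy, which is why the measure tracks changes in voice quality across emotional expressions.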

RESULTS: The parameters that were statistically significantly (RM-ANOVA, P ≤ 0.05) related to emotion expression in both genres were SPL, alpha ratio, F1, and HNR. Additionally, for CCM, significance was found in sustain time, jitter, shimmer, F2, and F4. When fo and SPL were set as covariates in the variance analysis, jitter, HNR, and F4 did not show pure dependence on expression. The alpha ratio, F1, F2, shimmer apq5, amplitude vibrato rate, and sustain time of vocalizations had emotion-related variation also independent of fo and SPL in the CCM style, while these parameters were related to fo and SPL in the classical style.

CONCLUSIONS: The results differed somewhat for the CCM and classical styles. The alpha ratio showed less variation in the classical style, most likely reflecting the demand for a more stable voice source quality. The alpha ratio, F1, F2, shimmer, amplitude vibrato rate, and the sustain time of the vocalizations were related to fo and SPL control in the classical style. The only common independent sound parameter indicating emotional expression for both styles was SPL. The CCM style offers more freedom for expression-related changes in voice quality.

RevDate: 2019-11-06

Weirich M, A Simpson (2019)

Effects of Gender, Parental Role, and Time on Infant- and Adult-Directed Read and Spontaneous Speech.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose The study sets out to investigate inter- and intraspeaker variation in German infant-directed speech (IDS) and considers the potential impact of the factors gender, parental involvement, and speech material (read vs. spontaneous speech). In addition, we analyze data from 3 time points prior to and after the birth of the child to examine potential changes in the features of IDS and, particularly, of adult-directed speech (ADS). Here, the gender identity of a speaker is considered as an additional factor. Method IDS and ADS data from 34 participants (15 mothers, 19 fathers) were gathered by means of a reading and a picture description task. For IDS, 2 recordings were made when the baby was approximately 6 and 9 months old, respectively. For ADS, an additional recording was made before the baby was born. Phonetic analyses comprise mean fundamental frequency (f0), variation in f0, the first two formants measured in /i: ɛ a u:/, and the vowel space size. Moreover, social and behavioral data were gathered regarding parental involvement and gender identity. Results German IDS is characterized by an increase in mean f0, a larger variation in f0, vowel- and formant-specific differences, and a larger acoustic vowel space. No effect of gender or parental involvement was found. Also, the phonetic features of IDS were found in both spontaneous and read speech. Regarding ADS, changes in vowel space size in some of the fathers and in mean f0 in mothers were found. Conclusion Phonetic features of German IDS are robust with respect to the factors gender, parental involvement, speech material (read vs. spontaneous speech), and time. Some phonetic features of ADS changed within the child's first year depending on gender and parental involvement/gender identity. Thus, further research on IDS also needs to address potential changes in ADS.

RevDate: 2019-10-30

Howson PJ, MA Redford (2019)

Liquid coarticulation in child and adult speech.

Proceedings of the ... International Congress of Phonetic Sciences. International Congress of Phonetic Sciences, 2019:3100-3104.

Although liquids are mastered late, English-speaking children are said to have fully acquired these segments by age 8. The aim of this study was to test whether liquid coarticulation is also adult-like by this age. Eight-year-olds' productions of /əLa/ and /əLu/ sequences were compared to 5-year-olds' and adults' productions of these sequences. SSANOVA analyses of formant frequency trajectories indicated that, while adults contrasted rhotics and laterals from the onset of the vocalic sequence, F2 trajectories for rhotics and laterals overlapped at the onset of the /əLa/ sequence in 8-year-olds' productions and across the entire /əLu/ sequence. The F2 trajectories for rhotics and laterals were even more overlapped in 5-year-olds' productions. Overall, the study suggests that whereas younger children have difficulty coordinating the tongue body/root gesture with the tongue tip gesture, older children still struggle with the intergestural timing associated with liquid production.

RevDate: 2019-10-29

Kim D, S Kim (2019)

Coarticulatory vowel nasalization in American English: Data of individual differences in acoustic realization of vowel nasalization as a function of prosodic prominence and boundary.

Data in brief, 27:104593 pii:104593.

This article provides acoustic measurement data for vowel nasalization, based on speech recorded from fifteen (8 female and 7 male) native speakers of American English in a laboratory setting. Each individual speaker's production patterns for vowel nasalization in tautosyllabic CVN and NVC words are documented in terms of three acoustic parameters: the duration of the nasal consonant (N-Duration), the duration of the vowel (V-Duration), and the difference between the amplitude of the first formant (A1) and the first nasal peak (P0) obtained from the vowel (A1-P0) as an indication of the degree of vowel nasalization. The A1-P0 is measured at three different time points within the vowel, i.e., the near point (25%), midpoint (50%), and distant point (75%), either from the onset (CVN) or the offset (NVC) of the nasal consonant. These measures are taken from the target words in various prosodic prominence and boundary contexts: phonologically focused (PhonFOC) vs. lexically focused (LexFOC) vs. unfocused (NoFOC) conditions; phrase-edge (i.e., phrase-final for CVN and phrase-initial for NVC) vs. phrase-medial conditions. The data also contain a CSV file with each speaker's mean values of the N-Duration, V-Duration, and A1-P0 (z-scored) for each prosodic context, along with information about the speakers' gender. For further discussion of the data, please refer to the full-length article entitled "Prosodically-conditioned fine-tuning of coarticulatory vowel nasalization in English" (Cho et al., 2017).
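The proportional time-point sampling and per-speaker z-scoring described above can be sketched as follows. The track values are hypothetical dB numbers and both helper names are invented for illustration, not taken from the dataset:

```python
import numpy as np

def sample_at_proportions(track, proportions=(0.25, 0.50, 0.75)):
    """Sample a measurement track at proportional time points within the vowel."""
    idx = [int(round(p * (len(track) - 1))) for p in proportions]
    return [track[i] for i in idx]

def zscore(values):
    """Z-score one speaker's measurements so values are comparable across speakers."""
    v = np.asarray(values, dtype=float)
    return (v - v.mean()) / v.std()

# Hypothetical A1-P0 (dB) values across the course of one vowel
a1_p0_track = [12.0, 10.5, 9.0, 7.5, 6.0]
print(sample_at_proportions(a1_p0_track))  # near point, midpoint, distant point
```

Z-scoring within speaker is what lets the published CSV compare nasalization degree across speakers whose absolute A1-P0 ranges differ.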

RevDate: 2019-10-29

Goswami U, Nirmala SR, Vikram CM, et al (2019)

Analysis of Articulation Errors in Dysarthric Speech.

Journal of psycholinguistic research pii:10.1007/s10936-019-09676-5 [Epub ahead of print].

Imprecise articulation is the major issue reported in various types of dysarthria, and detection of articulation errors can help in diagnosis. The cues derived from both the burst and the formant transitions contribute to the discrimination of place of articulation of stops. It is believed that any acoustic deviations in stops due to articulation error can be analyzed by deriving features around the burst and the voicing onsets, and that the derived features can be used to discriminate normal from dysarthric speech. In this work, a method is proposed to differentiate voiceless stops produced by normal speakers from those produced by dysarthric speakers, using spectral moments, the two-dimensional discrete cosine transform of the linear prediction spectrum, and Mel frequency cepstral coefficient features. These features and a cosine distance-based classifier are used for the classification of normal and dysarthric speech.
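A cosine distance-based classifier of the kind named above can be sketched in a few lines. The feature vectors and class templates below are made-up stand-ins for illustration, not the paper's actual burst/onset features:

```python
import numpy as np

def cosine_distance(a, b):
    """1 minus the cosine similarity of two feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def classify(feature_vec, class_means):
    """Assign the class whose mean feature vector is nearest in cosine distance."""
    return min(class_means, key=lambda c: cosine_distance(feature_vec, class_means[c]))

# Hypothetical mean feature vectors (e.g., spectral moments around the burst)
means = {"normal": [1.0, 0.2, 0.1], "dysarthric": [0.3, 0.9, 0.5]}
print(classify([0.9, 0.25, 0.1], means))
```

Because cosine distance ignores overall vector magnitude, such a classifier is insensitive to level differences between recordings and responds only to the shape of the feature profile.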

RevDate: 2019-10-23

Cartei V, Banerjee R, Garnham A, et al (2019)

Physiological and perceptual correlates of masculinity in children's voices.

Hormones and behavior pii:S0018-506X(19)30277-6 [Epub ahead of print].

Low frequency components (i.e. a low pitch (F0) and low formant spacing (ΔF)) signal high salivary testosterone and height in adult male voices and are associated with high masculinity attributions by unfamiliar listeners (in both men and women). However, the relation between the physiological, acoustic and perceptual dimensions of speakers' masculinity prior to puberty remains unknown. In this study, 110 pre-pubertal children (58 girls), aged 3 to 10, were recorded as they described a cartoon picture. 315 adults (182 women) rated children's perceived masculinity from the voice only after listening to the speakers' audio recordings. On the basis of their voices alone, boys who had higher salivary testosterone levels were rated as more masculine, and the relation between testosterone and perceived masculinity was partially mediated by F0. The voices of taller boys were also rated as more masculine, but the relation between height and perceived masculinity was not mediated by the considered acoustic parameters, indicating that acoustic cues other than F0 and ΔF may signal stature. Both boys and girls who had lower F0 were also rated as more masculine, while ΔF did not affect ratings. These findings highlight the interdependence of physiological, acoustic and perceptual dimensions, and suggest that inter-individual variation in male voices, particularly F0, may advertise hormonal masculinity from a very early age.

RevDate: 2019-10-17

Scheerer NE, Jacobson DS, JA Jones (2019)

Sensorimotor control of vocal production in early childhood.

Journal of experimental psychology. General pii:2019-62257-001 [Epub ahead of print].

Children maintain fluent speech despite dramatic changes to their articulators during development. Auditory feedback aids in the acquisition and maintenance of the sensorimotor mechanisms that underlie vocal motor control. MacDonald, Johnson, Forsythe, Plante, and Munhall (2012) reported that toddlers' speech motor control systems may "suppress" the influence of auditory feedback, since exposure to altered auditory feedback regarding their formant frequencies did not lead to modifications of their speech. This finding is not parsimonious with most theories of motor control. Here, we exposed toddlers to perturbations to the pitch of their auditory feedback as they vocalized. Toddlers compensated for the manipulations, producing significantly different responses to upward and downward perturbations. These data represent the first empirical demonstration that toddlers use auditory feedback for vocal motor control. Furthermore, our findings suggest toddlers are more sensitive to changes to the postural properties of their auditory feedback, such as fundamental frequency, relative to the phonemic properties, such as formant frequencies.

RevDate: 2019-10-08

Conklin JT, O Dmitrieva (2019)

Vowel-to-Vowel Coarticulation in Spanish Nonwords.

Phonetica pii:000502890 [Epub ahead of print].

The present study examined vowel-to-vowel (VV) coarticulation in backness affecting mid vowels /e/ and /o/ in 36 Spanish nonwords produced by 20 native speakers of Spanish, aged 19-50 years (mean = 30.7; SD = 8.2). Examination of second formant frequency showed substantial carryover coarticulation throughout the data set, while anticipatory coarticulation was minimal and of shorter duration. Furthermore, the effect of stress on vowel-to-vowel coarticulation was investigated and found to vary by direction. In the anticipatory direction, small coarticulatory changes were relatively stable regardless of stress, particularly for target /e/, while in the carryover direction, a hierarchy of stress emerged wherein the greatest coarticulation occurred between stressed triggers and unstressed targets, less coarticulation was observed between unstressed triggers and unstressed targets, and the least coarticulation occurred between unstressed triggers with stressed targets. The results of the study augment and refine previously available knowledge about vowel-to-vowel coarticulation in Spanish and expand cross-linguistic understanding of the effect of stress on the magnitude and direction of vowel-to-vowel coarticulation.

RevDate: 2019-10-08

Lee Y, Keating P, J Kreiman (2019)

Acoustic voice variation within and between speakers.

The Journal of the Acoustical Society of America, 146(3):1568.

Little is known about the nature or extent of everyday variability in voice quality. This paper describes a series of principal component analyses to explore within- and between-talker acoustic variation and the extent to which they conform to expectations derived from current models of voice perception. Based on studies of faces and cognitive models of speaker recognition, the authors hypothesized that a few measures would be important across speakers, but that much of within-speaker variability would be idiosyncratic. Analyses used multiple sentence productions from 50 female and 50 male speakers of English, recorded over three days. Twenty-six acoustic variables from a psychoacoustic model of voice quality were measured every 5 ms on vowels and approximants. Across speakers the balance between higher harmonic amplitudes and inharmonic energy in the voice accounted for the most variance (females = 20%, males = 22%). Formant frequencies and their variability accounted for an additional 12% of variance across speakers. Remaining variance appeared largely idiosyncratic, suggesting that the speaker-specific voice space is different for different people. Results further showed that voice spaces for individuals and for the population of talkers have very similar acoustic structures. Implications for prototype models of voice perception and recognition are discussed.
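The variance-accounting step of a principal component analysis like the one described above can be sketched as follows. The toy data stand in for the 26 per-frame acoustic measures, with one dominant shared factor playing the role of the harmonic/inharmonic balance; none of it is the study's data:

```python
import numpy as np

def pca_variance_explained(data):
    """Fraction of total variance carried by each principal component.

    data: (n_observations, n_acoustic_measures), e.g., per-frame voice measures.
    """
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # sorted descending
    return eigvals / eigvals.sum()

rng = np.random.default_rng(0)
# One shared factor loading onto 6 toy "acoustic measures", plus independent noise
shared = rng.normal(size=(500, 1))
data = shared @ rng.normal(size=(1, 6)) + 0.3 * rng.normal(size=(500, 6))
ratios = pca_variance_explained(data)
print(ratios[0])  # the shared factor dominates the first component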

RevDate: 2019-10-01

Balaguer M, Pommée T, Farinas J, et al (2019)

Effects of oral and oropharyngeal cancer on speech intelligibility using acoustic analysis: Systematic review.

Head & neck [Epub ahead of print].

BACKGROUND: The development of automatic tools based on acoustic analysis makes it possible to overcome the limitations of perceptual assessment for patients with head and neck cancer. The aim of this study is to provide a systematic review of the literature describing the effects of oral and oropharyngeal cancer on speech intelligibility using acoustic analysis.

METHODS: Two databases (PubMed and Embase) were surveyed. The selection process, according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) statement, led to a final set of 22 articles.

RESULTS: Nasalance is studied mainly in oropharyngeal patients. Vowels are mostly studied using formant analysis and vowel space area, and consonants by means of spectral moments, with specific parameters according to their phonetic characteristics. Machine learning methods allow speech to be classified as "intelligible" or "unintelligible" for T3 or T4 tumors.

CONCLUSIONS: The development of comprehensive models combining different acoustic measures would allow a better consideration of the functional impact of the speech disorder.

RevDate: 2019-09-23

Zeng Q, Jiao Y, Huang X, et al (2019)

Effects of Angle of Epiglottis on Aerodynamic and Acoustic Parameters in Excised Canine Larynges.

Journal of voice : official journal of the Voice Foundation, 33(5):627-633.

OBJECTIVES: The aim of this study is to explore the effects of the angle of epiglottis (Aepi) on phonation and resonance in excised canine larynges.

METHODS: The anatomic Aepi was measured for 14 excised canine larynges as a control. Then, the Aepis were manually adjusted to 60° and 90° in each larynx. Aerodynamic and acoustic parameters, including mean flow rate, sound pressure level, jitter, shimmer, fundamental frequency (F0), and formants (F1'-F4'), were measured with a subglottal pressure of 1.5 kPa. Simple linear regression analysis between acoustic and aerodynamic parameters and the Aepi of the control was performed, and an analysis of variance comparing the acoustic and aerodynamic parameters of the three treatments was carried out.

RESULTS: The results of the study are as follows: (1) the larynges with larger anatomic Aepi had significantly lower jitter, shimmer, formant 1, and formant 2; (2) phonation threshold flow was significantly different for the three treatments; and (3) mean flow rate and sound pressure level were significantly different between the 60° and the 90° treatments of the 14 larynges.

CONCLUSIONS: The Aepi was proposed for the first time in this study. The Aepi plays an important role in phonation and resonance of excised canine larynges.

RevDate: 2019-09-18

Dmitrieva O, I Dutta (2019)

Acoustic Correlates of the Four-Way Laryngeal Contrast in Marathi.

Phonetica pii:000501673 [Epub ahead of print].

The study examines acoustic correlates of the four-way laryngeal contrast in Marathi, focusing on temporal parameters, voice quality, and onset f0. Acoustic correlates of the laryngeal contrast were investigated in the speech of 33 native speakers of Marathi, recorded in Mumbai, India, producing a word list containing six sets of words minimally contrastive in terms of laryngeal specification of word-initial velar stops. Measurements were made for the duration of prevoicing, release, and voicing during release. Fundamental frequency was measured at the onset of voicing following the stop and at 10 additional time points. As measures of voice quality, amplitude differences between the first and second harmonic (H1-H2) and between the first harmonic and the third formant (H1-A3) were calculated. The results demonstrated that laryngeal categories in Marathi are differentiated based on temporal measures, voice quality, and onset f0, although differences in each dimension were unequal in magnitude across different pairs of stop categories. We conclude that a single acoustic correlate, such as voice onset time, is insufficient to differentiate among all the laryngeal categories in languages such as Marathi, characterized by complex four-way laryngeal contrasts. Instead, a joint contribution of several acoustic correlates creates a robust multidimensional contrast.
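Voice quality measures like H1-H2 above are differences between harmonic amplitudes read off the spectrum. A minimal numpy sketch on a synthetic two-harmonic signal (not the Marathi recordings; the function name and search window are illustrative):

```python
import numpy as np

def harmonic_amplitude_db(signal, sample_rate, target_hz, search_hz=20):
    """Amplitude (dB) of the largest spectral component near target_hz."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    band = (freqs > target_hz - search_hz) & (freqs < target_hz + search_hz)
    return 20 * np.log10(spectrum[band].max())

# Toy glottal-like signal: H1 at 120 Hz with twice the amplitude of H2 at 240 Hz
sr, f0 = 16000, 120.0
t = np.arange(0, 0.5, 1.0 / sr)
signal = 1.0 * np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
h1_h2 = harmonic_amplitude_db(signal, sr, f0) - harmonic_amplitude_db(signal, sr, 2 * f0)
print(round(h1_h2, 1))  # a 2:1 amplitude ratio is about 6 dB
```

H1-A3 is computed the same way, except the second term is the amplitude of the harmonic nearest the third formant rather than the second harmonic; in practice both measures are usually corrected for formant influence.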

RevDate: 2019-09-03

Guan J, C Liu (2019)

Speech Perception in Noise With Formant Enhancement for Older Listeners.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose Degraded speech intelligibility in background noise is a common complaint of listeners with hearing loss. The purpose of the current study is to explore whether 2nd formant (F2) enhancement improves speech perception in noise for older listeners with hearing impairment (HI) and normal hearing (NH). Method Target words (e.g., color and digit) were selected and presented based on the paradigm of the coordinate response measure corpus. Speech recognition thresholds with original and F2-enhanced speech in 2- and 6-talker babble were examined for older listeners with NH and HI. Results The thresholds for both the NH and HI groups improved for enhanced speech signals primarily in 2-talker babble, but not in 6-talker babble. The F2 enhancement benefits did not correlate significantly with listeners' age and their average hearing thresholds in most listening conditions. However, speech intelligibility index values increased significantly with F2 enhancement in babble for listeners with HI, but not for NH listeners. Conclusions Speech sounds with F2 enhancement may improve listeners' speech perception in 2-talker babble, possibly due to a greater amount of speech information available in temporally modulated noise or a better capacity to separate speech signals from background babble.

RevDate: 2019-09-01

Klein E, Brunner J, P Hoole (2019)

The influence of coarticulatory and phonemic relations on individual compensatory formant production.

The Journal of the Acoustical Society of America, 146(2):1265.

Previous auditory perturbation studies have shown that speakers are able to simultaneously use multiple compensatory strategies to produce a certain acoustic target. In the case of formant perturbation, these findings were obtained examining the compensatory production for low vowels /ɛ/ and /æ/. This raises some controversy as more recent research suggests that the contribution of the somatosensory feedback to the production of vowels might differ across phonemes. In particular, the compensatory magnitude to auditory perturbations is expected to be weaker for high vowels compared to low vowels since the former are characterized by larger linguopalatal contact. To investigate this hypothesis, this paper conducted a bidirectional auditory perturbation study in which F2 of the high central vowel /ɨ/ was perturbed in opposing directions depending on the preceding consonant (alveolar vs velar). The consonants were chosen such that speakers' usual coarticulatory patterns were either compatible or incompatible with the required compensatory strategy. The results demonstrate that speakers were able to compensate for applied perturbations even if speakers' compensatory movements resulted in unusual coarticulatory configurations. However, the results also suggest that individual compensatory patterns were influenced by additional perceptual factors attributable to the phonemic space surrounding the target vowel /ɨ/.

RevDate: 2019-09-01

Migimatsu K, IT Tokuda (2019)

Experimental study on nonlinear source-filter interaction using synthetic vocal fold models.

The Journal of the Acoustical Society of America, 146(2):983.

Under certain conditions, e.g., singing voice, the fundamental frequency of the vocal folds can go up and interfere with the formant frequencies. Acoustic feedback from the vocal tract filter to the vocal fold source then becomes strong and non-negligible. An experimental study was presented on such source-filter interaction using three types of synthetic vocal fold models. Asymmetry was also created between the left and right vocal folds. The experiment reproduced various nonlinear phenomena, such as frequency jump and quenching, as reported in humans. Increase in phonation threshold pressure was also observed when resonant frequency of the vocal tract and fundamental frequency of the vocal folds crossed each other. As a combined effect, the phonation threshold pressure was further increased by the left-right asymmetry. Simulation of the asymmetric two-mass model reproduced the experiments to some extent. One of the intriguing findings of this study is the variable strength of the source-filter interaction over different model types. Among the three models, two models were strongly influenced by the vocal tract, while no clear effect of the vocal tract was observed in the other model. This implies that the level of source-filter interaction may vary considerably from one subject to another in humans.

RevDate: 2019-08-29

Max L, A Daliri (2019)

Limited Pre-Speech Auditory Modulation in Individuals Who Stutter: Data and Hypotheses.

Journal of speech, language, and hearing research : JSLHR, 62(8S):3071-3084.

Purpose We review and interpret our recent series of studies investigating motor-to-auditory influences during speech movement planning in fluent speakers and speakers who stutter. In those studies, we recorded auditory evoked potentials in response to probe tones presented immediately prior to speaking or at the equivalent time in no-speaking control conditions. As a measure of pre-speech auditory modulation (PSAM), we calculated changes in auditory evoked potential amplitude in the speaking conditions relative to the no-speaking conditions. Whereas adults who do not stutter consistently showed PSAM, this phenomenon was greatly reduced or absent in adults who stutter. The same between-group difference was observed in conditions where participants expected to hear their prerecorded speech played back without actively producing it, suggesting that speakers who stutter use inefficient forward modeling processes rather than inefficient motor command generation processes. Compared with fluent participants, adults who stutter showed both less PSAM and less auditory-motor adaptation when producing speech while exposed to formant-shifted auditory feedback. Across individual participants, however, PSAM and auditory-motor adaptation did not correlate in the typically fluent group, and they were negatively correlated in the stuttering group. Interestingly, speaking with a consistent 100-ms delay added to the auditory feedback signal normalized PSAM in speakers who stutter, and there no longer was a between-group difference in this condition.
Conclusions Combining our own data with human and animal neurophysiological evidence from other laboratories, we interpret the overall findings as suggesting that (a) speech movement planning modulates auditory processing in a manner that may optimize its tuning characteristics for monitoring feedback during speech production and, (b) in conditions with typical auditory feedback, adults who stutter do not appropriately modulate the auditory system prior to speech onset. Lack of modulation of speakers who stutter may lead to maladaptive feedback-driven movement corrections that manifest themselves as repetitive movements or postural fixations.

RevDate: 2019-08-26

Plummer AR, PF Reidy (2018)

Computing low-dimensional representations of speech from socio-auditory structures for phonetic analyses.

Journal of phonetics, 71:355-375.

Low-dimensional representations of speech data, such as formant values extracted by linear predictive coding analysis or spectral moments computed from whole spectra viewed as probability distributions, have been instrumental in both phonetic and phonological analyses over the last few decades. In this paper, we present a framework for computing low-dimensional representations of speech data based on two assumptions: that speech data represented in high-dimensional data spaces lie on shapes called manifolds that can be used to map speech data to low-dimensional coordinate spaces, and that manifolds underlying speech data are generated from a combination of language-specific lexical, phonological, and phonetic information as well as culture-specific socio-indexical information that is expressed by talkers of a given speech community. We demonstrate the basic mechanics of the framework by carrying out an analysis of children's productions of sibilant fricatives relative to those of adults in their speech community using the phoneigen package - a publicly available implementation of the framework. We focus the demonstration on enumerating the steps for constructing manifolds from data and then using them to map the data to a low-dimensional space, explicating how manifold structure affects the learned low-dimensional representations, and comparing the use of these representations against standard acoustic features in a phonetic analysis. We conclude with a discussion of the framework's underlying assumptions, its broader modeling potential, and its position relative to recent advances in the field of representation learning.

RevDate: 2019-08-16

Jain S, NP Nataraja (2019)

The Relationship between Temporal Integration and Temporal Envelope Perception in Noise by Males with Mild Sensorineural Hearing Loss.

The journal of international advanced otology, 15(2):257-262.

OBJECTIVES: A body of literature indicates that temporal integration and temporal envelope perception contribute substantially to the perception of speech. A review of the literature showed that perception of speech involving temporal integration and temporal envelope perception in noise may be affected by sensorineural hearing loss, but to a varying degree. Because temporal integration and the temporal envelope share similar physiological processing at the cochlear level, the present study aimed to identify the relationship between temporal integration and temporal envelope perception in noise in individuals with mild sensorineural hearing loss.

MATERIALS AND METHODS: Thirty adult males with mild sensorineural hearing loss and thirty age- and gender-matched normal-hearing individuals volunteered to participate in the study. Temporal integration was measured using synthetic consonant-vowel-consonant syllables, varied for the onset, offset, and onset-offset of the second and third formant frequencies of the vowel following and preceding the consonants in six equal steps, forming six-step onset, offset, and onset-offset continua. The duration of the transition was kept short (40 ms) in one set of continua and long (80 ms) in another. Temporal integration scores were calculated as the differences in the identification of the categorical boundary between the short- and long-transition continua. Temporal envelope perception was measured using sentences processed in quiet and at 0 dB and -5 dB signal-to-noise ratios with 4, 8, 16, and 32 frequency channels; the temporal envelope was extracted for each sentence using the Hilbert transformation.

RESULTS: A significant effect of hearing loss was observed on temporal integration, but not on temporal envelope perception. However, when the temporal integration abilities were controlled, the variable effect of hearing loss on temporal envelope perception was noted.

CONCLUSION: It was important to measure the temporal integration to accurately account for the envelope perception by individuals with normal hearing and those with hearing loss.
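The Hilbert-transform envelope extraction described in the Methods above can be sketched via the analytic signal: zero out the negative frequencies of the spectrum, double the positive ones, inverse-transform, and take the magnitude. The plain DFT below is for self-containment only; a practical implementation would use an FFT routine such as scipy.signal.hilbert:

```python
import cmath

def dft(x):
    # Naive O(n^2) discrete Fourier transform (self-containment only).
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    # Naive inverse DFT.
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def hilbert_envelope(x):
    """Amplitude envelope of a real signal via the analytic signal."""
    n = len(x)
    X = dft(x)
    # Weights that zero negative frequencies and double positive ones.
    H = [0.0] * n
    H[0] = 1.0
    if n % 2 == 0:
        H[n // 2] = 1.0
        for k in range(1, n // 2):
            H[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            H[k] = 2.0
    analytic = idft([Xk * h for Xk, h in zip(X, H)])
    return [abs(a) for a in analytic]
```

A constant-amplitude sinusoid should yield an envelope that is flat at 1.0, which makes a convenient sanity check.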

RevDate: 2019-08-18

Cartei V, Garnham A, Oakhill J, et al (2019)

Children can control the expression of masculinity and femininity through the voice.

Royal Society open science, 6(7):190656 pii:rsos190656.

Pre-pubertal boys and girls speak with acoustically different voices despite the absence of a clear anatomical dimorphism in the vocal apparatus, suggesting that a strong component of the expression of gender through the voice is behavioural. Initial evidence for this hypothesis was found in a previous study showing that children can alter their voice to sound like a boy or like a girl. However, whether they can spontaneously modulate these voice components within their own gender in order to vary the expression of their masculinity and femininity remained to be investigated. Here, seventy-two English-speaking children aged 6-10 were asked to give voice to child characters varying in masculine and feminine stereotypicality to investigate whether primary school children spontaneously adjust their sex-related cues in the voice-fundamental frequency (F0) and formant spacing (ΔF)-along gender stereotypical lines. Boys and girls masculinized their voice, by lowering F0 and ΔF, when impersonating stereotypically masculine child characters of the same sex. Girls and older boys also feminized their voice, by raising their F0 and ΔF, when impersonating stereotypically feminine same-sex child characters. These findings reveal that children have some knowledge of the sexually dimorphic acoustic cues underlying the expression of gender, and are capable of controlling them to modulate gender-related attributes, paving the way for the use of the voice as an implicit, objective measure of the development of gender stereotypes and behaviour.
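Formant spacing (ΔF) of the kind measured in this study is commonly estimated by fitting measured formants to the odd-multiple resonance pattern of a uniform tube, Fi ≈ (2i - 1)/2 · ΔF, from which an apparent vocal tract length follows. The least-squares fit through the origin and the speed-of-sound constant below are standard assumptions, not necessarily the authors' exact procedure:

```python
def delta_f(formants):
    """Least-squares estimate of formant spacing dF from measured
    formant frequencies (Hz), assuming the uniform-tube pattern
    F_i ~ (2i - 1) / 2 * dF (regression through the origin)."""
    xs = [(2 * i - 1) / 2 for i in range(1, len(formants) + 1)]
    return sum(f * x for f, x in zip(formants, xs)) / sum(x * x for x in xs)

def apparent_vtl(df, c=35000.0):
    """Apparent vocal tract length (cm) from dF; c is the speed of
    sound in cm/s (35,000 cm/s is a common assumed value)."""
    return c / (2.0 * df)
```

Lowering ΔF (as when the children masculinized their voices) corresponds to a longer apparent vocal tract under this model.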

RevDate: 2019-08-15

Dorman MF, Natale SC, Zeitler DM, et al (2019)

Looking for Mickey Mouse™ But Finding a Munchkin: The Perceptual Effects of Frequency Upshifts for Single-Sided Deaf, Cochlear Implant Patients.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose Our aim was to make audible for normal-hearing listeners the Mickey Mouse™ sound quality of cochlear implants (CIs) often found following device activation. Method The listeners were 3 single-sided deaf patients fit with a CI who had 6 months or less of CI experience. Computed tomography imaging established the location of each electrode contact in the cochlea and allowed an estimate of the place frequency of the tissue nearest each electrode. For the most apical electrodes, this estimate ranged from 650 to 780 Hz. To determine CI sound quality, a clean signal (a sentence) was presented to the CI ear via a direct-connect cable, and candidate CI-like signals were presented to the ear with normal hearing via an insert receiver. The listeners rated the similarity of the candidate signals to the sound of the CI on a 1- to 10-point scale, with 10 being a complete match. Results To match CI sound quality, all 3 patients needed an upshift in formant frequencies (300-800 Hz) and a metallic sound quality. Two of the 3 patients also needed an upshift in voice pitch (10-80 Hz) and a muffling of sound quality. Similarity scores ranged from 8 to 9.7. Conclusion The formant frequency upshifts, fundamental frequency upshifts, and metallic sound quality experienced by the listeners can be linked to the relatively basal locations of the electrode contacts and to the short duration of experience with their devices. The perceptual consequence was not the voice quality of Mickey Mouse™ but rather that of the Munchkins in The Wizard of Oz, for whom both formant frequencies and voice pitch were upshifted. Supplemental Material https://doi.org/10.23641/asha.9341651.

RevDate: 2019-08-10

Knight EJ, SF Austin (2019)

The Effect of Head Flexion/Extension on Acoustic Measures of Singing Voice Quality.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30117-1 [Epub ahead of print].

A study was undertaken to identify the effect of head flexion/extension on singing voice quality. The amplitude of the fundamental frequency (F0) and the singing power ratio (SPR), an indirect measure of Singer's Formant activity, were measured. F0 and SPR scores at four experimental head positions were compared with the subjects' scores at their habitual positions. Three vowels and three pitch levels were tested. F0 amplitudes and low-frequency partials in general were greater with neck extension, while SPR increased with neck flexion. No effect of pitch or vowel was found. Gains in SPR appear to be the result of damping low-frequency partials rather than amplifying those in the Singer's Formant region. Raising the amplitude of F0 is an important resonance tool for female voices in the high range, and may be of benefit to other voice types in resonance, loudness, and laryngeal function.

RevDate: 2019-08-08

Alho K, Żarnowiec K, Gorina-Careta N, et al (2019)

Phonological Task Enhances the Frequency-Following Response to Deviant Task-Irrelevant Speech Sounds.

Frontiers in human neuroscience, 13:245.

In electroencephalography (EEG) measurements, processing of periodic sounds in the ascending auditory pathway generates the frequency-following response (FFR) phase-locked to the fundamental frequency (F0) and its harmonics of a sound. We measured FFRs to the steady-state (vowel) part of syllables /ba/ and /aw/ occurring in binaural rapid streams of speech sounds as frequently repeating standard syllables or as infrequent (p = 0.2) deviant syllables among standard /wa/ syllables. Our aim was to study whether concurrent active phonological processing affects early processing of irrelevant speech sounds reflected by FFRs to these sounds. To this end, during syllable delivery, our healthy adult participants performed tasks involving written letters delivered on a computer screen in a rapid stream. The stream consisted of vowel letters written in red, infrequently occurring consonant letters written in the same color, and infrequently occurring vowel letters written in blue. In the phonological task, the participants were instructed to press a response key to the consonant letters differing phonologically but not in color from the frequently occurring red vowels, whereas in the non-phonological task, they were instructed to respond to the vowel letters written in blue differing only in color from the frequently occurring red vowels. We observed that the phonological task enhanced responses to deviant /ba/ syllables but not responses to deviant /aw/ syllables. This suggests that active phonological task performance may enhance processing of such small changes in irrelevant speech sounds as the 30-ms difference in the initial formant-transition time between the otherwise identical syllables /ba/ and /wa/ used in the present study.

RevDate: 2019-08-02

Birkholz P, Gabriel F, Kürbis S, et al (2019)

How the peak glottal area affects linear predictive coding-based formant estimates of vowels.

The Journal of the Acoustical Society of America, 146(1):223.

The estimation of formant frequencies from acoustic speech signals is mostly based on Linear Predictive Coding (LPC) algorithms. Since LPC is based on the source-filter model of speech production, the formant frequencies obtained are often implicitly regarded as those for an infinite glottal impedance, i.e., a closed glottis. However, previous studies have indicated that LPC-based formant estimates of vowels generated with a realistically varying glottal area may substantially differ from the resonances of the vocal tract with a closed glottis. In the present study, the deviation between closed-glottis resonances and LPC-estimated formants during phonation with different peak glottal areas has been systematically examined using both physical vocal tract models excited with a self-oscillating rubber model of the vocal folds and computer simulations of interacting source and filter models. Ten vocal tract resonators representing different vowels have been analyzed. The results showed that F1 increased with the peak area of the time-varying glottis, while F2 and F3 were not systematically affected. The effect of the peak glottal area on F1 was strongest for close-mid to close vowels, and more moderate for mid to open vowels.
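LPC formant estimation of the kind examined here fits an all-pole model to the signal's autocorrelation (the Levinson-Durbin recursion) and reads formants off the peaks of the resulting spectral envelope. A minimal single-resonance sketch, omitting the pre-emphasis, windowing, and higher model orders used in practice:

```python
import math

def autocorr(x, maxlag):
    """Deterministic autocorrelation of a signal up to maxlag."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(maxlag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion: LPC coefficients a (a[0] = 1) such
    that x[n] ~ -sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err
        a_new = a[:]
        for j in range(1, i):
            a_new[j] = a[j] + k * a[i - j]
        a_new[i] = k
        a = a_new
        err *= (1.0 - k * k)
    return a

def envelope_peak_hz(a, fs, lo=100, hi=3900):
    """Frequency (Hz) at which the LPC envelope |1/A(e^{jw})| peaks,
    scanned on a 1 Hz grid."""
    best_f, best_mag = lo, 0.0
    for f in range(lo, hi):
        w = 2.0 * math.pi * f / fs
        A = sum(a[k] * complex(math.cos(k * w), -math.sin(k * w))
                for k in range(len(a)))
        mag = 1.0 / abs(A)
        if mag > best_mag:
            best_mag, best_f = mag, f
    return best_f
```

Fitting an order-2 model to the impulse response of a single 700 Hz resonator recovers an envelope peak near 700 Hz, which illustrates the closed-glottis case the abstract contrasts against.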

RevDate: 2019-08-02

Patel RR, Lulich SM, A Verdi (2019)

Vocal tract shape and acoustic adjustments of children during phonation into narrow flow-resistant tubes.

The Journal of the Acoustical Society of America, 146(1):352.

The goal of the study is to quantify the salient vocal tract acoustic, subglottal acoustic, and vocal tract physiological characteristics during phonation into a narrow flow-resistant tube with 2.53 mm inner diameter and 124 mm length in typically developing vocally healthy children using simultaneous microphone, accelerometer, and 3D/4D ultrasound recordings. Acoustic measurements included fundamental frequency (fo), first formant frequency (F1), second formant frequency (F2), first subglottal resonance (FSg1), and peak-to-peak amplitude ratio (Pvt:Psg). Physiological measurements included posterior tongue height (D1), tongue dorsum height (D2), tongue tip height (D3), tongue length (D4), oral cavity width (D5), hyoid elevation (D6), and pharynx width (D7). All measurements were made on eight boys and ten girls (6-9 years) during sustained /o:/ production at typical pitch and loudness, with and without the flow-resistant tube. Phonation with the flow-resistant tube resulted in a significant decrease in F1, F2, and Pvt:Psg and a significant increase in D2, D3, and FSg1. A statistically significant gender effect was observed for D1, with D1 higher in boys. These findings agree well with reported findings from adults, suggesting common acoustic and articulatory mechanisms for narrow flow-resistant tube phonation. Theoretical implications of the findings are discussed.

RevDate: 2019-08-05

Wadamori N (2019)

Evaluation of a photoacoustic bone-conduction vibration system.

The Review of scientific instruments, 90(7):074905.

This article proposes a bone conduction vibrator that is based on a phenomenon by which audible sound can be perceived when vibrations are produced using a laser beam that is synchronized to the sound and these vibrations are then transmitted to an auricular cartilage. To study this phenomenon, we measured the vibrations using a rubber sheet with similar properties to those of soft tissue in combination with an acceleration sensor. We also calculated the force level of the sound based on the mechanical impedance and the acceleration in the proposed system. We estimated the formant frequencies of specific vibrations that were synchronized to five Japanese vowels using this phenomenon. We found that the vibrations produced in the rubber sheet caused audible sound generation when the photoacoustic bone conduction vibration system was used. It is expected that a force level that is equal to the reference equivalent threshold force level can be achieved at light intensities that lie below the safety limit for human skin exposure by selecting an irradiation wavelength at which a high degree of optical absorption occurs. It is demonstrated that clear sounds can be transmitted to the cochlea using the proposed system while excluding the effects of acoustic and electric noise in the environment. Improvements in the vibratory force levels realized using this system will enable the development of a novel hearing aid that will provide an alternative to conventional bone conduction hearing aids.

RevDate: 2019-07-26

Kaneko M, Sugiyama Y, Mukudai S, et al (2019)

Effect of Voice Therapy Using Semioccluded Vocal Tract Exercises in Singers and Nonsingers With Dysphonia.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30210-3 [Epub ahead of print].

OBJECTIVES: Voice therapy with semioccluded vocal tract exercises (SOVTE) has a long history of use in singers and nonsingers with dysphonia. SOVTE with increased vocal tract impedance leads to increased vocal efficiency and economy. Although there is a growing body of research on the physiological impact of SOVTE, and growing clinical sentiment about its therapeutic benefits, empirical data describing its potential efficacy in singers and nonsingers are lacking. The objective of the current study is to evaluate vocal tract function and voice quality in singers and nonsingers with dysphonia after undergoing SOVTE.

METHODS: Patients who were diagnosed with functional dysphonia, vocal fold nodules and age-related atrophy were assessed (n = 8 singers, n = 8 nonsingers). Stroboscopic examination, aerodynamic assessment, acoustic analysis, formant frequency, and self-assessments were evaluated before and after performing SOVTE.

RESULTS: In the singer group, expiratory lung pressure, jitter, shimmer, and self-assessment significantly improved after SOVTE. In addition, formant frequency (first, second, third, and fourth), and the standard deviation (SD) of the first, second, and third formant frequency significantly improved. In the nonsinger group, expiratory lung pressure, jitter, shimmer, and Voice Handicap Index-10 significantly improved after SOVTE. However, no significant changes were observed in formant frequency.

CONCLUSIONS: These results suggest that SOVTE may improve voice quality in singers and nonsingers with dysphonia, and SOVTE may be more effective at adjusting the vocal tract function in singers with dysphonia compared to nonsingers.

RevDate: 2019-07-23

Myers S (2019)

An Acoustic Study of Sandhi Vowel Hiatus in Luganda.

Language and speech [Epub ahead of print].

In Luganda (Bantu, Uganda), a sequence of vowels in successive syllables (V.V) is not allowed. If the first vowel is high, the two vowels are joined together in a diphthong (e.g., i + a → i͜a). If the first vowel is non-high, it is deleted with compensatory lengthening of the second vowel in the sequence (e.g., e + a → aː). This paper presents an acoustic investigation of inter-word V#V sequences in Luganda. It was found that the vowel interval in V#V sequences is longer than that in V#C sequences. When the first vowel in V#V is non-high, the formant frequency of the outcome is determined by the second vowel in the sequence. When the first vowel is high, on the other hand, the sequence is realized as a diphthong, with the transition between the two formant patterns taking up most of the duration. The durational patterns within these diphthongs provide evidence against the transcription-based claim that these sequences are reorganized so that the length lies in the second vowel (/i#V/ → [jVː]). The findings bring into question a canonical case of compensatory lengthening conditioned by glide formation.

RevDate: 2019-07-15

Longo L, Di Stadio A, Ralli M, et al (2019)

Voice Parameter Changes in Professional Musician-Singers Singing with and without an Instrument: The Effect of Body Posture.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000501202 [Epub ahead of print].

BACKGROUND AND AIM: The impact of body posture on vocal emission is well known. Postural changes may increase muscular resistance in tracts of the phono-articulatory apparatus and lead to voice disorders. This work aimed to assess whether and to which extent body posture during singing and playing a musical instrument impacts voice performance in professional musicians.

SUBJECTS AND METHODS: Voice signals were recorded from 17 professional musicians (pianists and guitarists) while they were singing and while they were singing and playing a musical instrument simultaneously. Metrics were extracted from their voice spectrogram using the Multi-Dimensional Voice Program (MDVP) and included jitter, shift in fundamental voice frequency (sF0), shimmer, change in peak amplitude, noise to harmonic ratio, Voice Turbulence Index, Soft Phonation Index (SPI), Frequency Tremor Intensity Index, Amplitude Tremor Intensity Index, and maximum phonatory time (MPT). Statistical analysis was performed using two-tailed t tests, one-way ANOVA, and χ2 tests. Subjects' body posture was visually assessed following the recommendations of the Italian Society of Audiology and Phoniatrics. Thirty-seven voice signals were collected, 17 during singing and 20 during singing and playing a musical instrument.

RESULTS: Data showed that playing an instrument while singing led to an impairment of the "singer formant" and to a decrease in jitter, sF0, shimmer, SPI, and MPT. However, statistical analysis showed that none of the MDVP metrics changed significantly when subjects played an instrument compared to when they did not. Shoulder and back position affected voice features as measured by the MDVP metrics, while head and neck position did not. In particular, playing the guitar decreased the amplitude of the "singer formant" and increased noise, causing a typical "raucous rock voice."

CONCLUSIONS: Voice features may be affected by the use of the instrument the musicians play while they sing. Body posture selected by the musician while playing the instrument may affect expiration and phonation.

RevDate: 2019-07-15

Whitfield JA, DD Mehta (2019)

Examination of Clear Speech in Parkinson Disease Using Measures of Working Vowel Space.

Journal of speech, language, and hearing research : JSLHR, 62(7):2082-2098.

Purpose The purpose of the current study was to characterize clear speech production for speakers with and without Parkinson disease (PD) using several measures of working vowel space computed from frequently sampled formant trajectories. Method The 1st 2 formant frequencies were tracked for a reading passage that was produced using habitual and clear speaking styles by 15 speakers with PD and 15 healthy control speakers. Vowel space metrics were calculated from the distribution of frequently sampled formant frequency tracks, including vowel space hull area, articulatory-acoustic vowel space, and multiple vowel space density (VSD) measures based on different percentile contours of the formant density distribution. Results Both speaker groups exhibited significant increases in the articulatory-acoustic vowel space and VSD10, the area of the outermost (10th percentile) contour of the formant density distribution, from habitual to clear styles. These clarity-related vowel space increases were significantly smaller for speakers with PD than controls. Both groups also exhibited a significant increase in vowel space hull area; however, this metric was not sensitive to differences in the clear speech response between groups. Relative to healthy controls, speakers with PD exhibited a significantly smaller VSD90, the area of the most central (90th percentile), densely populated region of the formant space. Conclusions Using vowel space metrics calculated from formant traces of the reading passage, the current work suggests that speakers with PD do indeed reach the more peripheral regions of the vowel space during connected speech but spend a larger percentage of the time in more central regions of formant space than healthy speakers. 
Additionally, working vowel space metrics based on the distribution of formant data suggested that speakers with PD exhibited less of a clarity-related increase in formant space than controls, a trend that was not observed for perimeter-based measures of vowel space area.
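The vowel space hull area used above is the area of the convex hull of (F1, F2) points sampled from the formant tracks. A self-contained sketch (assumed details: Andrew's monotone chain for the hull plus the shoelace formula for area; the authors' density-contour VSD measures are not reproduced here):

```python
def convex_hull(points):
    """Andrew's monotone chain convex hull, returned counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_area(points):
    """Vowel space hull area (Hz^2) via the shoelace formula over the
    convex hull of (F1, F2) points."""
    h = convex_hull(points)
    n = len(h)
    s = sum(h[i][0] * h[(i + 1) % n][1] - h[(i + 1) % n][0] * h[i][1]
            for i in range(n))
    return abs(s) / 2.0
```

Interior points (the densely populated central region the abstract discusses) do not change the hull area, which is exactly why the hull metric was insensitive to the between-group difference.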

RevDate: 2019-07-15

Chiu YF, Forrest K, T Loux (2019)

Relationship Between F2 Slope and Intelligibility in Parkinson's Disease: Lexical Effects and Listening Environment.

American journal of speech-language pathology, 28(2S):887-894.

Purpose There is a complex relationship between speech production and intelligibility of speech. The current study sought to evaluate the interaction of the factors of lexical characteristics, listening environment, and the 2nd formant transition (F2 slope) on intelligibility of speakers with Parkinson's disease (PD). Method Twelve speakers with PD and 12 healthy controls read sentences that included words with the diphthongs /aɪ/, /ɔɪ/, and /aʊ/. The F2 slope of the diphthong transition was measured and averaged across the 3 diphthongs for each speaker. Young adult listeners transcribed the sentences to assess intelligibility of words with high and low word frequency and high and low neighborhood density in quiet and noisy listening conditions. The average F2 slope and intelligibility scores were entered into regression models to examine their relationship. Results F2 slope was positively related to intelligibility in speakers with PD in both listening conditions with a stronger relationship in noise than in quiet. There was no significant relationship between F2 slope and intelligibility of healthy speakers. In the quiet condition, F2 slope was only correlated with intelligibility in less-frequent words produced by the PD group. In the noise condition, F2 slope was related to intelligibility in high- and low-frequency words and high-density words in PD. Conclusions The relationship between F2 slope and intelligibility in PD was affected by lexical factors and listening conditions. F2 slope was more strongly related to intelligibility in noise than in quiet for speakers with PD. This relationship was absent in highly frequent words presented in quiet and those with fewer lexical neighbors.
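The F2 slope measured here is conventionally the slope of a regression line fit to the F2 track across the diphthong transition. A sketch under that assumption (units Hz/ms; the authors' exact measurement window and averaging across diphthongs may differ):

```python
def f2_slope(times_ms, f2_hz):
    """Least-squares slope (Hz/ms) of an F2 track sampled over a
    diphthong transition."""
    n = len(times_ms)
    mt = sum(times_ms) / n
    mf = sum(f2_hz) / n
    num = sum((t - mt) * (f - mf) for t, f in zip(times_ms, f2_hz))
    den = sum((t - mt) ** 2 for t in times_ms)
    return num / den
```

Shallower slopes (smaller Hz/ms values) correspond to the reduced articulatory excursion associated with lower intelligibility in the PD group.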

RevDate: 2019-07-15

Bauerly KR, Jones RM, C Miller (2019)

Effects of Social Stress on Autonomic, Behavioral, and Acoustic Parameters in Adults Who Stutter.

Journal of speech, language, and hearing research : JSLHR, 62(7):2185-2202.

Purpose The purpose of this study was to assess changes in autonomic, behavioral, and acoustic measures in response to social stress in adults who stutter (AWS) compared to adults who do not stutter (ANS). Method Participants completed the State-Trait Anxiety Inventory (Spielberger, Gorsuch, Lushene, Vagg, & Jacobs, 1983). In order to provoke social stress, participants were required to complete a modified version of the Trier Social Stress Test (TSST-M; Kirschbaum, Pirke, & Hellhammer, 1993), which included completing a nonword reading task and then preparing and delivering a speech to what was perceived as a group of professionals trained in public speaking. Autonomic nervous system changes were assessed by measuring skin conductance levels, heart rate, and respiratory sinus arrhythmia (RSA). Behavioral changes during speech production were measured in errors, percentage of syllables stuttered, percentage of other disfluencies, and speaking rate. Acoustic changes were measured using 2nd formant frequency fluctuations. In order to make comparisons of speech with and without social-cognitive stress, measurements were collected while participants completed a speaking task before and during TSST-M conditions. Results AWS showed significantly higher levels of self-reported state and trait anxiety compared to ANS. Autonomic nervous system changes revealed similar skin conductance level and heart rate across pre-TSST-M and TSST-M conditions; however, RSA levels were significantly higher in AWS compared to ANS across conditions. No differences were found between groups for speaking rate, fundamental frequency, and percentage of other disfluencies when speaking with or without social stress. However, acoustic analysis revealed higher levels of 2nd formant frequency fluctuations in the AWS compared to the controls under pre-TSST-M conditions, followed by a decline to a level that resembled controls when speaking under the TSST-M condition. 
Discussion Results suggest that AWS, compared to ANS, engage higher levels of parasympathetic control (i.e., RSA) during speaking, regardless of stress level. Higher levels of self-reported state and trait anxiety support this view point and suggest that anxiety may have an indirect role on articulatory variability in AWS.

RevDate: 2019-06-30

Charles S, SM Lulich (2019)

Articulatory-acoustic relations in the production of alveolar and palatal lateral sounds in Brazilian Portuguese.

The Journal of the Acoustical Society of America, 145(6):3269.

Lateral approximant speech sounds are notoriously difficult to measure and describe due to their complex articulation and acoustics. This has prevented researchers from reaching a unifying description of the articulatory and acoustic characteristics of laterals. This paper examines articulatory and acoustic properties of Brazilian Portuguese alveolar and palatal lateral approximants (/l/ and /ʎ/) produced by six native speakers. The methodology for obtaining vocal tract area functions was based on three-dimensional/four-dimensional (3D/4D) ultrasound recordings and 3D digitized palatal impressions with simultaneously recorded audio signals. Area functions were used to calculate transfer function spectra, and predicted formant and anti-resonance frequencies were compared with the acoustic recordings. Mean absolute error in formant frequency prediction was 4% with a Pearson correlation of r = 0.987. Findings suggest anti-resonances from the interdental channels are less important than a prominent anti-resonance from the supralingual cavity but can become important in asymmetrical articulations. The use of 3D/4D ultrasound to study articulatory-acoustic relations is promising, but significant limitations remain and future work is needed to make better use of 3D/4D ultrasound data, e.g., by combining it with magnetic resonance imaging.

RevDate: 2019-07-15

Horáček J, Radolf V, AM Laukkanen (2019)

Experimental and Computational Modeling of the Effects of Voice Therapy Using Tubes.

Journal of speech, language, and hearing research : JSLHR, 62(7):2227-2244.

Purpose Phonations into a tube with the distal end either in the air or submerged in water are used for voice therapy. This study explores the effective mechanisms of these therapy methods. Method The study applied a physical model complemented by calculations from a computational model, and the results were compared to those that have been reported for humans. The effects of tube phonation on vocal tract resonances and oral pressure variation were studied. The relationships of transglottic pressure variation in time Ptrans(t) versus glottal area variation in time GA(t) were constructed. Results The physical model revealed that, for the phonation on [u:] vowel through a glass resonance tube ending in the air, the 1st formant frequency (F1) decreased by 67%, from 315 Hz to 105 Hz, thus slightly above the fundamental frequency (F0) that was set to 90-94 Hz. For phonation through the tube into water, F1 decreased by 91%-92%, reaching 26-28 Hz, and the water bubbling frequency Fb ≅ 19-24 Hz was just below F1. The relationships of Ptrans(t) versus GA(t) clearly differentiate vowel phonation from both therapy methods, and show a physical background for voice therapy with tubes. It is shown that comparable results have been measured in humans during tube therapy. For the tube in air, F1 descends closer to F0, whereas for the tube in water, the frequency Fb occurs close to the acoustic-mechanical resonance of the human vocal tract. Conclusion In both therapy methods, part of the airflow energy required for phonation is substituted by the acoustic energy utilizing the 1st acoustic resonance. Thus, less flow energy is needed for vocal fold vibration, which results in improved vocal efficiency. The effect can be stronger in water resistance therapy if the frequency Fb approaches the acoustic-mechanical resonance of the vocal tract, while simultaneously F0 is voluntarily changed close to F1.

RevDate: 2019-06-27

Suresh CH, Krishnan A, X Luo (2019)

Human Frequency Following Responses to Vocoded Speech: Amplitude Modulation Versus Amplitude Plus Frequency Modulation.

Ear and hearing [Epub ahead of print].

OBJECTIVES: The most commonly employed speech processing strategies in cochlear implants (CIs) only extract and encode amplitude modulation (AM) in a limited number of frequency channels. Zeng et al. (2005) proposed a novel speech processing strategy that encodes both frequency modulation (FM) and AM to improve CI performance. Using behavioral tests, they reported better speech, speaker, and tone recognition with this novel strategy than with the AM-alone strategy. Here, we used the scalp-recorded human frequency following responses (FFRs) to examine the differences in the neural representation of vocoded speech sounds with AM alone and AM + FM as the spectral and temporal cues were varied. Specifically, we were interested in determining whether the addition of FM to AM improved the neural representation of envelope periodicity (FFRENV) and temporal fine structure (FFRTFS), as reflected in the temporal pattern of the phase-locked neural activity generating the FFR.

DESIGN: FFRs were recorded from 13 normal-hearing, adult listeners in response to the original unprocessed stimulus (a synthetic diphthong /au/ with a 110-Hz fundamental frequency or F0 and a 250-msec duration) and the 2-, 4-, 8- and 16-channel sine vocoded versions of /au/ with AM alone and AM + FM. Temporal waveforms, autocorrelation analyses, fast Fourier Transform, and stimulus-response spectral correlations were used to analyze both the strength and fidelity of the neural representation of envelope periodicity (F0) and TFS (formant structure).

RESULTS: The periodicity strength in the FFRENV decreased more for the AM stimuli than for the relatively resilient AM + FM stimuli as the number of channels was increased. Regardless of the number of channels, a clear spectral peak of FFRENV was consistently observed at the stimulus F0 for all the AM + FM stimuli but not for the AM stimuli. Neural representation as revealed by the spectral correlation of FFRTFS was better for the AM + FM stimuli when compared to the AM stimuli. Neural representation of the time-varying formant-related harmonics as revealed by the spectral correlation was also better for the AM + FM stimuli as compared to the AM stimuli.

CONCLUSIONS: These results are consistent with previously reported behavioral results and suggest that the AM + FM processing strategy elicited brainstem neural activity that better preserved periodicity, temporal fine structure, and time-varying spectral information than the AM processing strategy. The relatively more robust neural representation of AM + FM stimuli observed here likely contributes to the superior performance on speech, speaker, and tone recognition with the AM + FM processing strategy. Taken together, these results suggest that neural information preserved in the FFR may be used to evaluate signal processing strategies considered for CIs.

RevDate: 2019-07-09

Stansbury AL, VM Janik (2019)

Formant Modification through Vocal Production Learning in Gray Seals.

Current biology : CB, 29(13):2244-2249.e4.

Vocal production learning is a rare communication skill and has only been found in selected avian and mammalian species [1-4]. Although humans use learned formants and voiceless sounds to encode most lexical information [5], evidence for vocal learning in other animals tends to focus on the modulation pattern of the fundamental frequency [3, 4]. Attempts to teach mammals to produce human speech sounds have largely been unsuccessful, most notably in extensive studies on great apes [5]. The limited evidence for formant copying in mammals raises the question of whether advanced learned control over formant production is uniquely human. We show that gray seals (Halichoerus grypus) have the ability to match modulations in peak frequency patterns of call sequences or melodies by modifying the formants in their own calls, moving outside of their normal repertoire's distribution of frequencies and even copying human vowel sounds. Seals also demonstrated enhanced auditory memory for call sequences by accurately copying sequential changes in peak frequency and the number of calls played to them. Our results demonstrate that formants can be influenced by vocal production learning in non-human vocal learners, providing a mammalian substrate for the evolution of flexible information coding in formants as found in human language.

RevDate: 2019-06-16

Dahl KL, LA Mahler (2019)

Acoustic Features of Transfeminine Voices and Perceptions of Voice Femininity.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30075-X [Epub ahead of print].

The purpose of this study was to evaluate the relationships between acoustic measures of transfeminine voices and both self- and listener ratings of voice femininity. Connected speech samples were collected from 12 transfeminine individuals (M = 36.3 years, SD = 10.6 years) and a control group of five cisgender (cis) women and five cis men (M = 35.3 years, SD = 13.3 years). The acoustic measures of fundamental frequency (fo), fo variation, formant frequencies, and vocal intensity were calculated from these samples. Transfeminine speakers rated their own voices on a five-point scale of voice femininity. Twenty inexperienced listeners heard an excerpt of each speech sample and rated the voices on the same five-point scale of voice femininity. Spearman's rank-order correlation coefficients were calculated to measure the relationships between the acoustic variables and ratings of voice femininity. Significant positive correlations were found between fo and both self-ratings (r = 0.712, P = 0.009) and listener ratings of voice femininity (r = 0.513, P < 0.001). Significant positive correlations were found between intensity and both self-ratings (r = 0.584, P = 0.046) and listener ratings of voice femininity (r = 0.584, P = 0.046). No significant correlations were found between fo variation or formant frequencies and perceptual ratings of voice femininity. A Pearson's chi-square test of independence showed that the distribution of self- and listener ratings differed significantly (χ2 = 9.668, P = 0.046). Self- and listener ratings were also shown to be strongly correlated (r = 0.912, P < 0.001). This study provides further evidence to support the selection of training targets in voice feminization programs for transfeminine individuals and promotes the use of self-ratings of voice as an important outcome measure.
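
The rank-order correlations reported above can be computed as follows; this is a generic illustration of Spearman's rho (the data values in the checks are invented), not the study's analysis script:

```python
import numpy as np

def rankdata(a):
    """Assign average ranks (1-based), giving tied values the mean of
    the ranks they span."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="stable")
    sorted_a = a[order]
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = rankdata(x), rankdata(y)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(np.sum(rx * ry) / np.sqrt(np.sum(rx ** 2) * np.sum(ry ** 2)))
```

In practice `scipy.stats.spearmanr` does the same computation and also returns a P value.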

RevDate: 2019-06-13

Sankar MSA, PS Sathidevi (2019)

A scalable speech coding scheme using compressive sensing and orthogonal mapping based vector quantization.

Heliyon, 5(5):e01820 pii:e01820.

A novel scalable speech coding scheme based on Compressive Sensing (CS), operating at bit rates from 3.275 to 7.275 kbps, is designed and implemented in this paper. CS-based speech coding offers the benefit of combined compression and encryption, with inherent de-noising and bit rate scalability. The non-stationary nature of the speech signal makes recovery from CS measurements very complex because the sparsifying basis varies. In this work, the complexity of the recovery process is reduced by assigning a suitable basis to each frame of the speech signal based on its statistical properties. As the quality of the reconstructed speech depends on the sensing matrix used at the transmitter, a variant of the Binary Permuted Block Diagonal (BPBD) matrix is also proposed here, which offers better performance than the commonly used Gaussian random matrix. To improve coding efficiency, formant filter coefficients are quantized using conventional Vector Quantization (VQ), and an orthogonal mapping based VQ is developed for the quantization of the CS measurements. The proposed coding scheme offers listening quality for the reconstructed speech similar to that of the Adaptive Multi-Rate Narrowband (AMR-NB) codec at 6.7 kbps and Enhanced Voice Services (EVS) at 7.2 kbps. A separate de-noising block is not required in the proposed scheme due to the inherent de-noising property of CS. Scalability in bit rate is achieved by varying the number of random measurements and the number of levels for orthogonal mapping in the VQ stage of the measurements.

RevDate: 2019-08-28
CmpDate: 2019-08-28

de Carvalho CC, da Silva DM, de Carvalho Junior AD, et al (2019)

Pre-operative voice evaluation as a hypothetical predictor of difficult laryngoscopy.

Anaesthesia, 74(9):1147-1152.

We examined the potential for voice sounds to predict a difficult airway, as compared with prediction by the modified Mallampati test. A total of 453 patients scheduled for elective surgery under general anaesthesia with tracheal intubation were studied. Five phonemes were recorded and their formants analysed. Difficult laryngoscopy was defined as Cormack-Lehane grade 3 or 4. Univariate and multivariate logistic regression were used to examine the association between several variables (mouth opening, sternomental distance, modified Mallampati and formants) and difficult laryngoscopy. Difficult laryngoscopy was reported in 29/453 (6.4%) patients. Among the five regression models evaluated, the model achieving the best performance in predicting difficult laryngoscopy, after variable selection (stepwise, multivariate), included the modified Mallampati classification (OR 2.920; 95%CI 1.992-4.279; p < 0.001), the first formant of /i/ (iF1) (OR 1.003; 95%CI 1.002-1.04; p < 0.001), and the second formant of /i/ (iF2) (OR 0.998; 95%CI 0.997-0.998; p < 0.001). The receiver operating characteristic curve for a regression model that included both formants and Mallampati showed an area under the curve of 0.918, higher than formants alone (area under the curve 0.761) and modified Mallampati alone (area under the curve 0.874). Voice showed a significant association with difficult laryngoscopy during general anaesthesia, with a 76.1% probability of correctly classifying a randomly selected patient.
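
The area-under-the-curve values quoted above have a simple rank interpretation: the probability that a randomly chosen difficult-laryngoscopy patient receives a higher predicted risk than a randomly chosen easy one. A minimal sketch of that computation (the scores in the checks are invented, not study data):

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the Mann-Whitney statistic:
    the fraction of (positive, negative) pairs in which the positive case
    outscores the negative one, counting ties as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

An AUC of 0.918 thus means the combined model ranks a random difficult case above a random easy case about 92% of the time.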

RevDate: 2019-07-31

Easwar V, Scollie S, D Purcell (2019)

Investigating potential interactions between envelope following responses elicited simultaneously by different vowel formants.

Hearing research, 380:35-45.

Envelope following responses (EFRs) evoked by the periodicity of voicing in vowels are elicited at the fundamental frequency of voice (f0), irrespective of the harmonics that initiate it. One approach to improving the frequency specificity of vowel stimuli without increasing test-time is to alter the f0 selectively in one or more formants. The harmonics contributing to an EFR can then be differentiated by the unique f0 at which the EFRs are elicited. The advantages of such an approach would be increased frequency specificity and efficiency, given that multiple EFRs can be evaluated in a given test-time. However, multiple EFRs elicited simultaneously could interact and lead to altered amplitudes and outcomes. To this end, the present study aimed to evaluate: (i) whether simultaneous recording of two EFRs, one elicited by harmonics in the first formant (F1) and one elicited by harmonics in the second and higher formants (F2+), leads to attenuation or enhancement of EFR amplitude, and (ii) whether simultaneous measurement of two EFRs affects their accuracy and anticipated efficiency. In a group of 22 young adults with normal hearing, EFRs were elicited by the F1 and F2+ bands of /u/, /a/ and /i/ when F1 and F2+ were presented independently (individual), when F1 and F2+ were presented simultaneously (dual), and when F1 or F2+ was presented with spectrally matched Gaussian noise of the other (noise). Repeated-measures analysis of variance indicated no significant group differences in EFR amplitudes between any of the conditions, suggesting minimal between-EFR interactions. Between-participant variability was present; however, significant changes were seen in only a third of the participants, and only for the /u/ F1 stimulus.
For the majority of stimuli, the change between individual and dual conditions was positively correlated with the change between individual and noise conditions, suggesting that interaction-based changes in EFR amplitude, when present, were likely due to the restriction of cochlear regions of excitation in the presence of a competing stimulus. The amplitude of residual noise was significantly higher in the dual or noise relative to the individual conditions, although the mean differences were very small (<3 nV). F-test-based detection of EFRs, commonly used to determine the presence of an EFR, did not vary across conditions. Further, neither the mean reduction in EFR amplitude nor the mean increase in noise amplitude in dual relative to individual conditions was large enough to alter the anticipated gain in efficiency of simultaneous EFR recordings. Together, results suggest that the approach of simultaneously recording two vowel-evoked EFRs from different formants for improved frequency-specificity does not alter test accuracy and is more time-efficient than evaluating EFRs to each formant individually.

RevDate: 2019-06-02

Buckley DP, Dahl KL, Cler GJ, et al (2019)

Transmasculine Voice Modification: A Case Study.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30116-X [Epub ahead of print].

This case study measured the effects of manual laryngeal therapy on the fundamental frequency (fo), formant frequencies, estimated vocal tract length, and listener perception of masculinity of a 32-year-old transmasculine individual. The participant began testosterone therapy 1.5 years prior to the study. Two therapy approaches were administered sequentially in a single session: (1) passive circumlaryngeal massage and manual laryngeal reposturing, and (2) active laryngeal reposturing with voicing. Acoustic recordings were collected before and after each treatment and 3 days after the session. Speaking fo decreased from 124 Hz to 120 Hz after passive training, and to 108 Hz after active training. Estimated vocal tract length increased from 17.0 cm to 17.3 cm after passive training, and to 19.4 cm after active training. Eight listeners evaluated the masculinity of the participant's speech; his voice was rated as most masculine at the end of the training session. All measures returned to baseline at follow-up. Overall, both acoustic and perceptual changes were observed in one transmasculine individual who participated in manual laryngeal therapy, even after significant testosterone-induced voice changes had already occurred; however, the changes were not maintained at follow-up. This study adds to the scant literature on effective approaches to, and proposed outcome measures for, voice masculinization in transmasculine individuals.

RevDate: 2019-06-13

Chen WR, Whalen DH, CH Shadle (2019)

F0-induced formant measurement errors result in biased variabilities.

The Journal of the Acoustical Society of America, 145(5):EL360.

Many developmental studies attribute reduction of acoustic variability to increasing motor control. However, linear prediction-based formant measurements are known to be biased toward the nearest harmonic of F0, especially at high F0s. Thus, the amount of reported formant variability generated by changes in F0 is unknown. Here, 470,000 vowels were synthesized, mimicking statistics reported in four developmental studies, to estimate the proportion of formant variability that can be attributed to F0 bias, as well as to other formant measurement errors. Results showed that the F0-induced formant measurement errors are large and systematic, and cannot be eliminated by a large sample size.
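
For readers unfamiliar with the method at issue: linear prediction fits an all-pole model to the signal's autocorrelation and reads formants off the complex roots of the prediction polynomial; because the fit latches onto the strongest nearby harmonic, estimates are pulled toward multiples of F0 when F0 is high. The self-contained sketch below illustrates the pipeline (the synthesis parameters, model order, and bandwidth threshold are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def resonator(x, f, bw, fs):
    """Second-order all-pole resonator, used here only to synthesize a test vowel."""
    r = np.exp(-np.pi * bw / fs)
    c = 2.0 * r * np.cos(2.0 * np.pi * f / fs)
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += c * y[n - 1]
        if n >= 2:
            y[n] -= r * r * y[n - 2]
    return y

def levinson(r, order):
    """Levinson-Durbin recursion: autocorrelation -> LP coefficients."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[1:i][::-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpc_formants(x, fs, order=8):
    """Formant estimates from the complex roots of the LP polynomial."""
    xw = np.asarray(x, float) * np.hamming(len(x))
    full = np.correlate(xw, xw, mode="full")
    r = full[len(xw) - 1:len(xw) + order]      # lags 0..order
    a = levinson(r, order)
    out = []
    for root in np.roots(a):
        if root.imag > 1e-3:                   # one root per conjugate pair
            f = np.angle(root) * fs / (2.0 * np.pi)
            bw = -fs / np.pi * np.log(abs(root))
            if 90.0 < f < fs / 2.0 - 90.0 and bw < 700.0:
                out.append(f)
    return sorted(out)

# Demo: a 100-Hz pulse train shaped by resonators at 500 and 1500 Hz.
fs = 10000
pulses = np.zeros(1000)
pulses[::100] = 1.0
vowel = resonator(resonator(pulses, 500.0, 60.0, fs), 1500.0, 90.0, fs)
est = lpc_formants(vowel, fs)
```

With F0 at 100 Hz and the formant sitting on a harmonic, the estimates come out close to 500 and 1500 Hz; raising F0 so that no harmonic falls near the true formant is what produces the bias the paper quantifies.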

RevDate: 2019-06-02

Briefer EF, Vizier E, Gygax L, et al (2019)

Expression of emotional valence in pig closed-mouth grunts: Involvement of both source- and filter-related parameters.

The Journal of the Acoustical Society of America, 145(5):2895.

Emotion expression plays a crucial role for regulating social interactions. One efficient channel for emotion communication is the vocal-auditory channel, which enables a fast transmission of information. Filter-related parameters (formants) have been suggested as a key to the vocal differentiation of emotional valence (positive versus negative) across species, but variation in relation to emotions has rarely been investigated. Here, whether pig (Sus scrofa domesticus) closed-mouth grunts differ in source- and filter-related features when produced in situations assumed to be positive and negative is investigated. Behavioral and physiological parameters were used to validate the animals' emotional state (both in terms of valence and arousal, i.e., bodily activation). Results revealed that grunts produced in a positive situation were characterized by higher formants, a narrower range of the third formant, a shorter duration, a lower fundamental frequency, and a lower harmonicity compared to negative grunts. Particularly, formant-related parameters and duration made up most of the difference between positive and negative grunts. Therefore, these parameters have the potential to encode dynamic information and to vary as a function of the emotional valence of the emitter in pigs, and possibly in other mammals as well.

RevDate: 2019-06-10

Houde JF, Gill JS, Agnew Z, et al (2019)

Abnormally increased vocal responses to pitch feedback perturbations in patients with cerebellar degeneration.

The Journal of the Acoustical Society of America, 145(5):EL372.

Cerebellar degeneration (CD) has deleterious effects on speech motor behavior. Recently, a dissociation between feedback and feedforward control of speaking was observed in CD: Whereas CD patients exhibited reduced adaptation across trials to consistent formant feedback alterations, they showed enhanced within-trial compensation for unpredictable formant feedback perturbations. In this study, it was found that CD patients exhibit abnormally increased within-trial vocal compensation responses to unpredictable pitch feedback perturbations. Taken together with recent findings, the results indicate that CD is associated with a general hypersensitivity to auditory feedback during speaking.

RevDate: 2019-05-31

Reinheimer DM, Andrade BMR, Nascimento JKF, et al (2019)

Formant Frequencies, Cephalometric Measures, and Pharyngeal Airway Width in Adults With Congenital, Isolated, and Untreated Growth Hormone Deficiency.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30061-X [Epub ahead of print].

OBJECTIVE: Adult subjects with isolated growth hormone deficiency (IGHD) due to a mutation in the growth hormone releasing hormone receptor gene exhibit higher values of formant frequencies. In normal subjects, a significant negative association between the formant frequencies and the reduction of linear craniofacial measurements, especially of the maxilla and mandible, has been reported. This suggests smaller pharyngeal width, despite a low prevalence of obstructive sleep apnea syndrome. Here we evaluate their pharyngeal airway width, its correlation with vowel formant frequencies, and the correlation between them and the craniofacial measures.

SUBJECTS AND METHODS: A two-step protocol was performed. In the first, case-control, experiment, aimed at assessing pharyngeal width, we compared nine adult IGHD subjects and 36 normal-statured controls. Both upper and lower pharyngeal widths were measured. The second step (assessment of the correlations between pharyngeal width, formant frequencies, and cephalometric measures) was performed only in the IGHD group.

RESULTS: Upper and lower pharyngeal widths were similar in IGHD and controls. In IGHD subjects, the lower pharyngeal width exhibited a negative correlation with F1 [a] and a positive correlation with mandibular length. There were negative correlations between F1 and F2 with linear and positive correlations with the angular measures.

CONCLUSIONS: Pharyngeal airway width is not reduced in adults with congenital, untreated lifetime IGHD, contributing to the low prevalence of obstructive sleep apnea syndrome. The formant frequencies relate more with cephalometric measurements than with the pharyngeal airway width. These findings exemplify the consequences of lifetime IGHD on osseous and nonosseous growth.

RevDate: 2019-05-28

Easwar V, Scollie S, Aiken S, et al (2019)

Test-Retest Variability in the Characteristics of Envelope Following Responses Evoked by Speech Stimuli.

Ear and hearing [Epub ahead of print].

OBJECTIVES: The objective of the present study was to evaluate the between-session test-retest variability in the characteristics of envelope following responses (EFRs) evoked by modified natural speech stimuli in young normal hearing adults.

DESIGN: EFRs from 22 adults were recorded in two sessions, 1 to 12 days apart. EFRs were evoked by the token /susa∫ i/ (2.05 sec) presented at 65 dB SPL and recorded from the vertex referenced to the neck. The token /susa∫ i/, spoken by a male with an average fundamental frequency [f0] of 98.53 Hz, was of interest because of its potential utility as an objective hearing aid outcome measure. Each vowel was modified to elicit two EFRs simultaneously by lowering the f0 in the first formant while maintaining the original f0 in the higher formants. Fricatives were amplitude-modulated at 93.02 Hz and elicited one EFR each. EFRs evoked by vowels and fricatives were estimated using a Fourier analyzer and a discrete Fourier transform, respectively. Detection of EFRs was determined by an F-test. Test-retest variability in EFR amplitude and phase coherence was quantified using correlation, repeated-measures analysis of variance, and the repeatability coefficient. The repeatability coefficient, computed as twice the standard deviation (SD) of test-retest differences, represents the ±95% limits of test-retest variation around the mean difference. Test-retest variability of EFR amplitude and phase coherence was compared using the coefficient of variation, a normalized metric that represents the ratio of the SD of repeat measurements to its mean. Consistency in EFR detection outcomes was assessed using the test of proportions.
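
The two variability metrics defined above are straightforward to compute. The sketch below follows the definitions in the abstract (twice the SD of the test-retest differences; SD of repeat measurements over their mean); the degrees-of-freedom convention and the sample values in the checks are assumptions for illustration:

```python
import numpy as np

def repeatability_coefficient(test, retest):
    """Twice the SD of the test-retest differences: the +/-95% limits of
    test-retest variation around the mean difference (Bland-Altman style)."""
    d = np.asarray(test, float) - np.asarray(retest, float)
    return float(2.0 * np.std(d, ddof=1))

def coefficient_of_variation(test, retest):
    """Ratio of the SD of repeat measurements to their mean, per
    participant, averaged across participants -- a normalized metric
    that lets amplitude (nV) and phase coherence be compared."""
    pairs = np.stack([np.asarray(test, float), np.asarray(retest, float)], axis=1)
    return float(np.mean(pairs.std(axis=1, ddof=1) / pairs.mean(axis=1)))
```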

RESULTS: EFR amplitude and phase coherence did not vary significantly between sessions, and were significantly correlated across repeat measurements. The repeatability coefficient for EFR amplitude ranged from 38.5 nV to 45.6 nV for all stimuli, except for /∫/ (71.6 nV). For any given stimulus, the test-retest differences in EFR amplitude of individual participants were not correlated with their test-retest differences in noise amplitude. However, across stimuli, higher repeatability coefficients of EFR amplitude tended to occur when the group mean noise amplitude and the repeatability coefficient of noise amplitude were higher. The test-retest variability of phase coherence was comparable to that of EFR amplitude in terms of the coefficient of variation, and the repeatability coefficient varied from 0.1 to 0.2, with the highest value of 0.2 for /∫/. Mismatches in EFR detection outcomes occurred in 11 of 176 measurements. For each stimulus, the tests of proportions revealed a significantly higher proportion of matched detection outcomes compared to mismatches.

CONCLUSIONS: Speech-evoked EFRs demonstrated reasonable repeatability across sessions. Of the eight stimuli, the shortest stimulus /∫/ demonstrated the largest variability in EFR amplitude and phase coherence. The test-retest variability in EFR amplitude could not be explained by test-retest differences in noise amplitude for any of the stimuli. This lack of explanation argues for other sources of variability, one possibility being the modulation of cortical contributions imposed on brainstem-generated EFRs.

RevDate: 2019-08-27

Zhao TC, Masapollo M, Polka L, et al (2019)

Effects of formant proximity and stimulus prototypicality on the neural discrimination of vowels: Evidence from the auditory frequency-following response.

Brain and language, 194:77-83.

Cross-language speech perception experiments indicate that for many vowel contrasts, discrimination is easier when the same pair of vowels is presented in one direction compared to the reverse direction. According to one account, these directional asymmetries reflect a universal bias favoring "focal" vowels (i.e., vowels with prominent spectral peaks formed by the convergence of adjacent formants). An alternative account is that such effects reflect an experience-dependent bias favoring prototypical exemplars of native-language vowel categories. Here, we tested the predictions of these accounts by recording the auditory frequency-following response in English-speaking listeners to two synthetic variants of the vowel /u/ that differed in the proximity of their first and second formants and prototypicality, with stimuli arranged in oddball and reversed-oddball blocks. Participants showed evidence of neural discrimination when the more-focal/less-prototypic /u/ served as the deviant stimulus, but not when the less-focal/more-prototypic /u/ served as the deviant, consistent with the focalization account.

RevDate: 2019-06-22

König A, Linz N, Zeghari R, et al (2019)

Detecting Apathy in Older Adults with Cognitive Disorders Using Automatic Speech Analysis.

Journal of Alzheimer's disease : JAD, 69(4):1183-1193.

BACKGROUND: Apathy is present in several psychiatric and neurological conditions and has been found to have a severe negative effect on disease progression. In older people, it can be a predictor of increased dementia risk. Current assessment methods lack objectivity and sensitivity, thus new diagnostic tools and broad-scale screening technologies are needed.

OBJECTIVE: This study is the first of its kind aiming to investigate whether automatic speech analysis could be used for characterization and detection of apathy.

METHODS: A group of apathetic and non-apathetic patients (n = 60) with mild to moderate neurocognitive disorder were recorded while performing two short narrative speech tasks. Paralinguistic markers relating to prosodic, formant, source, and temporal qualities of speech were automatically extracted, examined between the groups and compared to baseline assessments. Machine learning experiments were carried out to validate the diagnostic power of extracted markers.

RESULTS: Correlations between apathy sub-scales and features revealed a relation between temporal aspects of speech and the subdomains of reduction in interest and initiative, as well as between prosody features and the affective domain. Group differences were found to vary for males and females, depending on the task. Differences in temporal aspects of speech were found to be the most consistent difference between apathetic and non-apathetic patients. Machine learning models trained on speech features achieved top performances of AUC = 0.88 for males and AUC = 0.77 for females.

CONCLUSIONS: These findings reinforce the usability of speech as a reliable biomarker in the detection and assessment of apathy.

RevDate: 2019-05-23

Cox SR, Raphael LJ, PC Doyle (2019)

Production of Vowels by Electrolaryngeal Speakers Using Clear Speech.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000499928 [Epub ahead of print].

BACKGROUND/AIMS: This study examined the effect of clear speech on vowel productions by electrolaryngeal speakers.

METHOD: Ten electrolaryngeal speakers produced eighteen words containing /i/, /ɪ/, /ɛ/, /æ/, /eɪ/, and /oʊ/ using habitual speech and clear speech. Twelve listeners transcribed 360 words, and a total of 4,320 vowel stimuli across speaking conditions, speakers, and listeners were analyzed. Analyses included listeners' identifications of vowels, vowel duration, and vowel formant relationships.

RESULTS: No significant effect of speaking condition was found on vowel identification. Specifically, 85.4% of the vowels were identified in habitual speech, and 82.7% of the vowels were identified in clear speech. However, clear speech was found to have a significant effect on vowel durations. The mean vowel duration in the 17 consonant-vowel-consonant words was 333 ms in habitual speech and 354 ms in clear speech. The mean vowel duration in the single consonant-vowel words was 551 ms in habitual speech and 629 ms in clear speech.

CONCLUSION: Findings suggest that, although clear speech facilitates longer vowel durations, electrolaryngeal speakers may not gain a clear-speech benefit in terms of listeners' vowel identification.

RevDate: 2019-05-22

Alharbi GG, Cannito MP, Buder EH, et al (2019)

Spectral/Cepstral Analyses of Phonation in Parkinson's Disease before and after Voice Treatment: A Preliminary Study.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000495837 [Epub ahead of print].

PURPOSE: This article examines cepstral/spectral analyses of sustained /ɑ/ vowels produced by speakers with hypokinetic dysarthria secondary to idiopathic Parkinson's disease (PD) before and after Lee Silverman Voice Treatment (LSVT®LOUD) and the relationship of these measures with overall voice intensity.

METHODOLOGY: Nine speakers with PD were examined in a pre-/post-treatment design, with multiple daily audio recordings before and after treatment. Sustained vowels were analyzed for cepstral peak prominence (CPP), CPP standard deviation (CPP SD), low/high spectral ratio (L/H SR), and Cepstral/Spectral Index of Dysphonia (CSID) using the KAYPENTAX computer software.

RESULTS: CPP and CPP SD increased significantly and CSID decreased significantly from pre- to post-treatment recordings, with strong effect sizes. Increased CPP indicates increased dominance of harmonics in the spectrum following LSVT. After restricting the frequency cutoff to the region just above the first and second formants and below the third formant, the L/H SR was observed to decrease significantly following treatment. Correlation analyses demonstrated that CPP was more strongly associated with CSID before treatment than after.

CONCLUSION: In addition to increased vocal intensity following LSVT, speakers with PD exhibited improved harmonic structure and voice quality as reflected by cepstral/spectral analyses, indicating reduced dysphonia following treatment.

RevDate: 2019-05-20

Zaltz Y, Goldsworthy RL, Eisenberg LS, et al (2019)

Children With Normal Hearing Are Efficient Users of Fundamental Frequency and Vocal Tract Length Cues for Voice Discrimination.

Ear and hearing [Epub ahead of print].

BACKGROUND: The ability to discriminate between talkers assists listeners in understanding speech in a multitalker environment. This ability has been shown to be influenced by sensory processing of vocal acoustic cues, such as fundamental frequency (F0) and formant frequencies that reflect the listener's vocal tract length (VTL), and by cognitive processes, such as attention and memory. It is, therefore, suggested that children who exhibit immature sensory and/or cognitive processing will demonstrate poor voice discrimination (VD) compared with young adults. Moreover, greater difficulties in VD may be associated with spectral degradation as in children with cochlear implants.

OBJECTIVES: The aim of this study was as follows: (1) to assess the use of F0 cues, VTL cues, and the combination of both cues for VD in normal-hearing (NH) school-age children and to compare their performance with that of NH adults; (2) to assess the influence of spectral degradation by means of vocoded speech on the use of F0 and VTL cues for VD in NH children; and (3) to assess the contribution of attention, working memory, and nonverbal reasoning to performance.

DESIGN: Forty-one children, 8 to 11 years of age, were tested with nonvocoded stimuli. Twenty-one of them were also tested with eight-channel, noise-vocoded stimuli. Twenty-one young adults (18 to 35 years) were tested for comparison. A three-interval, three-alternative forced-choice paradigm with an adaptive tracking procedure was used to estimate the difference limens (DLs) for VD when F0, VTL, and F0 + VTL were manipulated separately. Auditory memory, visual attention, and nonverbal reasoning were assessed for all participants.
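
Adaptive tracking of a difference limen, as in the design above, is typically run as a staircase. The abstract does not state the exact stepping rule, so the sketch below uses a generic 2-down/1-up track with a deterministic simulated listener, purely for illustration; the starting level, step size, and reversal count are assumptions:

```python
def adaptive_track(respond, start=20.0, step=2.0, n_reversals=8):
    """Generic 2-down/1-up adaptive staircase: the tracked cue difference
    drops after two consecutive correct responses and rises after each
    error, converging near the 70.7%-correct point. The difference limen
    is estimated as the mean of the last six reversal levels."""
    level, direction = start, -1
    reversals, correct_run = [], 0
    while len(reversals) < n_reversals:
        if respond(level):                 # one trial at the current level
            correct_run += 1
            if correct_run == 2:           # two in a row -> make it harder
                correct_run = 0
                if direction == +1:
                    reversals.append(level)
                direction = -1
                level -= step
        else:                              # error -> make it easier
            correct_run = 0
            if direction == -1:
                reversals.append(level)
            direction = +1
            level += step
    return sum(reversals[-6:]) / 6.0
```

With a listener who is always correct above some threshold and always wrong below it, the track oscillates around that threshold and the reversal mean recovers it.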

RESULTS: (a) Children's F0 and VTL discrimination abilities were comparable to those of adults, suggesting that most school-age children utilize both cues effectively for VD. (b) Children's VD was associated with trail making test scores that assessed visual attention abilities and speed of processing, possibly reflecting their need to recruit cognitive resources for the task. (c) The best DLs were achieved for the combined (F0 + VTL) manipulation for both children and adults, suggesting that children at this age are already capable of integrating spectral and temporal cues. (d) Both children and adults found the VTL manipulations more beneficial for VD compared with the F0 manipulations, suggesting that formant frequencies are more reliable for identifying a specific speaker than F0. (e) Poorer DLs were achieved with the vocoded stimuli, though the children maintained thresholds and patterns of performance across manipulations similar to those of the adults.

CONCLUSIONS: The present study is the first to assess the contribution of F0, VTL, and the combined F0 + VTL to the discrimination of speakers in school-age children. The findings support the notion that many NH school-age children have effective spectral and temporal coding mechanisms that allow sufficient VD, even in the presence of spectrally degraded information. These results may challenge the notion that immature sensory processing underlies poor listening abilities in children, further implying that other processing mechanisms contribute to their difficulties to understand speech in a multitalker environment. These outcomes may also provide insight into VD processes of children under listening conditions that are similar to cochlear implant users.

RevDate: 2019-06-10

Auracher J, Scharinger M, W Menninghaus (2019)

Contiguity-based sound iconicity: The meaning of words resonates with phonetic properties of their immediate verbal contexts.

PloS one, 14(5):e0216930 pii:PONE-D-18-29313.

We tested the hypothesis that phonosemantic iconicity--i.e., a motivated resonance of sound and meaning--might not only be found on the level of individual words or entire texts, but also in word combinations, such that the meaning of a target word is iconically expressed, or highlighted, in the phonetic properties of its immediate verbal context. To this end, we extracted single lines from German poems that all include a word designating high or low dominance, such as large or small, strong or weak, etc. Based on insights from previous studies, we expected to find more vowels with a relatively short distance between the first two formants (low formant dispersion) in the immediate context of words expressing high physical or social dominance than in the context of words expressing low dominance. Our findings support this hypothesis, suggesting that neighboring words can form iconic dyads in which the meaning of one word is sound-iconically reflected in the phonetic properties of adjacent words. The construct of contiguity-based phonosemantic iconicity opens many avenues for future research well beyond lines extracted from poems.
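
The key acoustic measure above, formant dispersion, is simply the spacing between formant frequencies; with only the first two formants, as here, it reduces to F2 - F1, and a small value means the formants converge (the pattern the study associates with high-dominance contexts). A trivial sketch (the frequencies in the checks are invented):

```python
def formant_dispersion(formants):
    """Mean spacing between adjacent formant frequencies (in Hz).
    With only F1 and F2 this reduces to F2 - F1; low dispersion means
    the formants converge into a single prominent spectral region."""
    f = sorted(formants)
    gaps = [upper - lower for lower, upper in zip(f, f[1:])]
    return sum(gaps) / len(gaps)
```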

RevDate: 2019-05-24

Koenig LL, S Fuchs (2019)

Vowel Formants in Normal and Loud Speech.

Journal of speech, language, and hearing research : JSLHR, 62(5):1278-1295.

Purpose This study evaluated how 1st and 2nd vowel formant frequencies (F1, F2) differ between normal and loud speech in multiple speaking tasks to assess claims that loudness leads to exaggerated vowel articulation. Method Eleven healthy German-speaking women produced normal and loud speech in 3 tasks that varied in the degree of spontaneity: reading sentences that contained isolated /i: a: u:/, responding to questions that included target words with controlled consonantal contexts but varying vowel qualities, and a recipe recall task. Loudness variation was elicited naturalistically by changing interlocutor distance. First and 2nd formant frequencies and average sound pressure level were obtained from the stressed vowels in the target words, and vowel space area was calculated from /i: a: u:/. Results Comparisons across many vowels indicated that high, tense vowels showed limited formant variation as a function of loudness. Analysis of /i: a: u:/ across speech tasks revealed vowel space reduction in the recipe retell task compared to the other 2. Loudness changes for F1 were consistent in direction but variable in extent, with few significant results for high tense vowels. Results for F2 were quite varied and frequently not significant. Speakers differed in how loudness and task affected formant values. Finally, correlations between sound pressure level and F1 were generally positive but varied in magnitude across vowels, with the high tense vowels showing very flat slopes. Discussion These data indicate that naturalistically elicited loud speech in typical speakers does not always lead to changes in vowel formant frequencies and call into question the notion that increasing loudness is necessarily an automatic method of expanding the vowel space. Supplemental Material https://doi.org/10.23641/asha.8061740.
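
The vowel space area mentioned in the Method is conventionally computed with the shoelace formula over the corner vowels' (F1, F2) points; a sketch with illustrative formant values (not the study's measurements):

```python
def vowel_space_area(points):
    """Polygon area via the shoelace formula; points are (F1, F2) pairs in Hz,
    one per corner vowel, in order around the polygon. Area is in Hz^2."""
    area = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Illustrative (F1, F2) means for /i: a: u:/, in Hz
corners = [(270, 2290), (730, 1090), (300, 870)]
area = vowel_space_area(corners)  # a smaller area indicates vowel space reduction
```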

RevDate: 2019-05-24

Nalborczyk L, Batailler C, Lœvenbruck H, et al (2019)

An Introduction to Bayesian Multilevel Models Using brms: A Case Study of Gender Effects on Vowel Variability in Standard Indonesian.

Journal of speech, language, and hearing research : JSLHR, 62(5):1225-1242.

Purpose Bayesian multilevel models are increasingly used to overcome the limitations of frequentist approaches in the analysis of complex structured data. This tutorial introduces Bayesian multilevel modeling for the specific analysis of speech data, using the brms package developed in R. Method In this tutorial, we provide a practical introduction to Bayesian multilevel modeling by reanalyzing a phonetic data set containing formant (F1 and F2) values for 5 vowels of standard Indonesian (ISO 639-3:ind), as spoken by 8 speakers (4 females and 4 males), with several repetitions of each vowel. Results We first give an introductory overview of the Bayesian framework and multilevel modeling. We then show how Bayesian multilevel models can be fitted using the probabilistic programming language Stan and the R package brms, which provides an intuitive formula syntax. Conclusions Through this tutorial, we demonstrate some of the advantages of the Bayesian framework for statistical modeling and provide a detailed case study, with complete source code for full reproducibility of the analyses (https://osf.io/dpzcb/). Supplemental Material https://doi.org/10.23641/asha.7973822.

RevDate: 2019-07-20

van Rij J, Hendriks P, van Rijn H, et al (2019)

Analyzing the Time Course of Pupillometric Data.

Trends in hearing, 23:2331216519832483.

This article provides a tutorial for analyzing pupillometric data. Pupil dilation has become increasingly popular in psychological and psycholinguistic research as a measure to trace language processing. However, there is no general consensus about procedures to analyze the data, with most studies analyzing extracted features from the pupil dilation data instead of analyzing the pupil dilation trajectories directly. Recent studies have started to apply nonlinear regression and other methods to analyze the pupil dilation trajectories directly, utilizing all available information in the continuously measured signal. This article applies a nonlinear regression analysis, generalized additive mixed modeling, and illustrates how to analyze the full time course of the pupil dilation signal. The regression analysis is particularly suited for analyzing pupil dilation in the fields of psychological and psycholinguistic research because generalized additive mixed models can include complex nonlinear interactions for investigating the effects of properties of stimuli (e.g., formant frequency) or participants (e.g., working memory score) on the pupil dilation signal. To account for the variation due to participants and items, nonlinear random effects can be included. However, one of the challenges for analyzing time series data is dealing with the autocorrelation in the residuals, which is rather extreme for the pupillary signal. On the basis of simulations, we explain potential causes of this extreme autocorrelation, and on the basis of the experimental data, we show how to reduce its adverse effects, allowing a much more coherent interpretation of pupillary data than is possible with feature-based techniques.

RevDate: 2019-05-09

He L, Zhang Y, V Dellwo (2019)

Between-speaker variability and temporal organization of the first formant.

The Journal of the Acoustical Society of America, 145(3):EL209.

First formant (F1) trajectories of vocalic intervals were divided into positive and negative dynamics. Positive F1 dynamics were defined as the speeds of F1 increases to reach the maxima, and negative F1 dynamics as the speeds of F1 decreases away from the maxima. Mean, standard deviation, and sequential variability were measured for both dynamics. Results showed that measures of negative F1 dynamics explained more between-speaker variability, which was highly congruent with a previous study using intensity dynamics [He and Dellwo (2017). J. Acoust. Soc. Am. 141, EL488-EL494]. The results may be explained by speaker idiosyncratic articulation.
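
One plain reading of the positive and negative F1 dynamics described above can be sketched as rise and fall speeds around the trajectory maximum; the frame duration and toy trajectory are assumptions for illustration, not the authors' implementation:

```python
def f1_dynamics(track, frame_s=0.005):
    """Split an F1 trajectory (Hz, one value per frame) at its maximum:
    positive dynamics = speed of the F1 rise to the maximum (Hz/s),
    negative dynamics = speed of the F1 fall away from it (Hz/s)."""
    peak = track.index(max(track))
    tail = len(track) - 1 - peak
    positive = (track[peak] - track[0]) / (peak * frame_s) if peak else 0.0
    negative = (track[peak] - track[-1]) / (tail * frame_s) if tail else 0.0
    return positive, negative

pos, neg = f1_dynamics([400, 500, 650, 700, 620, 480])  # toy vocalic interval
```

Mean, standard deviation, and sequential variability would then be taken over such per-interval speeds across an utterance.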

RevDate: 2019-05-09

Roberts B, RJ Summers (2019)

Dichotic integration of acoustic-phonetic information: Competition from extraneous formants increases the effect of second-formant attenuation on intelligibility.

The Journal of the Acoustical Society of America, 145(3):1230.

Differences in ear of presentation and level do not prevent effective integration of concurrent speech cues such as formant frequencies. For example, presenting the higher formants of a consonant-vowel syllable in the opposite ear to the first formant protects them from upward spread of masking, allowing them to remain effective speech cues even after substantial attenuation. This study used three-formant (F1+F2+F3) analogues of natural sentences and extended the approach to include competitive conditions. Target formants were presented dichotically (F1+F3; F2), either alone or accompanied by an extraneous competitor for F2 (i.e., F1±F2C+F3; F2) that listeners must reject to optimize recognition. F2C was created by inverting the F2 frequency contour and using the F2 amplitude contour without attenuation. In experiment 1, F2C was always absent and intelligibility was unaffected until F2 attenuation exceeded 30 dB; F2 still provided useful information at 48-dB attenuation. In experiment 2, attenuating F2 by 24 dB caused considerable loss of intelligibility when F2C was present, but had no effect in its absence. Factors likely to contribute to this interaction include informational masking from F2C acting to swamp the acoustic-phonetic information carried by F2, and interaural inhibition from F2C acting to reduce the effective level of F2.

RevDate: 2019-05-03

Ogata K, Kodama T, Hayakawa T, et al (2019)

Inverse estimation of the vocal tract shape based on a vocal tract mapping interface.

The Journal of the Acoustical Society of America, 145(4):1961.

This paper describes the inverse estimation of the vocal tract shape for vowels by using a vocal tract mapping interface. In prior research, an interface capable of generating a vocal tract shape by clicking on its window was developed. The vocal tract shapes for five vowels are located at the vertices of a pentagonal chart and a different shape that corresponds to an arbitrary mouse-pointer position on the interface window is calculated by interpolation. In this study, an attempt was made to apply the interface to the inverse estimation of vocal tract shapes based on formant frequencies. A target formant frequency data set was searched based on the geometry of the interface window by using a coarse-to-fine algorithm. It was revealed that the estimated vocal tract shapes obtained from the mapping interface were close to those from magnetic resonance imaging data in another study and to lip area data captured using video recordings. The results of experiments to evaluate the estimated vocal tract shapes showed that each subject demonstrated unique trajectories on the interface window corresponding to the estimated vocal tract shapes. These results suggest the usefulness of inverse estimation using the interface.

RevDate: 2019-05-03

Thompson A, Y Kim (2019)

Relation of second formant trajectories to tongue kinematics.

The Journal of the Acoustical Society of America, 145(4):EL323.

In this study, the relationship between the acoustic and articulatory kinematic domains of speech was examined among nine neurologically healthy female speakers using two derived relationships between tongue kinematics and F2 measurements: (1) second formant frequency (F2) extent to lingual displacement and (2) F2 slope to lingual speed. Additionally, the relationships between these paired parameters were examined within conversational, more clear, and less clear speaking modes. In general, the findings of the study support a strong correlation for both sets of paired parameters. In addition, the data showed significant changes in articulatory behaviors across speaking modes including the magnitude of tongue motion, but not in the speed-related measures.
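
The paired acoustic-kinematic relationships above are, at base, correlations over paired measurements; a generic Pearson correlation sketch with hypothetical numbers (the variable names and values are illustrative, not the study's data):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# hypothetical paired values: F2 extent (Hz) vs. lingual displacement (mm)
f2_extent = [250, 400, 520, 610, 800]
displacement = [4.1, 6.0, 7.9, 8.8, 12.2]
r = pearson_r(f2_extent, displacement)  # strongly positive for linear-ish data
```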

RevDate: 2019-05-03

Bürki A, Welby P, Clément M, et al (2019)

Orthography and second language word learning: Moving beyond "friend or foe?".

The Journal of the Acoustical Society of America, 145(4):EL265.

French participants learned English pseudowords either with the orthographic form displayed under the corresponding picture (Audio-Ortho) or without (Audio). In a naming task, pseudowords learned in the Audio-Ortho condition were produced faster and with fewer errors, providing a first piece of evidence that orthographic information facilitates the learning and on-line retrieval of productive vocabulary in a second language. Formant analyses, however, showed that productions from the Audio-Ortho condition were more French-like (i.e., less target-like), a result confirmed by a vowel categorization task performed by native speakers of English. It is argued that novel word learning and pronunciation accuracy should be considered together.

RevDate: 2019-04-26

Colby S, Shiller DM, Clayards M, et al (2019)

Different Responses to Altered Auditory Feedback in Younger and Older Adults Reflect Differences in Lexical Bias.

Journal of speech, language, and hearing research : JSLHR, 62(4S):1144-1151.

Purpose Previous work has found that both young and older adults exhibit a lexical bias in categorizing speech stimuli. In young adults, this has been argued to be an automatic influence of the lexicon on perceptual category boundaries. Older adults exhibit more top-down biases than younger adults, including an increased lexical bias. We investigated the nature of the increased lexical bias using a sensorimotor adaptation task designed to evaluate whether automatic processes drive this bias in older adults. Method A group of older adults (n = 27) and younger adults (n = 35) participated in an altered auditory feedback production task. Participants produced target words and nonwords under altered feedback that affected the 1st formant of the vowel. There were 2 feedback conditions that affected the lexical status of the target, such that target words were shifted to sound more like nonwords (e.g., less-liss) and target nonwords to sound more like words (e.g., kess-kiss). Results A mixed-effects linear regression was used to investigate the magnitude of compensation to altered auditory feedback between age groups and lexical conditions. Over the course of the experiment, older adults compensated (by shifting their production of the 1st formant) more to altered auditory feedback when producing words that were shifted toward nonwords (less-liss) than when producing nonwords that were shifted toward words (kess-kiss). This is in contrast to younger adults, who compensated more to nonwords that were shifted toward words compared to words that were shifted toward nonwords. Conclusion We found no evidence that the increased lexical bias previously observed in older adults is driven by a greater sensitivity to top-down lexical influence on perceptual category boundaries. We suggest the increased lexical bias in older adults is driven by postperceptual processes that arise as a result of age-related cognitive and sensory changes.

RevDate: 2019-04-24

Schertz J, Carbonell K, AJ Lotto (2019)

Language Specificity in Phonetic Cue Weighting: Monolingual and Bilingual Perception of the Stop Voicing Contrast in English and Spanish.

Phonetica pii:000497278 [Epub ahead of print].

BACKGROUND/AIMS: This work examines the perception of the stop voicing contrast in Spanish and English along four acoustic dimensions, comparing monolingual and bilingual listeners. Our primary goals are to test the extent to which cue-weighting strategies are language-specific in monolinguals, and whether this language specificity extends to bilingual listeners.

METHODS: Participants categorized sounds varying in voice onset time (VOT, the primary cue to the contrast) and three secondary cues: fundamental frequency at vowel onset, first formant (F1) onset frequency, and stop closure duration. Listeners heard acoustically identical target stimuli, within language-specific carrier phrases, in English and Spanish modes.

RESULTS: While all listener groups used all cues, monolingual English listeners relied more on F1, and less on closure duration, than monolingual Spanish listeners, indicating language specificity in cue use. Early bilingual listeners used the three secondary cues similarly in English and Spanish, despite showing language-specific VOT boundaries.

CONCLUSION: While our findings reinforce previous work demonstrating language-specific phonetic representations in bilinguals in terms of VOT boundary, they suggest that this specificity may not extend straightforwardly to cue-weighting strategies.

RevDate: 2019-04-24

Kulikov V (2019)

Laryngeal Contrast in Qatari Arabic: Effect of Speaking Rate on Voice Onset Time.

Phonetica pii:000497277 [Epub ahead of print].

Beckman and colleagues claimed in 2011 that Swedish has an overspecified phonological contrast between prevoiced and voiceless aspirated stops. Yet, Swedish is the only language for which this pattern has been reported. The current study describes a similar phonological pattern in the vernacular Arabic dialect of Qatar. Acoustic measurements of main (voice onset time, VOT) and secondary (fundamental frequency, first formant) cues to voicing are based on production data of 8 native speakers of Qatari Arabic, who pronounced 1,380 voiced and voiceless word-initial stops in the slow and fast rate conditions. The results suggest that the VOT pattern found in voiced Qatari Arabic stops b, d, g is consistent with prevoicing in voice languages like Dutch, Russian, or Swedish. The pattern found in voiceless stops t, k is consistent with aspiration in aspirating languages like English, German, or Swedish. Similar to Swedish, both prevoicing and aspiration in Qatari Arabic stops change in response to speaking rate. VOT significantly increased by 19 ms in prevoiced stops and by 12 ms in voiceless stops in the slow speaking rate condition. The findings suggest that phonological overspecification in laryngeal contrasts may not be an uncommon pattern among languages.

RevDate: 2019-04-18

Hedrick M, Thornton KET, Yeager K, et al (2019)

The Use of Static and Dynamic Cues for Vowel Identification by Children Wearing Hearing Aids or Cochlear Implants.

Ear and hearing [Epub ahead of print].

OBJECTIVE: To examine vowel perception based on dynamic formant transition and/or static formant pattern cues in children with hearing loss while using their hearing aids or cochlear implants. We predicted that the sensorineural hearing loss would degrade formant transitions more than static formant patterns, and that shortening the duration of cues would cause more difficulty for vowel identification for these children than for their normal-hearing peers.

DESIGN: A repeated-measures, between-group design was used. Children 4 to 9 years of age from a university hearing services clinic who were fit for hearing aids (13 children) or who wore cochlear implants (10 children) participated. Chronologically age-matched children with normal hearing served as controls (23 children). Stimuli included three naturally produced syllables (/bɑ/, /bi/, and /bu/), which were presented either in their entirety or segmented to isolate the formant transition or the vowel static formant center. The stimuli were presented to listeners via loudspeaker in the sound field. Aided participants wore their own devices and listened with their everyday settings. Participants chose the vowel presented by selecting from corresponding pictures on a computer screen.

RESULTS: Children with hearing loss were less able to use shortened transition or shortened vowel centers to identify vowels as compared to their normal-hearing peers. Whole syllable and initial transition yielded better identification performance than the vowel center for /ɑ/, but not for /i/ or /u/.

CONCLUSIONS: The children with hearing loss may require a longer time window than children with normal hearing to integrate vowel cues over time because of altered peripheral encoding in spectrotemporal domains. Clinical implications include cognizance of the importance of vowel perception when developing habilitative programs for children with hearing loss.

RevDate: 2019-04-17

Lowenstein JH, S Nittrouer (2019)

Perception-Production Links in Children's Speech.

Journal of speech, language, and hearing research : JSLHR, 62(4):853-867.

Purpose Child phonologists have long been interested in how tightly speech input constrains the speech production capacities of young children, and the question acquires clinical significance when children with hearing loss are considered. Children with sensorineural hearing loss often show differences in the spectral and temporal structures of their speech production, compared to children with normal hearing. The current study was designed to investigate the extent to which this problem can be explained by signal degradation. Method Ten 5-year-olds with normal hearing were recorded imitating 120 three-syllable nonwords presented in unprocessed form and as noise-vocoded signals. Target segments consisted of fricatives, stops, and vowels. Several measures were made: 2 duration measures (voice onset time and fricative length) and 4 spectral measures involving 2 segments (1st and 3rd moments of fricatives and 1st and 2nd formant frequencies for the point vowels). Results All spectral measures were affected by signal degradation, with vowel production showing the largest effects. Although a change in voice onset time was observed with vocoded signals for /d/, voicing category was not affected. Fricative duration remained constant. Conclusions Results support the hypothesis that quality of the input signal constrains the speech production capacities of young children. Consequently, it can be concluded that the production problems of children with hearing loss, including those with cochlear implants, can be explained to some extent by the degradation in the signal they hear. However, experience with both speech perception and production likely plays a role as well.

RevDate: 2019-04-05

Croake DJ, Andreatta RD, JC Stemple (2019)

Descriptive Analysis of the Interactive Patterning of the Vocalization Subsystems in Healthy Participants: A Dynamic Systems Perspective.

Journal of speech, language, and hearing research : JSLHR, 62(2):215-228.

Purpose Normative data for many objective voice measures are routinely used in clinical voice assessment; however, normative data reflect vocal output, but not vocalization process. The underlying physiologic processes of healthy phonation have been shown to be nonlinear and thus are likely different across individuals. Dynamic systems theory postulates that performance behaviors emerge from the nonlinear interplay of multiple physiologic components and that certain patterns are preferred and loosely governed by the interactions of physiology, task, and environment. The purpose of this study was to descriptively characterize the interactive nature of the vocalization subsystem triad in subjects with healthy voices and to determine if differing subgroups could be delineated to better understand how healthy voicing is physiologically generated. Method Respiratory kinematic, aerodynamic, and acoustic formant data were obtained from 29 individuals with healthy voices (21 female and eight male). Multivariate analyses were used to descriptively characterize the interactions among the subsystems that contributed to healthy voicing. Results Group data revealed representative measures of the 3 subsystems to be generally within the boundaries of established normative data. Despite this, 3 distinct clusters were delineated that represented 3 subgroups of individuals with differing subsystem patterning. Seven of the 9 measured variables in this study were found to be significantly different across at least 1 of the 3 subgroups, indicating differing physiologic processes across individuals. Conclusion Vocal output in healthy individuals appears to be generated by distinct and preferred physiologic processes that were represented by 3 subgroups, indicating that the process of vocalization is different among individuals, but not entirely idiosyncratic. Possibilities for these differences are explored using the framework of dynamic systems theory and the dynamics of emergent behaviors. A revised physiologic model of phonation that accounts for differences within and among the vocalization subsystems is described. Supplemental Material https://doi.org/10.23641/asha.7616462.

RevDate: 2019-08-02

Stilp CE, AA Assgari (2019)

Natural speech statistics shift phoneme categorization.

Attention, perception & psychophysics, 81(6):2037-2052.

All perception takes place in context. Recognition of a given speech sound is influenced by the acoustic properties of surrounding sounds. When the spectral composition of earlier (context) sounds (e.g., more energy at lower first formant [F1] frequencies) differs from that of a later (target) sound (e.g., vowel with intermediate F1), the auditory system magnifies this difference, biasing target categorization (e.g., towards higher-F1 /ɛ/). Historically, these studies used filters to force context sounds to possess desired spectral compositions. This approach is agnostic to the natural signal statistics of speech (inherent spectral compositions without any additional manipulations). The auditory system is thought to be attuned to such stimulus statistics, but this has gone untested. Here, vowel categorization was measured following unfiltered (already possessing the desired spectral composition) or filtered sentences (to match spectral characteristics of unfiltered sentences). Vowel categorization was biased in both cases, with larger biases as the spectral prominences in context sentences increased. This confirms sensitivity to natural signal statistics, extending spectral context effects in speech perception to more naturalistic listening conditions. Importantly, categorization biases were smaller and more variable following unfiltered sentences, raising important questions about how faithfully experiments using filtered contexts model everyday speech perception.

RevDate: 2019-04-02

Rodrigues S, Martins F, Silva S, et al (2019)

/l/ velarisation as a continuum.

PloS one, 14(3):e0213392.

In this paper, we present a production study to explore the controversial question about /l/ velarisation. Measurements of first (F1), second (F2) and third (F3) formant frequencies and the slope of F2 were analysed to clarify the /l/ velarisation behaviour in European Portuguese (EP). The acoustic data were collected from ten EP speakers, producing trisyllabic words with paroxytone stress pattern, with the liquid consonant at the middle of the word in onset, complex onset and coda positions. Results suggested that /l/ is produced on a continuum in EP. The consistently low F2 indicates that /l/ is velarised in all syllable positions, but variation especially in F1 and F3 revealed that /l/ could be "more velarised" or "less velarised" dependent on syllable positions and vowel contexts. These findings suggest that it is important to consider different acoustic measures to better understand /l/ velarisation in EP.

RevDate: 2019-03-08

Rampinini AC, Handjaras G, Leo A, et al (2019)

Formant Space Reconstruction From Brain Activity in Frontal and Temporal Regions Coding for Heard Vowels.

Frontiers in human neuroscience, 13:32.

Classical studies have isolated a distributed network of temporal and frontal areas engaged in the neural representation of speech perception and production. With modern literature arguing against unique roles for these cortical regions, different theories have favored either neural code-sharing or cortical space-sharing, thus trying to explain the intertwined spatial and functional organization of motor and acoustic components across the fronto-temporal cortical network. In this context, the focus of attention has recently shifted toward specific model fitting, aimed at motor and/or acoustic space reconstruction in brain activity within the language network. Here, we tested a model based on acoustic properties (formants), and one based on motor properties (articulation parameters), where model-free decoding of evoked fMRI activity during perception, imagery, and production of vowels had been successful. Results revealed that phonological information organizes around formant structure during the perception of vowels; interestingly, such a model was reconstructed in a broad temporal region, outside of the primary auditory cortex, but also in the pars triangularis of the left inferior frontal gyrus. Conversely, articulatory features were not associated with brain activity in these regions. Overall, our results call for a degree of interdependence based on acoustic information, between the frontal and temporal ends of the language network.

RevDate: 2019-08-31

Franken MK, Acheson DJ, McQueen JM, et al (2019)

Consistency influences altered auditory feedback processing.

Quarterly journal of experimental psychology (2006), 72(10):2371-2379.

Previous research on the effect of perturbed auditory feedback in speech production has focused on two types of responses. In the short term, speakers generate compensatory motor commands in response to unexpected perturbations. In the longer term, speakers adapt feedforward motor programmes in response to feedback perturbations, to avoid future errors. The current study investigated the relation between these two types of responses to altered auditory feedback. Specifically, it was hypothesised that consistency in previous feedback perturbations would influence whether speakers adapt their feedforward motor programmes. In an altered auditory feedback paradigm, formant perturbations were applied either across all trials (the consistent condition) or only to some trials, whereas the others remained unperturbed (the inconsistent condition). The results showed that speakers' responses were affected by feedback consistency, with stronger speech changes in the consistent condition compared with the inconsistent condition. Current models of speech-motor control can explain this consistency effect. However, the data also suggest that compensation and adaptation are distinct processes, which is not in line with all current models.

RevDate: 2019-03-05

Klaus A, Lametti DR, Shiller DM, et al (2019)

Can perceptual training alter the effect of visual biofeedback in speech-motor learning?.

The Journal of the Acoustical Society of America, 145(2):805.

Recent work showing that a period of perceptual training can modulate the magnitude of speech-motor learning in a perturbed auditory feedback task could inform clinical interventions or second-language training strategies. The present study investigated the influence of perceptual training on a clinically and pedagogically relevant task of vocally matching a visually presented speech target using visual-acoustic biofeedback. Forty female adults aged 18-35 yr received perceptual training targeting the English /æ-ɛ/ contrast, randomly assigned to a condition that shifted the perceptual boundary toward either /æ/ or /ɛ/. Participants were then asked to produce the word head while modifying their output to match a visually presented acoustic target corresponding with a slightly higher first formant (F1, closer to /æ/). By analogy to findings from previous research, it was predicted that individuals whose boundary was shifted toward /æ/ would also show a greater magnitude of change in the visual biofeedback task. After perceptual training, the groups showed the predicted difference in perceptual boundary location, but they did not differ in their performance on the biofeedback matching task. It is proposed that the explicit versus implicit nature of the tasks used might account for the difference between this study and previous findings.

RevDate: 2019-03-02

Dissen Y, Goldberger J, J Keshet (2019)

Formant estimation and tracking: A deep learning approach.

The Journal of the Acoustical Society of America, 145(2):642.

Formant frequency estimation and tracking are among the most fundamental problems in speech processing. In the estimation task, the input is a stationary speech segment such as the middle part of a vowel, and the goal is to estimate the formant frequencies, whereas in the task of tracking the input is a series of speech frames, and the goal is to track the trajectory of the formant frequencies throughout the signal. The use of supervised machine learning techniques trained on an annotated corpus of read-speech for these tasks is proposed. Two deep network architectures were evaluated for estimation: feed-forward multilayer-perceptrons and convolutional neural-networks and, correspondingly, two architectures for tracking: recurrent and convolutional recurrent networks. The inputs to the former are composed of linear predictive coding-based cepstral coefficients with a range of model orders and pitch-synchronous cepstral coefficients, whereas the inputs to the latter are raw spectrograms. The performance of the methods compares favorably with alternative methods for formant estimation and tracking. A network architecture is further proposed, which allows model adaptation to different formant frequency ranges that were not seen at training time. The adapted networks were evaluated on three datasets, and their performance was further improved.
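
Linear predictive coding, which underlies the cepstral input features mentioned above, is also the classical non-learned approach to formant estimation: fit an all-pole model, take the angles of its complex roots as candidate formant frequencies, and discard wide-bandwidth poles. A minimal numpy sketch on a synthetic two-resonance frame (parameter choices are illustrative, and this is not the paper's method):

```python
import numpy as np

def lpc_formants(frame, sr, order=12, max_bw=400.0):
    """Classical LPC formant estimation for one stationary frame."""
    x = frame * np.hamming(len(frame))                 # taper frame edges
    # autocorrelation method: solve the normal equations for LPC coefficients
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    roots = np.roots(np.concatenate(([1.0], a)))
    roots = roots[np.imag(roots) > 0]                  # one per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * sr / np.pi          # bandwidth from pole radius
    return sorted(f for f, b in zip(freqs, bws) if f > 90 and b < max_bw)

# synthetic vowel-like frame: damped resonances at 500 Hz and 1500 Hz plus noise
sr, n = 8000, 400
t = np.arange(n) / sr
rng = np.random.default_rng(0)
frame = (np.exp(-60 * t) * (np.sin(2 * np.pi * 500 * t)
                            + 0.7 * np.sin(2 * np.pi * 1500 * t))
         + 1e-4 * rng.standard_normal(n))
formants = lpc_formants(frame, sr)  # poles near 500 Hz and 1500 Hz
```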

RevDate: 2019-03-02

Kirkham S, Nance C, Littlewood B, et al (2019)

Dialect variation in formant dynamics: The acoustics of lateral and vowel sequences in Manchester and Liverpool English.

The Journal of the Acoustical Society of America, 145(2):784.

This study analyses the time-varying acoustics of laterals and their adjacent vowels in Manchester and Liverpool English. Generalized additive mixed-models (GAMMs) are used for quantifying time-varying formant data, which allows the modelling of non-linearities in acoustic time series while simultaneously modelling speaker- and word-level variability in the data. These models are compared to single time-point analyses of lateral and vowel targets in order to determine what analysing formant dynamics can tell us about dialect variation in speech acoustics. The results show that lateral targets exhibit robust differences between some positional contexts and also between dialects, with smaller differences present in vowel targets. The time-varying analysis shows that dialect differences frequently occur globally across the lateral and adjacent vowels. These results suggest a complex relationship between lateral and vowel targets and their coarticulatory dynamics, which problematizes straightforward claims about the realization of laterals and their adjacent vowels. These findings are further discussed in terms of hypotheses about positional and sociophonetic variation. In doing so, the utility of GAMMs for analysing time-varying multi-segmental acoustic signals is demonstrated, and the significance of the results for accounts of English lateral typology is highlighted.

RevDate: 2019-02-19

Menda G, Nitzany EI, Shamble PS, et al (2019)

The Long and Short of Hearing in the Mosquito Aedes aegypti.

Current biology : CB, 29(4):709-714.e4.

Mating behavior in Aedes aegypti mosquitoes occurs mid-air and involves the exchange of auditory signals at close range (millimeters to centimeters) [1-6]. It is widely assumed that this intimate signaling distance reflects short-range auditory sensitivity of their antennal hearing organs to faint flight tones [7, 8]. To the contrary, we show here that male mosquitoes can hear the female's flight tone at surprisingly long distances-from several meters to up to 10 m-and that unrestrained, resting Ae. aegypti males leap off their perches and take flight when they hear female flight tones. Moreover, auditory sensitivity tests of Ae. aegypti's hearing organ, made from neurophysiological recordings of the auditory nerve in response to pure-tone stimuli played from a loudspeaker, support the behavioral experiments. This demonstration of long-range hearing in mosquitoes overturns the common assumption that the thread-like antennal hearing organs of tiny insects are strictly close-range ears. The effective range of a hearing organ depends ultimately on its sensitivity [9-13]. Here, a mosquito's antennal ear is shown to be sensitive to sound levels down to 31 dB sound pressure level (SPL), translating to air particle displacements at nanometer dimensions. We note that the energy peak of the first formant of human speech vowels ranges from about 200 to 1,000 Hz and that speech is typically spoken at 45-70 dB SPL; together, these lie in the sweet spot of mosquito hearing. VIDEO ABSTRACT.
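The abstract's link between a 31 dB SPL threshold and nanometer-scale air particle motion can be checked with textbook plane-wave acoustics (a minimal sketch, not from the paper: the 20 µPa reference pressure and ~413 Pa·s/m characteristic impedance of air are standard values, and the 500 Hz tone frequency is an assumed mid-range flight tone):

```python
import math

def spl_to_particle_velocity(spl_db, p_ref=20e-6, z_air=413.0):
    """Convert SPL (dB re 20 uPa) to plane-wave particle velocity (m/s)."""
    pressure = p_ref * 10 ** (spl_db / 20)   # acoustic pressure in Pa
    return pressure / z_air                   # v = p / (rho * c)

def particle_displacement(velocity, freq_hz):
    """Displacement amplitude (m) of a sinusoid: x = v / (2*pi*f)."""
    return velocity / (2 * math.pi * freq_hz)

v = spl_to_particle_velocity(31.0)   # threshold reported in the study
x = particle_displacement(v, 500.0)  # assumed 500 Hz tone
print(f"velocity: {v:.2e} m/s, displacement: {x * 1e9:.2f} nm")  # ~0.5 nm
```

The result (roughly half a nanometer) is consistent with the abstract's "nanometer dimensions" claim.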

RevDate: 2019-03-29

Garellek M (2019)

Acoustic Discriminability of the Complex Phonation System in !Xóõ.

Phonetica pii:000494301 [Epub ahead of print].

Phonation types, or contrastive voice qualities, are minimally produced using complex movements of the vocal folds, but may additionally involve constriction in the supraglottal and pharyngeal cavities. These complex articulations in turn produce a multidimensional acoustic output that can be modeled in various ways. In this study, I investigate whether the psychoacoustic model of voice by Kreiman et al. (2014) succeeds at distinguishing six phonation types of !Xóõ. Linear discriminant analysis is performed using parameters from the model averaged over the entire vowel as well as for the first and final halves of the vowel. The results indicate very high classification accuracy for all phonation types. Measures averaged over the vowel's entire duration are closely correlated with the discriminant functions, suggesting that they are sufficient for distinguishing even dynamic phonation types. Measures from all classes of parameters are correlated with the linear discriminant functions; in particular, the "strident" vowels, which are harsh in quality, are characterized by their noise, changes in spectral tilt, decrease in voicing amplitude and frequency, and raising of the first formant. Despite the large number of contrasts and the time-varying characteristics of many of the phonation types, the phonation contrasts in !Xóõ remain well differentiated acoustically.

RevDate: 2019-02-10

Apaydın E, İkincioğulları A, Çolak M, et al (2019)

The Voice Performance After Septoplasty With Surgical Efficacy Demonstrated Through Acoustic Rhinometry and Rhinomanometry.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30531-9 [Epub ahead of print].

OBJECTIVE: To demonstrate the surgical efficacy of septoplasty using acoustic rhinometry (AR) and anterior rhinomanometry (ARM) and to evaluate the effect of septoplasty on voice performance through subjective voice analysis methods.

MATERIALS AND METHODS: This prospective study enrolled a total of 62 patients who underwent septoplasty with the diagnosis of deviated nasal septum. Thirteen patients with no postoperative improvement over the preoperative period on AR and/or ARM tests, three patients with postoperative complications, and four patients lost to follow-up were excluded. As a result, a total of 42 patients were included in the study. Objective tests including AR, ARM, acoustic voice analysis and spectrographic analysis were performed before the surgery and at 1 month and 3 months after the surgery. Subjective measures included the Nasal Obstruction Symptom Evaluation questionnaire to evaluate surgical success and the Voice Handicap Index-30 tool for assessment of voice performance postoperatively, both completed by all study patients.

RESULTS: Among acoustic voice analysis parameters, F0, jitter, Harmonics-to-Noise Ratio values as well as formant frequency (F1-F2-F3-F4) values did not show significant differences postoperatively in comparison to the preoperative period (P > 0.05). Only the shimmer value was statistically significantly reduced at 1 month (P < 0.05) and 3 months postoperatively (P < 0.05) versus baseline. Statistically significant reductions in Voice Handicap Index-30 scores were observed at postoperative 1 month (P < 0.001) and 3 months (P < 0.001) compared to the preoperative period and between postoperative 1 month and 3 months (P < 0.05).

CONCLUSION: In this study, first operative success of septoplasty was demonstrated through objective tests and then objective voice analyses were performed to better evaluate the overall effect of septoplasty on voice performance. Shimmer value was found to be improved in the early and late postoperative periods.

RevDate: 2019-02-15

de Souza GVS, Duarte JMT, Viegas F, et al (2019)

An Acoustic Examination of Pitch Variation in Soprano Singing.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30416-8 [Epub ahead of print].

INTRODUCTION: The ability to perform acoustic inspection of data and to correlate the results with perceptual and physiological aspects facilitates vocal behavior analysis. The singing voice has specific characteristics and parameters that are involved during the phonation mechanism, which may be analyzed acoustically.

OBJECTIVE: To describe and analyze the fundamental frequency and formants in pitch variation in the /a/ vowel in sopranos.

METHODS: The sample consisted of 30 female participants between the ages of 20 and 45 years without vocal complaints. The /a/ vowel was recorded sustained for 5 seconds, with three replications at low (C4, 261 Hz), medium (Eb5, 622 Hz), and high (Bb5, 932 Hz) pitches that were comfortable for the voice classification. In total, 90 samples were analyzed with digital extraction of the fundamental frequency (f0) and the first five formants (F1, F2, F3, F4, and F5), followed by manual confirmation. The middle segment was considered for analysis, whereas the onset and offset segments were not. Subsequently, FFT (fast Fourier transform) plots, LPC (linear predictive coding) graphs, and tube diagrams were created. The Shapiro-Wilk test was applied for adherence and the Friedman test for comparison of paired samples.
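The abstract does not give its extraction settings; the following is a generic numpy sketch of autocorrelation-method LPC, the technique such formant analyses rely on, with all parameter values (sample rate, model order, test frequencies) illustrative rather than taken from the study:

```python
import numpy as np

def lpc_coefficients(signal, order):
    """Autocorrelation-method LPC: solve the Yule-Walker normal equations."""
    sig = signal * np.hamming(len(signal))
    r = np.correlate(sig, sig, mode="full")[len(sig) - 1:len(sig) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))       # A(z) = 1 - sum_k a_k z^-k

def formant_candidates(a, fs):
    """Resonance candidates: angles of complex roots of A(z) above the real axis."""
    roots = [z for z in np.roots(a) if np.imag(z) > 0.01]
    freqs = sorted(np.angle(z) * fs / (2 * np.pi) for z in roots)
    return [f for f in freqs if f > 90]      # drop near-DC candidates

# Synthetic two-tone stand-in for a vowel's first two spectral peaks
fs = 8000
t = np.arange(2048) / fs
rng = np.random.default_rng(0)
sig = (np.sin(2 * np.pi * 700 * t) + 0.7 * np.sin(2 * np.pi * 1200 * t)
       + 1e-3 * rng.standard_normal(t.size))
a = lpc_coefficients(sig, order=8)
print(formant_candidates(a, fs))             # peaks near 700 and 1200 Hz
```

Roots of the prediction polynomial close to the unit circle correspond to spectral resonances; practical analyzers additionally filter candidates by bandwidth before labeling them F1-F5.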

RESULTS: For vocalizations at low and medium pitches, higher values were observed for the first five formant frequencies than for the f0 value. Overlaying the LPC and FFT graphs revealed a similarity between F1 and F2 at the two pitches, with clustered harmonics in the F3, F4, and F5 region in the low pitch. At the medium pitch, there was similarity between F3 and F4, an F5 peak, and tuned harmonics. However, in the high-pitch vocalizations, there was an increase in the F2, F3, F4, and F5 values in relation to f0, and there was similarity between them along with synchrony between f0 and F1, H2 and F2, H3 and F3, H4 and F4, and H5 and F5.

CONCLUSIONS: Pitch changes indicate differences in the behavior of the fundamental frequency and sound formants in sopranos. The comparison of the sustained vowel sounds in f0 at the three pitches revealed specific vocal tract changes on the LPC curve and FFT harmonics, with an extra gain range at 261 Hz, synchrony between peaks of formants and harmonics at 622 Hz, and equivalence of f0 and F1 at 932 Hz.

RevDate: 2019-01-20

Galle ME, Klein-Packard J, Schreiber K, et al (2019)

What Are You Waiting For? Real-Time Integration of Cues for Fricatives Suggests Encapsulated Auditory Memory.

Cognitive science, 43(1):.

Speech unfolds over time, and the cues for even a single phoneme are rarely available simultaneously. Consequently, to recognize a single phoneme, listeners must integrate material over several hundred milliseconds. Prior work contrasts two accounts: (a) a memory buffer account in which listeners accumulate auditory information in memory and only access higher level representations (i.e., lexical representations) when sufficient information has arrived; and (b) an immediate integration scheme in which lexical representations can be partially activated on the basis of early cues and then updated when more information arises. These studies have uniformly shown evidence for immediate integration for a variety of phonetic distinctions. We attempted to extend this to fricatives, a class of speech sounds which requires not only temporal integration of asynchronous cues (the frication, followed by the formant transitions 150-350 ms later), but also integration across different frequency bands and compensation for contextual factors like coarticulation. Eye movements in the visual world paradigm showed clear evidence for a memory buffer. Results were replicated in five experiments, ruling out methodological factors and tying the release of the buffer to the onset of the vowel. These findings support a general auditory account for speech by suggesting that the acoustic nature of particular speech sounds may have large effects on how they are processed. It also has major implications for theories of auditory and speech perception by raising the possibility of an encapsulated memory buffer in early auditory processing.

RevDate: 2019-01-15

Naderifar E, Ghorbani A, Moradi N, et al (2019)

Use of formant centralization ratio for vowel impairment detection in normal hearing and different degrees of hearing impairment.

Logopedics, phoniatrics, vocology [Epub ahead of print].

PURPOSE: Hearing-impaired (HI) speakers show changes in vowel production and formant frequencies, as well as more overlap between vowels and a more restricted formant space, than hearing speakers. This study explored whether different acoustic parameters (Formant Centralization Ratio (FCR), Vowel Space Area (VSA), and the F2i/F2u ratio, i.e., the ratio of the second formants of /i/ and /u/) are suitable for characterizing impairments in the articulation of vowels in the speech of speakers with hearing loss (HL). These correlated acoustic parameters are used to determine the limits of tongue movements in vowel production across different degrees of hearing impairment.

METHODS: Speech recordings of 40 speakers with HL and 40 healthy controls were acoustically analyzed. The vowels (/a/,/i/,/u/) were extracted from the word context and, then, the first and second formants were calculated. The same vowel-formant elements were used to construct the FCR, expressed as (F2u + F2a + F1i + F1u)/(F2i + F1a), the F2i/F2u ratio, and the vowel space area (VSA), expressed as ABS((F1i*(F2a-F2u)+F1a*(F2u-F2i)+F1u*(F2i-F2a))/2).
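The FCR and VSA formulas quoted in the abstract translate directly into code (a minimal sketch; the example formant values are plausible adult-female figures chosen for illustration, not data from the study):

```python
def fcr(f1a, f2a, f1i, f2i, f1u, f2u):
    """Formant Centralization Ratio, as defined in the abstract."""
    return (f2u + f2a + f1i + f1u) / (f2i + f1a)

def vsa(f1a, f2a, f1i, f2i, f1u, f2u):
    """Triangular Vowel Space Area over /a i u/ (shoelace formula, Hz^2)."""
    return abs(f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a)) / 2

# Illustrative formant values in Hz (not from the study)
f = dict(f1a=850, f2a=1220, f1i=310, f2i=2790, f1u=370, f2u=950)
print(fcr(**f), vsa(**f))
```

Centralized (reduced) vowels shrink VSA and push FCR upward, which is why FCR can discriminate impaired from typical articulation while being less sensitive to inter-speaker scaling.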

RESULTS: The FCR differentiated HL groups from the control group and the discrimination was not gender-sensitive. All parameters were found to be strongly correlated with each other.

CONCLUSIONS: The findings of this study showed that FCR was a more sensitive acoustic parameter than F2i/F2u ratio and VSA to distinguish speech of the HL groups from that of the normal group. Thus, FCR is considered to be applicable as an early objective measure of impaired vowel articulation in HL speakers.

RevDate: 2019-01-11

Ballard KJ, Halaki M, Sowman P, et al (2018)

An Investigation of Compensation and Adaptation to Auditory Perturbations in Individuals With Acquired Apraxia of Speech.

Frontiers in human neuroscience, 12:510.

Two auditory perturbation experiments were used to investigate the integrity of neural circuits responsible for speech sensorimotor adaptation in acquired apraxia of speech (AOS). This has implications for understanding the nature of AOS as well as normal speech motor control. Two experiments were conducted. In Experiment 1, compensatory responses to unpredictable fundamental frequency (F0) perturbations during vocalization were investigated in healthy older adults and adults with acquired AOS plus aphasia. F0 perturbation involved upward and downward 100-cent shifts versus no shift, in equal proportion, during 2 s vocalizations of the vowel /a/. In Experiment 2, adaptive responses to sustained first formant (F1) perturbations during speech were investigated in healthy older adults, adults with AOS and adults with aphasia only (APH). The F1 protocol involved production of the vowel /ε/ in four consonant-vowel words of Australian English (pear, bear, care, dare), and one control word with a different vowel (paw). An unperturbed Baseline phase was followed by a gradual Ramp to a 30% upward F1 shift stimulating a compensatory response, a Hold phase where the perturbation was repeatedly presented with alternating blocks of masking trials to probe adaptation, and an End phase with masking trials only to measure persistence of any adaptation. AOS participants showed normal compensation to unexpected F0 perturbations, indicating that auditory feedback control of low-level, non-segmental parameters is intact. Furthermore, individuals with AOS displayed an adaptive response to sustained F1 perturbations, but age-matched controls and APH participants did not. These findings suggest that older healthy adults may have less plastic motor programs that resist modification based on sensory feedback, whereas individuals with AOS have less well-established and more malleable motor programs due to damage from stroke.

RevDate: 2019-08-23

Caldwell MT, Jiradejvong P, CJ Limb (2019)

Effects of Phantom Electrode Stimulation on Vocal Production in Cochlear Implant Users.

Ear and hearing, 40(5):1127-1139.

OBJECTIVES: Cochlear implant (CI) users suffer from a range of speech impairments, such as stuttering and impaired vocal control of pitch and intensity. Though little research has focused on the role of auditory feedback in the speech of CI users, these speech impairments could be due in part to limited access to low-frequency cues inherent in CI-mediated listening. Phantom electrode stimulation (PES) represents a novel application of current steering that extends access to low frequencies for CI recipients: PES transmits frequencies below 300 Hz, whereas participants' everyday listening programs (Baseline) do not. The objective of this study was to explore the effects of PES on multiple frequency-related characteristics of voice production.

DESIGN: Eight postlingually deafened, adult Advanced Bionics CI users underwent a series of vocal production tests including Tone Repetition, Vowel Sound Production, Passage Reading, and Picture Description. Participants completed all of these tests twice: once with PES and once using their program used for everyday listening (Baseline). An additional test, Automatic Modulation, was included to measure acute effects of PES and was completed only once. This test involved switching between PES and Baseline at specific time intervals in real time as participants read a series of short sentences. Finally, a subjective Vocal Effort measurement was also included.

RESULTS: In Tone Repetition, the fundamental frequencies (F0) of tones produced using PES and the size of musical intervals produced using PES were significantly more accurate (closer to the target) compared with Baseline in specific gender, target tone range, and target tone type testing conditions. In the Vowel Sound Production task, vowel formant profiles produced using PES were closer to that of the general population compared with those produced using Baseline. The Passage Reading and Picture Description task results suggest that PES reduces measures of pitch variability (F0 standard deviation and range) in natural speech production. No significant results were found in comparisons of PES and Baseline in the Automatic Modulation task nor in the Vocal Effort task.

CONCLUSIONS: The findings of this study suggest that usage of PES increases accuracy of pitch matching in repeated sung tones and frequency intervals, possibly due to more accurate F0 representation. The results also suggest that PES partially normalizes the vowel formant profiles of select vowel sounds. PES seems to decrease pitch variability of natural speech and appears to have limited acute effects on natural speech production, though this finding may be due in part to paradigm limitations. On average, subjective ratings of vocal effort were unaffected by the usage of PES versus Baseline.

RevDate: 2019-08-30

Saba JN, Ali H, JHL Hansen (2018)

Formant priority channel selection for an "n-of-m" sound processing strategy for cochlear implants.

The Journal of the Acoustical Society of America, 144(6):3371.

The Advanced Combination Encoder (ACE) signal processing strategy is used in the majority of cochlear implant (CI) sound processors manufactured by Cochlear Corporation. This "n-of-m" strategy selects "n" out of "m" available frequency channels with the highest spectral energy in each stimulation cycle. It is hypothesized that at low signal-to-noise ratio (SNR) conditions, noise-dominant frequency channels are susceptible to selection, neglecting channels containing target speech cues. In order to improve speech segregation in noise, explicit encoding of formant frequency locations within the standard channel selection framework of ACE is suggested. Two strategies using direct formant estimation algorithms are developed in this study: FACE (formant-ACE) and VFACE (voiced-activated-formant-ACE). Speech intelligibility from eight CI users is compared across 11 acoustic conditions, including mixtures of noise and reverberation at multiple SNRs. Significant intelligibility gains were observed with VFACE over ACE in 5 dB babble noise; however, results with FACE/VFACE in all other conditions were comparable to standard ACE. An increased selection of channels associated with the second formant frequency was observed for FACE and VFACE. Both proposed methods may serve as potential supplementary channel selection techniques for the ACE sound processing strategy for cochlear implants.
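The baseline "n-of-m" selection the abstract describes, plus one simplified reading of adding formant priority to it, can be sketched as follows (an illustration of the idea only, not the authors' FACE/VFACE implementation; channel energies and formant channel indices are made up):

```python
import numpy as np

def select_n_of_m(channel_energies, n):
    """Baseline ACE-style rule: pick the n channels with highest spectral energy."""
    idx = np.argsort(channel_energies)[::-1][:n]
    return np.sort(idx)

def select_with_formant_priority(channel_energies, n, formant_channels):
    """Force channels carrying estimated formants into the selection first,
    then fill the rest by energy (a simplified reading of the FACE idea)."""
    forced = [c for c in formant_channels if c < len(channel_energies)][:n]
    remaining = [c for c in np.argsort(channel_energies)[::-1] if c not in forced]
    return np.sort(np.array(forced + remaining[:n - len(forced)], dtype=int))

energies = np.array([0.1, 0.9, 0.2, 0.8, 0.05, 0.7, 0.3, 0.6])   # m = 8 channels
print(select_n_of_m(energies, 4))                      # -> [1 3 5 7]
print(select_with_formant_priority(energies, 4, [0, 6]))  # -> [0 1 3 6]
```

Note how channel 0 (a low-energy channel hypothetically carrying a formant) displaces a high-energy but potentially noise-dominated channel in the second call, which is the abstract's stated motivation.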

RevDate: 2019-09-03

Kochetov A, Tabain M, Sreedevi N, et al (2018)

Manner and place differences in Kannada coronal consonants: Articulatory and acoustic results.

The Journal of the Acoustical Society of America, 144(6):3221.

This study investigated articulatory differences in the realization of Kannada coronal consonants of the same place but different manner of articulation. This was done by examining tongue positions and acoustic formant transitions for dentals and retroflexes of three manners of articulation: stops, nasals, and laterals. Ultrasound imaging data collected from ten speakers of the language revealed that the tongue body/root was more forward for the nasal manner of articulation compared to stop and lateral consonants of the same place of articulation. The dental nasal and lateral were also produced with a higher front part of the tongue compared to the dental stop. As a result, the place contrast was greater in magnitude for the stops (being the prototypical dental vs retroflex) than for the nasals and laterals (being apparently alveolar vs retroflex). Acoustic formant transition differences were found to reflect some of the articulatory differences, while also providing evidence for the more dynamic articulation of nasal and lateral retroflexes. Overall, the results of the study shed light on factors underlying manner requirements (aerodynamic or physiological) and how the factors interact with principles of gestural economy/symmetry, providing an empirical baseline for further cross-language investigations and articulation-to-acoustics modeling.

RevDate: 2019-01-08

Mekyska J, Galaz Z, Kiska T, et al (2018)

Quantitative Analysis of Relationship Between Hypokinetic Dysarthria and the Freezing of Gait in Parkinson's Disease.

Cognitive computation, 10(6):1006-1018.

Hypokinetic dysarthria (HD) and freezing of gait (FOG) are both axial symptoms that occur in patients with Parkinson's disease (PD). It is assumed they have some common pathophysiological mechanisms and therefore that speech disorders in PD can predict FOG deficits within a horizon of some years. The aim of this study is to employ a complex quantitative analysis of phonation, articulation and prosody in PD patients in order to identify the relationship between HD and FOG, and establish a mathematical model that would predict FOG deficits using acoustic analysis at baseline. We enrolled 75 PD patients who were assessed by 6 clinical scales including the Freezing of Gait Questionnaire (FOG-Q). We subsequently extracted 19 acoustic measures quantifying speech disorders in the fields of phonation, articulation and prosody. To identify the relationship between HD and FOG, we performed a partial correlation analysis. Finally, based on the selected acoustic measures, we trained regression models to predict the change in FOG during a 2-year follow-up. We identified significant correlations between FOG-Q scores and the acoustic measures based on formant frequencies (quantifying the movement of the tongue and jaw) and speech rate. Using the regression models, we were able to predict a change in particular FOG-Q scores with an error of between 7.4% and 17.0%. This study suggests that FOG in patients with PD is mainly linked to improper articulation, a disturbed speech rate, and reduced intelligibility. We have also shown that acoustic analysis of HD at baseline can be used as a predictor of FOG deficit during 2 years of follow-up. This knowledge enables researchers to introduce new cognitive systems that predict gait difficulties in PD patients.

RevDate: 2019-08-02
CmpDate: 2019-05-08

Masapollo M, Zhao TC, Franklin L, et al (2019)

Asymmetric discrimination of nonspeech tonal analogues of vowels.

Journal of experimental psychology. Human perception and performance, 45(2):285-300.

Directional asymmetries reveal a universal bias in vowel perception favoring extreme vocalic articulations, which lead to acoustic vowel signals with dynamic formant trajectories and well-defined spectral prominences because of the convergence of adjacent formants. The present experiments investigated whether this bias reflects speech-specific processes or general properties of spectral processing in the auditory system. Toward this end, we examined whether analogous asymmetries in perception arise with nonspeech tonal analogues that approximate some of the dynamic and static spectral characteristics of naturally produced /u/ vowels executed with more versus less extreme lip gestures. We found a qualitatively similar but weaker directional effect with 2-component tones varying in both the dynamic changes and proximity of their spectral energies. In subsequent experiments, we pinned down the phenomenon using tones that varied in 1 or both of these 2 acoustic characteristics. We found comparable asymmetries with tones that differed exclusively in their spectral dynamics, and no asymmetries with tones that differed exclusively in their spectral proximity or both spectral features. We interpret these findings as evidence that dynamic spectral changes are a critical cue for eliciting asymmetries in nonspeech tone perception, but that the potential contribution of general auditory processes to asymmetries in vowel perception is limited. (PsycINFO Database Record (c) 2019 APA, all rights reserved).

RevDate: 2019-07-11

Carney LH, JM McDonough (2019)

Nonlinear auditory models yield new insights into representations of vowels.

Attention, perception & psychophysics, 81(4):1034-1046.

Studies of vowel systems regularly appeal to the need to understand how the auditory system encodes and processes the information in the acoustic signal. The goal of this study is to present computational models to address this need, and to use the models to illustrate responses to vowels at two levels of the auditory pathway. Many of the models previously used to study auditory representations of speech are based on linear filter banks simulating the tuning of the inner ear. These models do not incorporate key nonlinear response properties of the inner ear that influence responses at conversational-speech sound levels. These nonlinear properties shape neural representations in ways that are important for understanding responses in the central nervous system. The model for auditory-nerve (AN) fibers used here incorporates realistic nonlinear properties associated with the basilar membrane, inner hair cells (IHCs), and the IHC-AN synapse. These nonlinearities set up profiles of f0-related fluctuations that vary in amplitude across the population of frequency-tuned AN fibers. Amplitude fluctuations in AN responses are smallest near formant peaks and largest at frequencies between formants. These f0-related fluctuations strongly excite or suppress neurons in the auditory midbrain, the first level of the auditory pathway where tuning for low-frequency fluctuations in sounds occurs. Formant-related amplitude fluctuations provide representations of the vowel spectrum in discharge rates of midbrain neurons. These representations in the midbrain are robust across a wide range of sound levels, including the entire range of conversational-speech levels, and in the presence of realistic background noise levels.

RevDate: 2019-05-07
CmpDate: 2019-05-07

Anikin A, N Johansson (2019)

Implicit associations between individual properties of color and sound.

Attention, perception & psychophysics, 81(3):764-777.

We report a series of 22 experiments in which the implicit associations test (IAT) was used to investigate cross-modal correspondences between visual (luminance, hue [R-G, B-Y], saturation) and acoustic (loudness, pitch, formants [F1, F2], spectral centroid, trill) dimensions. Colors were sampled from the perceptually accurate CIE-Lab space, and the complex, vowel-like sounds were created with a formant synthesizer capable of separately manipulating individual acoustic properties. In line with previous reports, the loudness and pitch of acoustic stimuli were associated with both luminance and saturation of the presented colors. However, pitch was associated specifically with color lightness, whereas loudness mapped onto greater visual saliency. Manipulating the spectrum of sounds without modifying their pitch showed that an upward shift of spectral energy was associated with the same visual features (higher luminance and saturation) as higher pitch. In contrast, changing formant frequencies of synthetic vowels while minimizing the accompanying shifts in spectral centroid failed to reveal cross-modal correspondences with color. This may indicate that the commonly reported associations between vowels and colors are mediated by differences in the overall balance of low- and high-frequency energy in the spectrum rather than by vowel identity as such. Surprisingly, the hue of colors with the same luminance and saturation was not associated with any of the tested acoustic features, except for a weak preference to match higher pitch with blue (vs. yellow). We discuss these findings in the context of previous research and consider their implications for sound symbolism in world languages.

RevDate: 2019-07-29
CmpDate: 2019-07-29

Paltura C, K Yelken (2019)

An Examination of Vocal Tract Acoustics following Wendler's Glottoplasty.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP), 71(1):24-28.

PURPOSE: To investigate the formant frequency (FF) features of transgender females' (TFs) voice after Wendler's glottoplasty surgery and compare these levels with age-matched healthy males and females.

STUDY DESIGN: Controlled prospective.

METHODS: 20 TFs and 20 genetically male and female age-matched healthy controls were enrolled in the study. The fundamental frequency (F0) and FFs F1-F4 were obtained from TF speakers 6 months after surgery. These levels were compared with those of healthy controls.

RESULTS: Statistical analysis showed that the median F0 values were similar between TFs and females. The median F1 levels of TFs were different from females but similar to males. The F2 levels of TFs were similar to females but different from males. The F3 and F4 levels were significantly different from both male and female controls.

CONCLUSION: Wendler's glottoplasty technique is an effective method to increase F0 levels among TF patients; however, these individuals report that their voice does not sufficiently project femininity. The results obtained with regard to FF levels may be the reason for this problem. Voice therapy is recommended as a possible approach to help TF patients achieve a satisfactory feminine voice.

RevDate: 2018-12-03

Hardy TLD, Rieger JM, Wells K, et al (2018)

Acoustic Predictors of Gender Attribution, Masculinity-Femininity, and Vocal Naturalness Ratings Amongst Transgender and Cisgender Speakers.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30355-2 [Epub ahead of print].

PURPOSE: This study aimed to identify the most salient set of acoustic predictors of (1) gender attribution; (2) perceived masculinity-femininity; and (3) perceived vocal naturalness amongst a group of transgender and cisgender speakers to inform voice and communication feminization training programs. This study used a unique set of acoustic variables and included a third, androgynous, choice for gender attribution ratings.

METHOD: Data were collected across two phases and involved two separate groups of participants: communicators and raters. In the first phase, audio recordings were captured of communicators (n = 40) during cartoon retell, sustained vowel, and carrier phrase tasks. Acoustic measures were obtained from these recordings. In the second phase, raters (n = 20) provided ratings of gender attribution, perceived masculinity-femininity, and vocal naturalness based on a sample of the cartoon description recording.

RESULTS: Results of a multinomial logistic regression analysis identified mean fundamental frequency (fo) as the sole acoustic measure that changed the odds of being attributed as a woman or ambiguous in gender rather than as a man. Multiple linear regression analyses identified mean fo, average formant frequency of /i/, and mean sound pressure level as predictors of masculinity-femininity ratings and mean fo, average formant frequency, and rate of speech as predictors of vocal naturalness ratings.

CONCLUSION: The results of this study support the continued targeting of fo and vocal tract resonance in voice and communication feminization/masculinization training programs and provide preliminary evidence for more emphasis being placed on vocal intensity and rate of speech. Modification of these voice parameters may help clients to achieve a natural-sounding voice that satisfactorily represents their affirmed gender.
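
The study's central analysis — a multinomial logistic regression from acoustic measures to a three-way gender attribution (man / ambiguous / woman) — can be sketched in a few lines. This is an illustrative reconstruction with scikit-learn using synthetic mean fo values, not the study's data or its exact model; the distribution parameters are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical mean fo (Hz) samples for the three attribution categories.
fo_man = rng.normal(110, 15, 50)
fo_ambiguous = rng.normal(165, 15, 50)
fo_woman = rng.normal(210, 20, 50)

X = np.concatenate([fo_man, fo_ambiguous, fo_woman]).reshape(-1, 1)
y = np.array([0] * 50 + [1] * 50 + [2] * 50)  # 0=man, 1=ambiguous, 2=woman

# The default lbfgs solver fits a multinomial model over the three categories.
model = LogisticRegression(max_iter=1000).fit(X, y)

# A low mean fo should be attributed "man", a high mean fo "woman".
print(model.predict([[100.0], [220.0]]))
```

In the actual study, additional predictors (average formant frequency, sound pressure level, rate of speech) would enter `X` as further columns.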

RevDate: 2019-07-10
CmpDate: 2019-07-10

Fujimura S, Kojima T, Okanoue Y, et al (2019)

Discrimination of "hot potato voice" caused by upper airway obstruction utilizing a support vector machine.

The Laryngoscope, 129(6):1301-1307.

OBJECTIVES/HYPOTHESIS: "Hot potato voice" (HPV) is a thick, muffled voice caused by pharyngeal or laryngeal diseases characterized by severe upper airway obstruction, including acute epiglottitis and peritonsillitis. To develop a method for determining upper-airway emergency based on this important vocal feature, we investigated the acoustic characteristics of HPV using a physical, articulatory speech synthesis model. The results of the simulation were then applied to design a computerized recognition framework using a mel-frequency cepstral coefficient domain support vector machine (SVM).

STUDY DESIGN: Quasi-experimental research design.

METHODS: Changes in the voice spectral envelope caused by upper airway obstructions were analyzed using a hybrid time-frequency model of articulatory speech synthesis. We evaluated variations in the formant structure and thresholds of critical vocal tract area functions that triggered HPV. The SVMs were trained using a dataset of 2,200 synthetic voice samples generated by an articulatory synthesizer. Voice classification experiments on test datasets of real patient voices were then performed.

RESULTS: On phonation of the Japanese vowel /e/, the frequency of the second formant fell and coalesced with that of the first formant as the area function of the oropharynx decreased. Changes in higher-order formants varied according to constriction location. The highest accuracy afforded by the SVM classifier trained with synthetic data was 88.3%.

CONCLUSIONS: HPV caused by upper airway obstruction has a highly characteristic spectral envelope. Based on this distinctive voice feature, our SVM classifier, which was trained using synthetic data, was able to diagnose upper-airway obstructions with a high degree of accuracy.

LEVEL OF EVIDENCE: 2c Laryngoscope, 129:1301-1307, 2019.
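
The pipeline described — synthesize voices, extract spectral features, train an SVM, classify — can be mimicked in miniature. Everything below is a toy sketch: crude two-sinusoid "vowels" stand in for articulatory synthesis, a coarse log spectral envelope stands in for mel-frequency cepstral coefficients, and the formant values (a normal /e/ vs. an "HPV-like" vowel with F2 collapsed toward F1, as the abstract reports) are invented for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
sr, dur = 8000, 0.2
t = np.linspace(0, dur, int(sr * dur), endpoint=False)

def vowel(f1, f2):
    """Crude two-formant 'vowel': two sinusoids plus a little noise."""
    sig = np.sin(2 * np.pi * f1 * t) + 0.7 * np.sin(2 * np.pi * f2 * t)
    sig += 0.05 * rng.standard_normal(t.size)
    spec = np.abs(np.fft.rfft(sig * np.hanning(t.size)))
    # Coarse log spectral envelope (40 bands) as the feature vector.
    return np.log1p(spec[:800].reshape(40, 20).mean(axis=1))

# Normal /e/: F1 ~ 500 Hz, F2 ~ 1800 Hz.  "HPV-like": F2 collapsed toward F1.
normal = [vowel(rng.normal(500, 30), rng.normal(1800, 60)) for _ in range(60)]
hpv = [vowel(rng.normal(500, 30), rng.normal(650, 60)) for _ in range(60)]

X = np.vstack(normal + hpv)
y = np.array([0] * 60 + [1] * 60)
clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy data
```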

RevDate: 2019-01-15
CmpDate: 2018-11-26

Chen Q, Liu J, Yang HM, et al (2018)

Research on tunable distributed SPR sensor based on bimetal film.

Applied optics, 57(26):7591-7599.

In order to overcome the limitations in range of traditional prism structure surface plasmon resonance (SPR) single-point sensor measurement, a symmetric bimetallic film SPR multi-sensor structure is proposed. Based on this, the dual-channel sensing attenuation mechanism of SPR in gold and silver composite film and the improvement of sensing characteristics were studied. By optimizing the characteristics such as material and thickness, a wider range of dual-channel distributed sensing is realized. Using a He-Ne laser (632.8 nm) as the reference light source, prism-excited symmetric SPR sensing was studied theoretically for a symmetrical metal-clad dielectric waveguide using thin-film optics theory. The influence of the angle of incidence of the light source and the thickness of the dielectric layer on the performance of SPR dual formant sensing is explained. The finite-difference time-domain method was used for the simulation calculation for various thicknesses and compositions of the symmetric combined layer, resulting in the choice of silver (30 nm) and gold (10 nm). When the incident angle was 78 deg, the quality factor reached 5960, showing an excellent resonance sensing effect. The sensitivity reached a maximum of 5.25×10-5 RIU when testing the water content of an aqueous solution of honey, which proves the feasibility and practicality of the structure design. The structure improves the theoretical basis for designing an SPR multi-channel distributed sensing system, which can greatly reduce the cost of biochemical detection and significantly increase the detection efficiency.

RevDate: 2018-11-18

Graf S, Schwiebacher J, Richter L, et al (2018)

Adjustment of Vocal Tract Shape via Biofeedback: Influence on Vowels.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30326-6 [Epub ahead of print].

The study assessed 30 nonprofessional singers to evaluate the effects of vocal tract shape adjustment via increased resonance toward an externally applied sinusoidal frequency of 900 Hz without phonation. The amplification of the sound wave was used as the biofeedback signal, and the intensity and formant positions of the basic vowels /a/, /e/, /i/, /o/, and /u/ were compared before and after a vocal tract adjustment period. After the adjustment period, the intensities for all vowels increased, and the measured changes correlated with the participants' self-perception. The differences between the second formant positions of the vowels and the applied frequency influenced the changes in amplitude and in formant frequencies. The most significant changes in formant frequency occurred with vowels that did not include a formant frequency of 900 Hz, while the increase in amplitude was strongest for vowels with a formant frequency of about 900 Hz.

RevDate: 2018-11-16

Bhat GS, Reddy CKA, Shankar N, et al (2018)

Smartphone based real-time super Gaussian single microphone Speech Enhancement to improve intelligibility for hearing aid users using formant information.

Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference, 2018:5503-5506.

In this paper, we present a Speech Enhancement (SE) technique to improve the intelligibility of speech perceived by hearing aid (HA) users, using a smartphone as an assistive device. We use formant frequency information to improve the overall quality and intelligibility of the speech. The proposed SE method is based on a new super Gaussian joint maximum a posteriori (SGJMAP) estimator. Using a priori information about formant frequency locations, the derived gain function has "tradeoff" factors that allow the smartphone user to customize perceptual preference by controlling the amount of noise suppression and speech distortion in real time. The formant frequency information lets the user control the gains over the non-formant frequency bands, allowing HA users to attain more noise suppression while maintaining speech intelligibility through a smartphone application. Objective intelligibility measures and subjective results reflect the usability of the developed SE application in noisy real-world acoustic environments.
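
The general idea of a formant-aware gain — suppress noise more aggressively outside assumed formant bands and leave formant bins near unity — can be sketched as follows. The function name, band edges, and tradeoff factor are illustrative assumptions, not the paper's SGJMAP estimator.

```python
import numpy as np

def formant_aware_gain(freqs, formant_bands, suppression_db, tradeoff):
    """Per-bin gain: `tradeoff` in [0, 1] scales the suppression (in dB)
    applied to non-formant bins; bins inside formant bands get unity gain."""
    gain_db = np.full(freqs.shape, -suppression_db * tradeoff)
    for lo, hi in formant_bands:
        gain_db[(freqs >= lo) & (freqs <= hi)] = 0.0
    return 10 ** (gain_db / 20)

freqs = np.linspace(0, 4000, 256)
bands = [(300, 900), (1100, 2500)]  # hypothetical F1/F2 regions
g = formant_aware_gain(freqs, bands, suppression_db=12, tradeoff=0.8)
print(g.min(), g.max())  # non-formant bins attenuated, formant bins untouched
```

Raising `tradeoff` trades more noise suppression for more speech distortion, which mirrors the user-adjustable control the abstract describes.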

RevDate: 2019-07-29

Williams D, Escudero P, A Gafos (2018)

Spectral change and duration as cues in Australian English listeners' front vowel categorization.

The Journal of the Acoustical Society of America, 144(3):EL215.

Australian English /iː/, /ɪ/, and /ɪə/ exhibit almost identical average first (F1) and second (F2) formant frequencies and differ in duration and vowel inherent spectral change (VISC). The cues of duration, F1 × F2 trajectory direction (TD) and trajectory length (TL) were assessed in listeners' categorization of /iː/ and /ɪə/ compared to /ɪ/. Duration was important for distinguishing both /iː/ and /ɪə/ from /ɪ/. TD and TL were important for categorizing /iː/ versus /ɪ/, whereas only TL was important for /ɪə/ versus /ɪ/. Finally, listeners' use of duration and VISC was not mutually affected for either vowel compared to /ɪ/.
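
The two VISC measures named above have simple geometric definitions that can be sketched from F1/F2 values at vowel onset and offset: trajectory length (TL) is the Euclidean distance travelled in the F1 × F2 plane, and trajectory direction (TD) is the angle of that movement. The formant values below are made-up illustrations, not Australian English data.

```python
import math

def trajectory(f1_on, f2_on, f1_off, f2_off):
    """Return (TL, TD): length (Hz) and direction (degrees) of the
    formant movement in the F1 x F2 plane from vowel onset to offset."""
    df1, df2 = f1_off - f1_on, f2_off - f2_on
    tl = math.hypot(df1, df2)                # trajectory length
    td = math.degrees(math.atan2(df2, df1))  # trajectory direction
    return tl, td

# Hypothetical diphthongized vowel: F1 rises, F2 falls over the vowel.
tl, td = trajectory(340, 2300, 500, 1900)
print(tl, td)
```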

RevDate: 2019-09-02
CmpDate: 2019-09-02

Gómez-Vilda P, Gómez-Rodellar A, Vicente JMF, et al (2019)

Neuromechanical Modelling of Articulatory Movements from Surface Electromyography and Speech Formants.

International journal of neural systems, 29(2):1850039.

Speech articulation is produced by the movements of muscles in the larynx, pharynx, mouth and face. Speech therefore shows acoustic features, such as formants, that are directly related to the neuromotor actions of these muscles. The first two formants are strongly related to jaw and tongue muscular activity. Speech can be used as a simple and ubiquitous signal that is easy to record and process, either locally or on e-Health platforms. This fact may open a wide set of applications in the functional grading and monitoring of neurodegenerative diseases. A relevant question, in this sense, is how closely speech correlates are related to neuromotor actions. This preliminary study is intended to find answers to this question by using surface electromyographic recordings on the masseter and the acoustic kinematics related to the first formant. The study shows that relevant correlations can be found between the surface electromyographic activity (dynamic muscle behavior) and the positions and first derivatives of the first formant (kinematic variables related to the vertical velocity and acceleration of the joint jaw and tongue biomechanical system). As an application example, it is shown that the probability density function associated with these kinematic variables is more sensitive than classical features such as Vowel Space Area (VSA) or Formant Centralization Ratio (FCR) in characterizing neuromotor degeneration in Parkinson's disease.

RevDate: 2018-12-11
CmpDate: 2018-12-11

Lopes LW, Alves JDN, Evangelista DDS, et al (2018)

Accuracy of traditional and formant acoustic measurements in the evaluation of vocal quality.

CoDAS, 30(5):e20170282 pii:S2317-17822018000500310.

PURPOSE: Investigate the accuracy of isolated and combined acoustic measurements in the discrimination of voice deviation intensity (GD) and predominant voice quality (PVQ) in patients with dysphonia.

METHODS: A total of 302 female patients with voice complaints participated in the study. The sustained /ɛ/ vowel was used to extract the following acoustic measures: mean and standard deviation (SD) of fundamental frequency (F0), jitter, shimmer, glottal to noise excitation (GNE) ratio and the mean of the first three formants (F1, F2, and F3). Auditory-perceptual evaluation of GD and PVQ was conducted by three speech-language pathologists who were voice specialists.

RESULTS: In isolation, only GNE provided satisfactory performance when discriminating between GD and PVQ. Improvement in the classification of GD and PVQ was observed when the acoustic measures were combined. Mean F0, F2, and GNE (healthy × mild-to-moderate deviation), the SDs of F0, F1, and F3 (mild-to-moderate × moderate deviation), and mean jitter and GNE (moderate × intense deviation) were the best combinations for discriminating GD. The best combinations for discriminating PVQ were mean F0, shimmer, and GNE (healthy × rough), F3 and GNE (healthy × breathy), mean F0, F3, and GNE (rough × tense), and mean F0, F1, and GNE (breathy × tense).

CONCLUSION: In isolation, GNE proved to be the only acoustic parameter capable of discriminating between GD and PVQ. There was a gain in classification performance for discrimination of both GD and PVQ when traditional and formant acoustic measurements were combined.
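
The traditional perturbation measures named above have simple textbook definitions. The sketch below computes local jitter and shimmer as mean absolute cycle-to-cycle differences relative to the mean — a minimal illustration of the concept, not Praat's exact algorithms, and the cycle data are hypothetical.

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference between consecutive glottal periods,
    divided by the mean period (dimensionless, often given in %)."""
    p = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(p))) / p.mean()

def local_shimmer(amps):
    """Mean absolute difference between consecutive peak amplitudes,
    divided by the mean amplitude."""
    a = np.asarray(amps, dtype=float)
    return np.mean(np.abs(np.diff(a))) / a.mean()

# Hypothetical cycle data: a steady ~200 Hz voice with slight perturbation.
periods = [0.005, 0.00502, 0.00499, 0.00501, 0.005]
amps = [1.0, 0.98, 1.01, 0.99, 1.0]
print(local_jitter(periods), local_shimmer(amps))
```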

RevDate: 2018-10-23

Grawunder S, Crockford C, Clay Z, et al (2018)

Higher fundamental frequency in bonobos is explained by larynx morphology.

Current biology : CB, 28(20):R1188-R1189.

Acoustic signals, shaped by natural and sexual selection, reveal ecological and social selection pressures [1]. Examining acoustic signals together with morphology can be particularly revealing. But this approach has rarely been applied to primates, where clues to the evolutionary trajectory of human communication may be found. Across vertebrate species, there is a close relationship between body size and acoustic parameters, such as formant dispersion and fundamental frequency (f0). Deviations from this acoustic allometry usually produce calls with a lower f0 than expected for a given body size, often due to morphological adaptations in the larynx or vocal tract [2]. An unusual example of an obvious mismatch between fundamental frequency and body size is found in the two closest living relatives of humans, bonobos (Pan paniscus) and chimpanzees (Pan troglodytes). Although these two ape species overlap in body size [3], bonobo calls have a strikingly higher f0 than corresponding calls from chimpanzees [4]. Here, we compare acoustic structures of calls from bonobos and chimpanzees in relation to their larynx morphology. We found that shorter vocal fold length in bonobos compared to chimpanzees accounted for species differences in f0, showing a rare case of positive selection for signal diminution in both bonobo sexes.

RevDate: 2019-07-26

Niziolek CA, S Kiran (2018)

Assessing speech correction abilities with acoustic analyses: Evidence of preserved online correction in persons with aphasia.

International journal of speech-language pathology, 20(6):659-668.

Purpose: Disorders of speech production may be accompanied by abnormal processing of speech sensory feedback. Here, we introduce a semi-automated analysis designed to assess the degree to which speakers use natural online feedback to decrease acoustic variability in spoken words. Because production deficits in aphasia have been hypothesised to stem from problems with sensorimotor integration, we investigated whether persons with aphasia (PWA) can correct their speech acoustics online. Method: Eight PWA in the chronic stage produced 200 repetitions each of three monosyllabic words. Formant variability was measured for each vowel in multiple time windows within the syllable, and the reduction in formant variability from vowel onset to midpoint was quantified. Result: PWA significantly decreased acoustic variability over the course of the syllable, providing evidence of online feedback correction mechanisms. The magnitude of this corrective formant movement exceeded past measurements in control participants. Conclusion: Vowel centreing behaviour suggests that error correction abilities are at least partially spared in speakers with aphasia, and may be relied upon to compensate for feedforward deficits by bringing utterances back on track. These proof of concept data show the potential of this analysis technique to elucidate the mechanisms underlying disorders of speech production.
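
The centring measure described above — formant variability at vowel onset versus midpoint, with the reduction taken as evidence of online correction — can be sketched with simulated trajectories. The target F1, error magnitudes, and decay curve below are invented for illustration; they are not patient data.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_samples = 200, 50
target_f1 = 700.0  # hypothetical target F1 (Hz)

# Each trial starts off-target; the error decays toward the target over the
# vowel, modelling online feedback correction.
onset_err = rng.normal(0, 60, n_trials)        # initial F1 error per trial
decay = np.exp(-np.linspace(0, 3, n_samples))  # correction over the vowel
f1 = target_f1 + onset_err[:, None] * decay[None, :]

onset_sd = f1[:, :5].std()    # F1 variability in the onset window
mid_sd = f1[:, 22:28].std()   # F1 variability around the vowel midpoint
centering = onset_sd - mid_sd
print(onset_sd, mid_sd, centering)  # positive centering = online correction
```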

RevDate: 2018-10-21

Fazeli M, Moradi N, Soltani M, et al (2018)

Dysphonia Characteristics and Vowel Impairment in Relation to Neurological Status in Patients with Multiple Sclerosis.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(18)30351-5 [Epub ahead of print].

PURPOSE: In this study, we attempted to assess the phonation and articulation subsystem changes in patients with multiple sclerosis compared to healthy individuals using Dysphonia Severity Index and Formant Centralization Ratio with the aim of evaluating the correlation between these two indexes with neurological status.

MATERIALS AND METHODS: A sample of 47 patients with multiple sclerosis and 20 healthy speakers were evaluated. The patients' disease duration and disability were monitored by a neurologist. Dysphonia Severity Index and Formant Centralization Ratio scores were computed for each individual. Acoustic analysis was performed with Praat software, and statistical analysis was run in SPSS 21. To compare multiple sclerosis patients with the control group, the Mann-Whitney U test was used for non-normal data and the independent-samples t test for normal data. In addition, a logistic regression was used to compare the data. The correlation between acoustic characteristics and neurological status was assessed using the Spearman correlation coefficient, and linear regression was performed to evaluate the simultaneous effects of the neurological data.

RESULTS: Statistical analysis revealed that a significant difference existed between multiple sclerosis and healthy participants. Formant Centralization Ratio had a significant correlation with disease severity.

CONCLUSION: Multiple sclerosis patients can be differentiated from healthy individuals by their phonation and articulatory features. The scores of these two indexes can be considered appropriate criteria for detecting the onset of speech problems in multiple sclerosis. Also, articulation subsystem changes might be useful signs of the progression of the disease.

RevDate: 2019-04-29

Brabenec L, Klobusiakova P, Barton M, et al (2019)

Non-invasive stimulation of the auditory feedback area for improved articulation in Parkinson's disease.

Parkinsonism & related disorders, 61:187-192.

INTRODUCTION: Hypokinetic dysarthria (HD) is a common symptom of Parkinson's disease (PD) which does not respond well to PD treatments. We investigated acute effects of repetitive transcranial magnetic stimulation (rTMS) of the motor and auditory feedback area on HD in PD using acoustic analysis of speech.

METHODS: We used 10 Hz and 1 Hz stimulation protocols and applied rTMS over the left orofacial primary motor area, the right superior temporal gyrus (STG), and over the vertex (a control stimulation site) in 16 PD patients with HD. A cross-over design was used. Stimulation sites and protocols were randomised across subjects and sessions. Acoustic analysis of a sentence reading task performed inside the MR scanner was used to evaluate rTMS-induced effects on motor speech. Acute fMRI changes due to rTMS were also analysed.

RESULTS: The 1 Hz STG stimulation produced significant increases of the relative standard deviation of the 2nd formant (p = 0.019), i.e. an acoustic parameter describing the tongue and jaw movements. The effects were superior to the control site stimulation and were accompanied by increased resting state functional connectivity between the stimulated region and the right parahippocampal gyrus. The rTMS-induced acoustic changes were correlated with the reading task-related BOLD signal increases of the stimulated area (R = 0.654, p = 0.029).

CONCLUSION: Our results demonstrate for the first time that low-frequency stimulation of the temporal auditory feedback area may improve articulation in PD and enhance functional connectivity between the STG and the cortical region involved in an overt speech control.

