25 Sep 2020 at 01:45

Bibliography on: Formants: Modulators of Communication


Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology.

RJR: Recommended Bibliography. Created: 25 Sep 2020 at 01:45

Formants: Modulators of Communication

Wikipedia: A formant, as defined by James Jeans, is a harmonic of a note that is augmented by a resonance. In speech science and phonetics, however, a formant is also sometimes used to mean acoustic resonance of the human vocal tract. Thus, in phonetics, formant can mean either a resonance or the spectral maximum that the resonance produces. Formants are often measured as amplitude peaks in the frequency spectrum of the sound, using a spectrogram or a spectrum analyzer and, in the case of the voice, this gives an estimate of the vocal tract resonances. In vowels spoken with a high fundamental frequency, as in a female or child voice, however, the frequency of the resonance may lie between the widely spaced harmonics and hence no corresponding peak is visible. Because formants are a product of resonance and resonance is affected by the shape and material of the resonating structure, and because all animals (humans included) have unique morphologies, formants can add generic (sounds big) and specific (that's Towser barking) information to animal vocalizations.
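As a rough illustration of how the size of the resonating structure sets formant frequencies, the sketch below uses the textbook idealization of a vocal tract as a uniform tube closed at one end (the glottis) and open at the other (the lips), whose resonances fall at odd multiples of c/(4L). The function name, the speed of sound (350 m/s), and the two tube lengths are assumptions chosen for illustration; none of them come from the bibliography.

```python
def tube_formants(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Idealized quarter-wavelength model: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, n_formants + 1)]


# Hypothetical tube lengths: a longer tract versus a shorter one.
long_tract = tube_formants(0.175)
short_tract = tube_formants(0.125)

print("17.5 cm tube:", [round(f) for f in long_tract])
print("12.5 cm tube:", [round(f) for f in short_tract])
```

Under these assumptions the longer tube resonates near 500, 1500, and 2500 Hz, and every resonance of the shorter tube is higher, which is the sense in which formants can carry "sounds big" information. Real vocal tracts are not uniform tubes, so measured formants deviate from this pattern.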

Created with PubMed® Query: formant NOT pmcbook NOT ispreviousversion

The Papers (from PubMed®)


RevDate: 2020-09-21

Nilsson T, Laukkanen AM, T Syrjä (2020)

Effects of Sixteen Month Voice Training of Student Actors Applying the Linklater Voice Method.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30301-5 [Epub ahead of print].

OBJECTIVE: This study investigates the perceptual and acoustic changes in student actors' voices after 16 months of Linklater Voice training, which is a holistic method to train actors' voices.

METHODS: Eleven (n = 11) actor students' text and Voice Range Profile (VRP) recordings were analyzed pretraining and 16 months posttraining. From text readings at comfortable performance loudness, both perceptual and acoustic analyses were made. Acoustic measures included sound pressure level (SPL), fundamental frequency (fo), and sound level differences between different frequency ranges derived from long-term-average spectrum. Sustained vowels [i:], [o], and [e] abstracted from the text sample were analyzed for formant frequencies F1-F4 and the frequency difference between F4 and F3. The VRP was registered to investigate SPL of the softest and loudest phonations throughout the voice range.

RESULTS: The perceived pitch range during text reading increased significantly. The acoustic results showed a strong trend toward a decrease in minimum fo and increases in maximum fo and fo range. The VRP showed a significant increase in the fo range and dynamics (SPL range). Perceived voice production showed a trend toward phonation balance (neither pressed nor breathy) and darker voice color posttraining.

CONCLUSION: The perceptual and acoustic analysis of text reading and acoustic measures of VRP suggest that LV training has a positive impact on voice.

RevDate: 2020-09-18

Di Natale V, Cantarella G, Manfredi C, et al (2020)

Semioccluded Vocal Tract Exercises Improve Self-Perceived Voice Quality in Healthy Actors.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30273-3 [Epub ahead of print].

PURPOSE: Semi-occluded vocal tract exercises (SOVTE) have been shown to lead to more effective and efficient vocal production for individuals with voice disorders and for singers. The aim of the present study is to investigate the effects of a 10-minute SOVTE warm-up protocol on the actors' voice.

METHODS: Twenty-seven professional theater actors (16 females) without voice complaints were audio-recorded while reading aloud, with their acting voice, a short dramatic passage at four time points. Recordings were made the day before the show, just before and soon after the warm-up protocol (which was performed prior to the show), and soon after the show. The voice quality was acoustically and auditory-perceptually evaluated and quantified at each time point by blinded raters. Self-assessment parameters anonymously collected pre and post exercising were also analyzed.

RESULTS: No statistically significant differences in perceptual ratings and acoustic parameters were found between pre/post exercise sessions and males/females. A statistically significant improvement was detected in the self-assessment parameters concerning comfort of production, sonorousness, vocal clarity and power.

CONCLUSIONS: Vocal warm-up with the described SOVTE protocol was effective in producing a self-perceived improvement in comfort of production, voice quality and power, although objective evidence was missing. This straightforward protocol could thus be beneficial if routinely utilized by professional actors to facilitate the voice performance.

RevDate: 2020-09-16

Sugathan N, S Maruthy (2020)

Predictive factors for persistence and recovery of stuttering in children: A systematic review.

International journal of speech-language pathology [Epub ahead of print].

PURPOSE: The purpose of this study was to systematically review the available literature on various factors that can predict the persistence and recovery of stuttering in children.

METHOD: An electronic search yielded a total of 35 studies, which considered 44 variables that can be potential factors for predicting persistence and recovery.

RESULT: Among the 44 factors studied, only four (phonological abilities, articulatory rate, change in the pattern of disfluencies, and trend in stuttering severity over one year post-onset) were identified as replicated predictors of recovery from stuttering. Several other factors, such as differences in the second formant transition between fluent and disfluent speech and articulatory rate measured in phones/sec, were observed to predict the future course of stuttering. However, these factors lack replicated evidence as predictors.

CONCLUSION: There is clear support only for limited factors as reliable predictors. Also, it is observed to be too early to conclude on several replicated factors due to differences in the age group of participants, participant sample size, and the differences in tools used in research that lead to mixed findings as a predictive factor. Hence there is a need for systematic and replicated testing of the factors identified before initiating their use for clinical purposes.

RevDate: 2020-09-14

Palaparthi A, IR Titze (2020)

Analysis of Glottal Inverse Filtering in the Presence of Source-Filter Interaction.

Speech communication, 123:98-108.

The validity of glottal inverse filtering (GIF) to obtain a glottal flow waveform from radiated pressure signal in the presence and absence of source-filter interaction was studied systematically. A driven vocal fold surface model of vocal fold vibration was used to generate source signals. A one-dimensional wave reflection algorithm was used to solve for acoustic pressures in the vocal tract. Several test signals were generated with and without source-filter interaction at various fundamental frequencies and vowels. Linear Predictive Coding (LPC), Quasi Closed Phase (QCP), and Quadratic Programming (QPR) based algorithms, along with supraglottal impulse response, were used to inverse filter the radiated pressure signals to obtain the glottal flow pulses. The accuracy of each algorithm was tested for its recovery of maximum flow declination rate (MFDR), peak glottal flow, open phase ripple factor, closed phase ripple factor, and mean squared error. The algorithms were also tested for their absolute relative errors of the Normalized Amplitude Quotient, the Quasi-Open Quotient, and the Harmonic Richness Factor. The results indicated that the mean squared error decreased with increase in source-filter interaction level suggesting that the inverse filtering algorithms perform better in the presence of source-filter interaction. All glottal inverse filtering algorithms predicted the open phase ripple factor better than the closed phase ripple factor of a glottal flow waveform, irrespective of the source-filter interaction level. Major prediction errors occurred in the estimation of the closed phase ripple factor, MFDR, peak glottal flow, normalized amplitude quotient, and Quasi-Open Quotient. Feedback-related nonlinearity (source-filter interaction) affected the recovered signal primarily when fo was well below the first formant frequency of a vowel. 
The prediction error increased when fo was close to the first formant frequency due to the difficulty of estimating the precise value of resonance frequencies, which was exacerbated by nonlinear kinetic losses in the vocal tract.
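To make the LPC-based inverse filtering idea concrete, here is a minimal sketch under strong simplifying assumptions: the "source" is white noise rather than a vocal fold model, the "vocal tract" is a single two-pole resonator, and the LPC order is fixed at 2. It estimates the resonance from the filtered signal and then applies the inverse filter to recover an approximation of the source. All function names and parameter values (500 Hz resonance, 80 Hz bandwidth, 8 kHz sampling rate) are hypothetical, and this is not the QCP or QPR algorithm evaluated in the paper.

```python
import cmath
import math
import random


def resonator_coeffs(freq_hz, bandwidth_hz, fs):
    """Coefficients (a1, a2) of a two-pole resonator 1 / (1 + a1 z^-1 + a2 z^-2)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2.0 * math.pi * freq_hz / fs
    return -2.0 * r * math.cos(theta), r * r


def all_pole_filter(x, a1, a2):
    """y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = sample - a1 * y1 - a2 * y2
        y.append(out)
        y1, y2 = out, y1
    return y


def lpc_order2(signal):
    """Autocorrelation-method LPC of order 2 (solves the 2x2 normal equations)."""
    def autocorr(lag):
        return sum(signal[i] * signal[i - lag] for i in range(lag, len(signal)))

    r0, r1, r2 = autocorr(0), autocorr(1), autocorr(2)
    det = r0 * r0 - r1 * r1
    a1 = (-r0 * r1 + r1 * r2) / det
    a2 = (r1 * r1 - r0 * r2) / det
    return a1, a2


def resonance_from_lpc(a1, a2, fs):
    """Frequency (Hz) of the pole pair of 1 + a1 z^-1 + a2 z^-2."""
    pole = (-a1 + cmath.sqrt(a1 * a1 - 4.0 * a2)) / 2.0
    return abs(cmath.phase(pole)) * fs / (2.0 * math.pi)


def inverse_filter(signal, a1, a2):
    """Residual e[n] = s[n] + a1*s[n-1] + a2*s[n-2]."""
    out, s1, s2 = [], 0.0, 0.0
    for s in signal:
        out.append(s + a1 * s1 + a2 * s2)
        s1, s2 = s, s1
    return out


random.seed(0)
fs = 8000
source = [random.gauss(0.0, 1.0) for _ in range(20000)]
true_a1, true_a2 = resonator_coeffs(500.0, 80.0, fs)
speech_like = all_pole_filter(source, true_a1, true_a2)

est_a1, est_a2 = lpc_order2(speech_like)
estimated_hz = resonance_from_lpc(est_a1, est_a2, fs)
recovered = inverse_filter(speech_like, est_a1, est_a2)
mse = sum((r - s) ** 2 for r, s in zip(recovered, source)) / len(source)

print("estimated resonance (Hz):", round(estimated_hz, 1))
print("mean squared error of recovered source:", mse)
```

In this linear, noise-driven toy case the estimate should land close to the true resonance. The abstract's point is that real voiced speech is harder, particularly when fo approaches the first formant frequency and the resonance is difficult to estimate precisely.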

RevDate: 2020-09-12

Lopes LW, França FP, Evangelista DDS, et al (2020)

Does the Combination of Glottal and Supraglottic Acoustic Measures Improve Discrimination Between Women With and Without Voice Disorders?.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30300-3 [Epub ahead of print].

AIM: To analyze the accuracy of traditional acoustic measurements (F0, perturbation, and noise) and formant measurements in discriminating between women with and without voice disorders, and with different laryngeal disorders.

STUDY DESIGN: A descriptive, cross-sectional, and retrospective study.

METHOD: Two hundred and sixty women participated. All participants recorded the spoken vowel /Ɛ/ and underwent laryngeal visual examination. Acoustic measures of the mean and standard deviation of the fundamental frequency (F0), jitter, shimmer, glottal-to-noise excitation ratio, and the values of the first three formants (F1, F2, and F3) were obtained.

RESULTS: Individual acoustic measurements did not demonstrate adequate (<70%) performance when discriminating between women with and without voice disorders. The combination of the standard deviation of the F0, shimmer, glottal-to-noise excitation ratio, F1, F2, and F3 showed acceptable (>70%) performance in classifying women with and without voice disorders. Individual measures of jitter as well as F1 and F3 demonstrated acceptable (>70%) performance when distinguishing women with different laryngeal diagnoses, including without voice disorders (healthy larynges), Reinke's edema, unilateral vocal fold paralysis, and sulcus vocalis. The combination of acoustic measurements showed excellent (>80%) performance when discriminating women without voice disorder from those with Reinke's edema (mean of F0, F1, and F3) and with sulcus vocalis (mean of F0, F1, and F2).

CONCLUSIONS: Individual formant and traditional acoustic measurements do not demonstrate adequate performance when discriminating between women with and without voice disorders. However, the combination of traditional and formant measurements improves the discrimination between the presence and absence of voice disorders and differentiates several laryngeal diagnoses.

RevDate: 2020-09-11

Kishimoto T, Takamiya A, Liang KC, et al (2020)

The project for objective measures using computational psychiatry technology (PROMPT): Rationale, design, and methodology.

Contemporary clinical trials communications, 19:100649 pii:100649.

Introduction: Depressive and neurocognitive disorders are debilitating conditions that account for the leading causes of years lived with disability worldwide. However, there are no biomarkers that are objective or easy-to-obtain in daily clinical practice, which leads to difficulties in assessing treatment response and developing new drugs. New technology allows quantification of features that clinicians perceive as reflective of disorder severity, such as facial expressions, phonic/speech information, body motion, daily activity, and sleep.

Methods: Major depressive disorder, bipolar disorder, and major and minor neurocognitive disorders as well as healthy controls are recruited for the study. A psychiatrist/psychologist conducts conversational 10-min interviews with participants ≤10 times within up to five years of follow-up. Interviews are recorded using RGB and infrared cameras, and an array microphone. As an option, participants are asked to wear wrist-band type devices during the observational period. Various software is used to process the raw video, voice, infrared, and wearable device data. A machine learning approach is used to predict the presence of symptoms, severity, and the improvement/deterioration of symptoms.

Discussion: The overall goal of this proposed study, the Project for Objective Measures Using Computational Psychiatry Technology (PROMPT), is to develop objective, noninvasive, and easy-to-use biomarkers for assessing the severity of depressive and neurocognitive disorders in the hopes of guiding decision-making in clinical settings as well as reducing the risk of clinical trial failure. Challenges may include the large variability of samples, which makes it difficult to extract the features that commonly reflect disorder severity.

Trial Registration: UMIN000021396, University Hospital Medical Information Network (UMIN).

RevDate: 2020-09-03

Skuk VG, Kirchen L, Oberhoffner T, et al (2020)

Parameter-Specific Morphing Reveals Contributions of Timbre and Fundamental Frequency Cues to the Perception of Voice Gender and Age in Cochlear Implant Users.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose: Using naturalistic synthesized speech, we determined the relative importance of acoustic cues in voice gender and age perception in cochlear implant (CI) users.

Method: We investigated 28 CI users' abilities to utilize fundamental frequency (F0) and timbre in perceiving voice gender (Experiment 1) and vocal age (Experiment 2). Parameter-specific voice morphing was used to selectively control acoustic cues (F0; time; timbre, i.e., formant frequencies, spectral-level information, and aperiodicity, as defined in TANDEM-STRAIGHT) in voice stimuli. Individual differences in CI users' performance were quantified via deviations from the mean performance of 19 normal-hearing (NH) listeners.

Results: CI users' gender perception seemed exclusively based on F0, whereas NH listeners efficiently used timbre. For age perception, timbre was more informative than F0 for both groups, with minor contributions of temporal cues. While a few CI users performed comparable to NH listeners overall, others were at chance. Separate analyses confirmed that even high-performing CI users classified gender almost exclusively based on F0. While high performers could discriminate age in male and female voices, low performers were close to chance overall but used F0 as a misleading cue to age (classifying female voices as young and male voices as old). Satisfaction with CI generally correlated with performance in age perception.

Conclusions: We confirmed that CI users' gender classification is mainly based on F0. However, high performers could make reasonable usage of timbre cues in age perception. Overall, parameter-specific morphing can serve to objectively assess individual profiles of CI users' abilities to perceive nonverbal social-communicative vocal signals.

RevDate: 2020-09-02

Hansen JHL, Bokshi M, S Khorram (2020)

Speech variability: A cross-language study on acoustic variations of speaking versus untrained singing.

The Journal of the Acoustical Society of America, 148(2):829.

Speech production variability introduces significant challenges for existing speech technologies such as speaker identification (SID), speaker diarization, speech recognition, and language identification (ID). There has been limited research analyzing changes in acoustic characteristics for speech produced by untrained singing versus speaking. To better understand changes in speech production of the untrained singing voice, this study presents the first cross-language comparison between normal speaking and untrained karaoke singing of the same text content. Previous studies comparing professional singing versus speaking have shown deviations in both prosodic and spectral features. Some investigations also considered assigning the intrinsic activity of the singing. Motivated by these studies, a series of experiments to investigate both prosodic and spectral variations of untrained karaoke singers for three languages, American English, Hindi, and Farsi, are considered. A comprehensive comparison on common prosodic features, including phoneme duration, mean fundamental frequency (F0), and formant center frequencies of vowels was performed. Collective changes in the corresponding overall acoustic spaces based on the Kullback-Leibler distance using Gaussian probability distribution models trained on spectral features were analyzed. Finally, these models were used in a Gaussian mixture model with universal background model SID evaluation to quantify speaker changes between speaking and singing when the audio text content is the same. The experiments showed that many acoustic characteristics of untrained singing are considerably different from speaking when the text content is the same. It is suggested that these results would help advance automatic speech production normalization/compensation to improve performance of speech processing applications (e.g., speaker ID, speech recognition, and language ID).
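The Kullback-Leibler distance between Gaussian models mentioned above has a closed form. The sketch below shows the one-dimensional case only, as an assumption for illustration; the study's models were trained on spectral features and are presumably multivariate. The function names and the speaking/singing F0 statistics are hypothetical and are not results from the paper.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL divergence KL(P || Q) between two univariate Gaussians, in nats."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)


def symmetric_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Symmetrized KL distance: KL(P || Q) + KL(Q || P)."""
    return (kl_gaussian(mu_p, sigma_p, mu_q, sigma_q)
            + kl_gaussian(mu_q, sigma_q, mu_p, sigma_p))


# Hypothetical F0 distributions (mean, standard deviation in Hz).
speaking = (120.0, 20.0)
singing = (180.0, 35.0)

distance = symmetric_kl(*speaking, *singing)
print("symmetric KL distance:", round(distance, 3))
```

Plain KL divergence is asymmetric, which is why a symmetrized form is often used when it is treated as a distance. Whether the study used this particular symmetrization is not stated in the abstract.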

RevDate: 2020-09-02

Winn MB, AN Moore (2020)

Perceptual weighting of acoustic cues for accommodating gender-related talker differences heard by listeners with normal hearing and with cochlear implants.

The Journal of the Acoustical Society of America, 148(2):496.

Listeners must accommodate acoustic differences between vocal tracts and speaking styles of conversation partners, a process called normalization or accommodation. This study explores what acoustic cues are used to make this perceptual adjustment by listeners with normal hearing or with cochlear implants, when the acoustic variability is related to the talker's gender. A continuum between /ʃ/ and /s/ was paired with naturally spoken vocalic contexts that were parametrically manipulated to vary by numerous cues for talker gender including fundamental frequency (F0), vocal tract length (formant spacing), and direct spectral contrast with the fricative. The goal was to examine relative contributions of these cues toward the tendency to have a lower-frequency acoustic boundary for fricatives spoken by men (found in numerous previous studies). Normal hearing listeners relied primarily on formant spacing and much less on F0. The CI listeners were individually variable, with the F0 cue emerging as the strongest cue on average.

RevDate: 2020-08-10

Chung H (2020)

Acquisition and Acoustic Patterns of Southern American English /l/ in Young Children.

Journal of speech, language, and hearing research : JSLHR, 63(8):2609-2624.

Purpose: The aim of the current study was to examine /l/ developmental patterns in young learners of Southern American English, especially in relation to the effect of word position and phonetic contexts.

Method: Eighteen children with typically developing speech, aged between 2 and 5 years, produced monosyllabic single words containing singleton /l/ in different word positions (pre- vs. postvocalic /l/) across different vowel contexts (high front vs. low back) and cluster /l/ in different consonant contexts (/pl, bl/ vs. /kl, gl/). Each production was analyzed for its accuracy and acoustic patterns as measured by the first two formant frequencies and their difference (F1, F2, and F2-F1).

Results: There was great individual variability in /l/ acquisition patterns, with some 2- and 3-year-olds reaching 100% accuracy for prevocalic /l/, while others were below 70%. Overall, accuracy of prevocalic /l/ was higher than that of postvocalic /l/. Acoustic patterns of pre- and postvocalic /l/ showed greater differences in younger children and less apparent differences in 5-year-olds. There were no statistically significant differences between the acoustic patterns of /l/ coded as perceptually acceptable and those coded as misarticulated. There was also no apparent effect of vowel and consonant contexts on /l/ patterns.

Conclusion: The accuracy patterns of this study suggest an earlier development of /l/, especially prevocalic /l/, than has been reported in previous studies. The differences in acoustic patterns between pre- and postvocalic /l/, which become less apparent with age, may suggest that children alter the way they articulate /l/ with age. No significant acoustic differences between acceptable and misarticulated /l/, especially postvocalic /l/, suggest a gradient nature of /l/ that is dialect specific. This suggests the need for careful consideration of a child's dialect/language background when studying /l/.

RevDate: 2020-08-10

Lee J, Kim H, Y Jung (2020)

Patterns of Misidentified Vowels in Individuals With Dysarthria Secondary to Amyotrophic Lateral Sclerosis.

Journal of speech, language, and hearing research : JSLHR, 63(8):2649-2666.

Purpose: The current study examines the pattern of misidentified vowels produced by individuals with dysarthria secondary to amyotrophic lateral sclerosis (ALS).

Method: Twenty-three individuals with ALS and 22 typical individuals produced 10 monophthongs in an /h/-vowel-/d/ context. One hundred thirty-five listeners completed a forced-choice vowel identification test. Misidentified vowels were examined in terms of the target vowel categories (front-back; low-mid-high) and the direction of misidentification (the directional pattern when the target vowel was misidentified, e.g., misidentification "to a lower vowel"). In addition, acoustic predictors of vowel misidentifications were tested based on log first formant (F1), log second formant, log F1 vowel inherent spectral change, log second formant vowel inherent spectral change, and vowel duration.

Results: First, high and mid vowels were more frequently misidentified than low vowels for all speaker groups. Second, front and back vowels were misidentified at a similar rate for both the Mild and Severe groups, whereas back vowels were more frequently misidentified than front vowels in typical individuals. Regarding the direction of vowel misidentification, vowel errors were mostly made within the same backness (front-back) category for all groups. In addition, more errors were found toward a lower vowel category than toward a higher vowel category in the Severe group, but not in the Mild group. Overall, log F1 difference was identified as a consistent acoustic predictor of the main vowel misidentification pattern.

Conclusion: Frequent misidentifications in the vowel height dimension and the acoustic predictor, F1, suggest that limited tongue height control is the major articulatory dysfunction in individuals with ALS. Clinical implications regarding this finding are discussed.

RevDate: 2020-08-09

Easwar V, Birstler J, Harrison A, et al (2020)

The Accuracy of Envelope Following Responses in Predicting Speech Audibility.

Ear and hearing [Epub ahead of print].

OBJECTIVES: The present study aimed to (1) evaluate the accuracy of envelope following responses (EFRs) in predicting speech audibility as a function of the statistical indicator used for objective response detection, stimulus phoneme, frequency, and level, and (2) quantify the minimum sensation level (SL; stimulus level above behavioral threshold) needed for detecting EFRs.

DESIGN: In 21 participants with normal hearing, EFRs were elicited by 8 band-limited phonemes in the male-spoken token /susaʃi/ (2.05 sec) presented between 20 and 65 dB SPL in 15 dB increments. Vowels in /susaʃi/ were modified to elicit two EFRs simultaneously by selectively lowering the fundamental frequency (f0) in the first formant (F1) region. The modified vowels elicited one EFR from the low-frequency F1 and another from the mid-frequency second and higher formants (F2+). Fricatives were amplitude-modulated at the average f0. EFRs were extracted from single-channel EEG recorded between the vertex (Cz) and the nape of the neck when /susaʃi/ was presented monaurally for 450 sweeps. The performance of the three statistical indicators, F-test, Hotelling's T², and phase coherence, was compared against behaviorally determined audibility (estimated SL, SL ≥0 dB = audible) using area under the receiver operating characteristics (AUROC) curve, sensitivity (the proportion of audible speech with a detectable EFR [true positive rate]), and specificity (the proportion of inaudible speech with an undetectable EFR [true negative rate]). The influence of stimulus phoneme, frequency, and level on the accuracy of EFRs in predicting speech audibility was assessed by comparing sensitivity, specificity, positive predictive value (PPV; the proportion of detected EFRs elicited by audible stimuli) and negative predictive value (NPV; the proportion of undetected EFRs elicited by inaudible stimuli). The minimum SL needed for detection was evaluated using a linear mixed-effects model with the predictor variables stimulus and EFR detection p value.

RESULTS: AUROCs of the 3 statistical indicators were similar; however, at the type I error rate of 5%, the sensitivities of Hotelling's T² (68.4%) and phase coherence (68.8%) were significantly higher than that of the F-test (59.5%). In contrast, the specificity of the F-test (97.3%) was significantly higher than that of Hotelling's T² (88.4%). When analyzed using Hotelling's T² as a function of stimulus, fricatives offered higher sensitivity (88.6 to 90.6%) and NPV (57.9 to 76.0%) compared with most vowel stimuli (51.9 to 71.4% and 11.6 to 51.3%, respectively). When analyzed as a function of frequency band (F1, F2+, and fricatives aggregated as low-, mid- and high-frequencies, respectively), high-frequency stimuli offered the highest sensitivity (96.9%) and NPV (88.9%). When analyzed as a function of test level, sensitivity improved with increases in stimulus level (99.4% at 65 dB SPL). The minimum SL for EFR detection ranged between 13.4 and 21.7 dB for F1 stimuli, 7.8 to 12.2 dB for F2+ stimuli, and 2.3 to 3.9 dB for fricative stimuli.

CONCLUSIONS: EFR-based inference of speech audibility requires consideration of the statistical indicator used, phoneme, stimulus frequency, and stimulus level.
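The four accuracy measures defined in the DESIGN paragraph above can be computed from a 2x2 table of audibility against EFR detection. The following is a minimal sketch; the function name and the counts are hypothetical and do not reproduce the study's data.

```python
def detection_metrics(tp, fn, tn, fp):
    """Accuracy measures for EFR detection versus behavioral audibility.

    tp: audible stimuli with a detected EFR
    fn: audible stimuli with no detected EFR
    tn: inaudible stimuli with no detected EFR
    fp: inaudible stimuli with a detected EFR
    """
    return {
        "sensitivity": tp / (tp + fn),  # audible speech with a detectable EFR
        "specificity": tn / (tn + fp),  # inaudible speech with an undetectable EFR
        "ppv": tp / (tp + fp),          # detected EFRs elicited by audible stimuli
        "npv": tn / (tn + fn),          # undetected EFRs elicited by inaudible stimuli
    }


# Hypothetical counts for illustration only.
metrics = detection_metrics(tp=80, fn=20, tn=45, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are conditioned on true audibility, whereas PPV and NPV are conditioned on the detection outcome, so the latter two shift with the proportion of audible stimuli in the test set.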

RevDate: 2020-08-05

Koo SK, Kwon SB, Koh TK, et al (2020)

Acoustic analyses of snoring sounds using a smartphone in patients undergoing septoplasty and turbinoplasty.

European archives of oto-rhino-laryngology : official journal of the European Federation of Oto-Rhino-Laryngological Societies (EUFOS) : affiliated with the German Society for Oto-Rhino-Laryngology - Head and Neck Surgery pii:10.1007/s00405-020-06268-1 [Epub ahead of print].

PURPOSE: Several studies have been performed using recently developed smartphone-based acoustic analysis techniques. We investigated the effects of septoplasty and turbinoplasty in patients with nasal septal deviation and turbinate hypertrophy accompanied by snoring by recording the sounds of snoring using a smartphone and performing acoustic analysis.

METHODS: A total of 15 male patients who underwent septoplasty with turbinoplasty for snoring and nasal obstruction were included in this prospective study. Preoperatively and 2 months after surgery, their bed partners or caregivers were instructed to record the snoring sounds. The intensity (dB), formant frequencies (F1, F2, F3, and F4), spectrogram pattern, and visual analog scale (VAS) score were analyzed for each subject.

RESULTS: Overall snoring sounds improved after surgery in 12/15 (80%) patients, and there was significant improvement in the intensity of snoring sounds after surgery (from 64.17 ± 12.18 dB to 55.62 ± 9.11 dB, p = 0.018). There was a significant difference in the F1 formant frequency before and after surgery (p = 0.031), but there were no significant differences in F2, F3, or F4. The change in F1 indicated that patients changed from mouth breathing to normal breathing. The degree of subjective snoring sounds improved significantly after surgery (VAS: from 5.40 ± 1.55 to 3.80 ± 1.26, p = 0.003).

CONCLUSION: Our results confirm that snoring is reduced when nasal congestion is improved, and they demonstrate that smartphone-based acoustic analysis of snoring sounds can be useful for diagnosis.

RevDate: 2020-08-01

Scott TL, Haenchen L, Daliri A, et al (2020)

Noninvasive neurostimulation of left ventral motor cortex enhances sensorimotor adaptation in speech production.

Brain and language, 209:104840 pii:S0093-934X(20)30099-7 [Epub ahead of print].

Sensorimotor adaptation (enduring changes to motor commands due to sensory feedback) allows speakers to match their articulations to intended speech acoustics. How the brain integrates auditory feedback to modify speech motor commands and what limits the degree of these modifications remain unknown. Here, we investigated the role of speech motor cortex in modifying stored speech motor plans. In a within-subjects design, participants underwent separate sessions of sham and anodal transcranial direct current stimulation (tDCS) over speech motor cortex while speaking and receiving altered auditory feedback of the first formant. Anodal tDCS increased the rate of sensorimotor adaptation for feedback perturbation. Computational modeling of our results using the Directions Into Velocities of Articulators (DIVA) framework of speech production suggested that tDCS primarily affected behavior by increasing the feedforward learning rate. This study demonstrates how focal noninvasive neurostimulation can enhance the integration of auditory feedback into speech motor plans.

RevDate: 2020-07-28

Chung H, Munson B, J Edwards (2020)

Cross-Linguistic Perceptual Categorization of the Three Corner Vowels: Effects of Listener Language and Talker Age.

Language and speech [Epub ahead of print].

The present study examined the center and size of naïve adult listeners' vowel perceptual space (VPS) in relation to listener language (LL) and talker age (TA). Adult listeners of three different first languages, American English, Greek, and Korean, categorized and rated the goodness of different vowels produced by 2-year-olds and 5-year-olds and adult speakers of those languages, and speakers of Cantonese and Japanese. The center (i.e., mean first and second formant frequencies (F1 and F2)) and size (i.e., area in the F1/F2 space) of VPSs that were categorized either into /a/, /i/, or /u/ were calculated for each LL and TA group. All center and size calculations were weighted by the goodness rating of each stimulus. The F1 and F2 values of the vowel category (VC) centers differed significantly by LL and TA. These effects were qualitatively different for the three vowel categories: English listeners had different /a/ and /u/ centers than Greek and Korean listeners. The size of VPSs did not differ significantly by LL, but did differ by TA and VCs: Greek and Korean listeners had larger vowel spaces when perceiving vowels produced by 2-year-olds than by 5-year-olds or adults, and English listeners had larger vowel spaces for /a/ than /i/ or /u/. Findings indicate that vowel perceptual categories of listeners varied by the nature of their native vowel system, and were sensitive to TA.
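The goodness-weighted center of a vowel category described above amounts to a weighted mean of F1 and F2 over the stimuli assigned to that category. A minimal sketch follows; the function name, the tuple layout, and the stimulus values are hypothetical. The area calculation is left out because the abstract does not say how area was computed.

```python
def weighted_center(stimuli):
    """Goodness-weighted mean (F1, F2) of stimuli in one vowel category.

    stimuli: list of (f1_hz, f2_hz, goodness) tuples, with goodness > 0.
    """
    total = sum(goodness for _, _, goodness in stimuli)
    f1 = sum(f1 * goodness for f1, _, goodness in stimuli) / total
    f2 = sum(f2 * goodness for _, f2, goodness in stimuli) / total
    return f1, f2


# Hypothetical stimuli categorized as /i/: (F1 Hz, F2 Hz, goodness rating).
categorized_as_i = [
    (300.0, 2200.0, 1.0),
    (500.0, 1800.0, 3.0),
]

center_f1, center_f2 = weighted_center(categorized_as_i)
print("weighted center:", center_f1, center_f2)
```

Weighting by goodness pulls the center toward stimuli that listeners judged to be better exemplars, so a poorly rated token contributes less than a well rated one.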

RevDate: 2020-07-26

Bhat GS, Shankar N, Reddy CKA, et al (2017)

Formant Frequency-based Speech Enhancement Technique to improve Intelligibility for hearing aid users with smartphone as an assistive device.

... Health innovations and point-of-care technologies conference. Health innovations and point-of-care technologies conference, 2017:32-35.

In this paper, we present a Speech Enhancement (SE) method implemented on a smartphone, and this arrangement functions as an assistive device to hearing aids (HA). Many benchmark single channel SE algorithms implemented on HAs provide considerable improvement in speech quality, while speech intelligibility improvement still remains a prime challenge. The proposed SE method based on a log spectral amplitude estimator improves speech intelligibility in the noisy real world acoustic environment using a priori information about formant frequency locations. The formant frequency information allows us to control the amount of speech distortion in these frequency bands. We introduce a 'scaling' parameter for the SE gain function, which controls the gains over the non-formant frequency band, allowing the HA users to customize the playback speech using a smartphone application to their listening preference. Objective intelligibility measures show the effectiveness of the proposed SE method. Subjective results reflect the suitability of the developed Speech Enhancement application in real-world noisy conditions at SNR levels of -5 dB, 0 dB and 5 dB.

RevDate: 2020-07-22

Mefferd AS, MS Dietrich (2020)

Tongue- and Jaw-Specific Articulatory Changes and Their Acoustic Consequences in Talkers With Dysarthria due to Amyotrophic Lateral Sclerosis: Effects of Loud, Clear, and Slow Speech.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose This study aimed to determine how tongue and jaw displacement changes impact acoustic vowel contrast in talkers with amyotrophic lateral sclerosis (ALS) and controls. Method Ten talkers with ALS and 14 controls participated in this study. Loud, clear, and slow speech cues were used to elicit tongue and jaw kinematic as well as acoustic changes. Speech kinematics was recorded using three-dimensional articulography. Independent tongue and jaw displacements were extracted during the diphthong /ai/ in kite. Acoustic distance between diphthong onset and offset in Formant 1-Formant 2 vowel space indexed acoustic vowel contrast. Results In both groups, all three speech modifications elicited increases in jaw displacement (typical < slow < loud < clear). By contrast, only slow speech elicited significantly increased independent tongue displacement in the ALS group (typical = loud = clear < slow), whereas all three speech modifications elicited significantly increased independent tongue displacement in controls (typical < loud < clear = slow). Furthermore, acoustic vowel contrast significantly increased in response to clear and slow speech in the ALS group, whereas all three speech modifications elicited significant increases in acoustic vowel contrast in controls (typical < loud < slow < clear). Finally, only jaw displacements accounted for acoustic vowel contrast gains in the ALS group. In controls, however, independent tongue displacements accounted for increases in vowel acoustic contrast during loud and slow speech, whereas jaw and independent tongue displacements accounted equally for acoustic vowel contrast change during clear speech. Conclusion Kinematic findings suggest that slow speech may be better suited to target independent tongue displacements in talkers with ALS than clear and loud speech. 
However, given that gains in acoustic vowel contrast were comparable for slow and clear speech cues in these talkers, future research is needed to determine potential differential impacts of slow and clear speech on perceptual measures, such as intelligibility. Finally, findings suggest that acoustic vowel contrast gains are predominantly jaw driven in talkers with ALS. Therefore, the acoustic and perceptual consequences of direct instructions of enhanced jaw movements should be compared to cued speech modification, such as clear and slow speech in these talkers.
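The acoustic vowel contrast measure in this abstract is a distance between diphthong onset and offset in F1-F2 space. A minimal sketch follows, assuming a plain Euclidean distance in Hz; the abstract does not state the distance metric or the frequency scale, and the function name and formant values are hypothetical.

```python
import math


def vowel_contrast(onset, offset):
    """Euclidean distance between two (F1, F2) points, in the input units.

    onset, offset: (f1, f2) tuples, e.g. measured at diphthong onset and offset.
    Assumes an unscaled Hz space; other scales (Bark, mel) would change the value.
    """
    return math.hypot(offset[0] - onset[0], offset[1] - onset[1])


# Hypothetical /ai/ token: F1 falls and F2 rises from onset to offset
contrast = vowel_contrast((700.0, 1200.0), (400.0, 1600.0))
```

A larger value indicates a longer formant trajectory, which is how the abstract indexes greater acoustic vowel contrast.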

RevDate: 2020-07-22

Laturnus R (2020)

Comparative Acoustic Analyses of L2 English: The Search for Systematic Variation.

Phonetica pii:000508387 [Epub ahead of print].

BACKGROUND/AIMS: Previous research has shown that exposure to multiple foreign accents facilitates adaptation to an untrained novel accent. One explanation is that L2 speech varies systematically such that there are commonalities in the productions of nonnative speakers, regardless of their language background.

METHODS: A systematic acoustic comparison was conducted between 3 native English speakers and 6 nonnative accents. Voice onset time, unstressed vowel duration, and formant values of stressed and unstressed vowels were analyzed, comparing each nonnative accent to the native English talkers. A subsequent perception experiment tests what effect training on regionally accented voices has on the participant's comprehension of nonnative accented speech to investigate the importance of within-speaker variation on attunement and generalization.

RESULTS: Data for each measure show substantial variability across speakers, reflecting phonetic transfer from individual L1s, as well as substantial inconsistency and variability in pronunciation, rather than commonalities in their productions. Training on native English varieties did not improve participants' accuracy in understanding nonnative speech.

CONCLUSION: These findings are more consistent with a hypothesis of accent attunement wherein listeners track general patterns of nonnative speech rather than relying on overlapping acoustic signals between speakers.

RevDate: 2020-07-22

Rishiq D, Harkrider A, Springer C, et al (2020)

Effects of Aging on the Subcortical Encoding of Stop Consonants.

American journal of audiology [Epub ahead of print].

Purpose The main purpose of this study was to evaluate aging effects on the predominantly subcortical (brainstem) encoding of the second-formant frequency transition, an essential acoustic cue for perceiving place of articulation. Method Synthetic consonant-vowel syllables varying in second-formant onset frequency (i.e., /ba/, /da/, and /ga/ stimuli) were used to elicit speech-evoked auditory brainstem responses (speech-ABRs) in 16 young adults (mean age = 21 years) and 11 older adults (mean age = 59 years). Repeated-measures mixed-model analyses of variance were performed on the latencies and amplitudes of the speech-ABR peaks. Fixed factors were phoneme (repeated measures on three levels: /b/ vs. /d/ vs. /g/) and age (two levels: young vs. older). Results Speech-ABR differences were observed between the two groups (young vs. older adults). Specifically, older listeners showed generalized amplitude reductions for onset and major peaks. Significant Phoneme × Group interactions were not observed. Conclusions Results showed aging effects in speech-ABR amplitudes that may reflect diminished subcortical encoding of consonants in older listeners. These aging effects were not phoneme dependent as observed using the statistical methods of this study.

RevDate: 2020-07-13

Al-Tamimi F, P Howell (2020)

Voice onset time and formant onset frequencies in Arabic stuttered speech.

Clinical linguistics & phonetics [Epub ahead of print].

Neuromuscular models of stuttering consider that making transitions between phones results in inappropriate temporal arrangements of articulators in people who stutter (PWS). Using this framework, the current study examined the acoustic productions of two fine-grained phonetic features: voice onset time (VOT) and second formant (F2). The hypotheses were that PWS should differ from fluent persons (FP) in VOT duration and F2 onset frequency as a result of the transition deficit for environments with complex phonetic features such as Arabic emphatics. Ten adolescent PWS and 10 adolescent FPs participated in the study. They read and memorized four monosyllabic plain-emphatic words silently. Data were analyzed by Repeated Measures ANOVAs. The positive and negative VOT durations of /t/ vs. /tˁ/ and /d/ vs. /dˁ/ and F2 onset frequency were measured acoustically. Results showed that stuttering was significantly affected by emphatic consonants. PWS had atypical VOT durations and F2 values. Findings are consistent with the atypicality of VOT and F2 reported for English-speaking PWS. This atypicality is realized differently in Arabic depending on the articulatory complexity and cognitive load of the sound.

RevDate: 2020-07-10

Levy-Lambert D, Grigos MI, LeBlanc É, et al (2020)

Communication Efficiency in a Face Transplant Recipient: Determinants and Therapeutic Implications.

The Journal of craniofacial surgery [Epub ahead of print].

We longitudinally assessed speech intelligibility (percent words correct/pwc), communication efficiency (intelligible words per minute/iwpm), temporal control markers (speech and pause coefficients of variation), and formant frequencies associated with lip motion in a 41-year-old face transplant recipient. Pwc and iwpm at 13 months post-transplantation were both higher than preoperative values. Multivariate regression demonstrated that temporal markers and all formant frequencies associated with lip motion were significant predictors (P < 0.05) of communication efficiency, highlighting the interplay of these variables in generating intelligible and effective speech. These findings can guide us in developing personalized rehabilitative approaches in face transplant recipients for optimal speech outcomes.

RevDate: 2020-07-08

Kim KS, Wang H, L Max (2020)

It's About Time: Minimizing Hardware and Software Latencies in Speech Research With Real-Time Auditory Feedback.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose Various aspects of speech production related to auditory-motor integration and learning have been examined through auditory feedback perturbation paradigms in which participants' acoustic speech output is experimentally altered and played back via earphones/headphones "in real time." Scientific rigor requires high precision in determining and reporting the involved hardware and software latencies. Many reports in the literature, however, are not consistent with the minimum achievable latency for a given experimental setup. Here, we focus specifically on this methodological issue associated with implementing real-time auditory feedback perturbations, and we offer concrete suggestions for increased reproducibility in this particular line of work. Method Hardware and software latencies as well as total feedback loop latency were measured for formant perturbation studies with the Audapter software. Measurements were conducted for various audio interfaces, desktop and laptop computers, and audio drivers. An approach for lowering Audapter's software latency through nondefault parameter specification was also tested. Results Oft-overlooked hardware-specific latencies were not negligible for some of the tested audio interfaces (adding up to 15 ms). Total feedback loop latencies (including both hardware and software latency) were also generally larger than claimed in the literature. Nondefault parameter values can improve Audapter's own processing latency without negative impact on formant tracking. Conclusions Audio interface selection and software parameter optimization substantially affect total feedback loop latency. Thus, the actual total latency (hardware plus software) needs to be correctly measured and described in all published reports. Future speech research with "real-time" auditory feedback perturbations should increase scientific rigor by minimizing this latency.

RevDate: 2020-07-06

Vurma A (2020)

Amplitude Effects of Vocal Tract Resonance Adjustments When Singing Louder.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30194-6 [Epub ahead of print].

In the literature on vocal pedagogy we may find suggestions to increase the mouth opening when singing louder. It is known that sopranos tend to sing loud high notes with a wider mouth opening which raises the frequency of the first resonance of the vocal tract (fR1) to tune it close to the fundamental. Our experiment with classically trained male singers revealed that they also tended to raise the fR1 with the dynamics at pitches where the formant tuning does not seem relevant. The analysis by synthesis showed that such behaviour may contribute to the strengthening of the singer's formant by several dB-s and to a rise in the centre of spectral gravity. The contribution of the fR1 raising to the overall sound level was less consistent. Changing the extent of the mouth opening with the dynamics may create several simultaneous semantic cues that signal how prominent the produced sound is and how great the physical effort by the singer is. The diminishing of the mouth opening when singing piano may also have an importance as it helps singers to produce a quieter sound by increasing the distance between the fR1 and higher resonances, which lowers the transfer function of the vocal tract at the relevant spectral regions.

RevDate: 2020-07-03

Chaturvedi R, Kraus M, RSE Keefe (2020)

A new measure of authentic auditory emotion recognition: Application to patients with schizophrenia.

Schizophrenia research pii:S0920-9964(19)30550-X [Epub ahead of print].

BACKGROUND: Many social processes such as emotion recognition are severely impaired in patients with schizophrenia. While basic auditory processing seems to play a key role in identifying emotions, research in this field is limited due to the lack of proper assessment batteries. Many of the widely accepted tests utilize actors to portray certain emotions; these batteries are less ecologically and face valid.

METHODS: This study utilized a newly developed auditory emotion recognition test that contained natural stimuli from spontaneous displays of emotions to assess 28 patients with schizophrenia and 16 healthy controls.

RESULTS: The results indicate that the newly developed test, referred to as the INTONATION Test, is more sensitive to the emotion recognition deficits in patients with schizophrenia than previously used measures. The correlations of the INTONATION Test measures with basic auditory processes were similar to established tests of auditory emotion. Particular emotion subscores from the INTONATION Test, such as happiness, demonstrated the strongest correlations with specific auditory processing skills, such as formant discrimination and sinusoidal amplitude modulation detection (SAM60).

CONCLUSIONS: The results from this study indicate that auditory emotion recognition impairments are more pronounced in patients with schizophrenia when perceiving authentic displays of emotion. Understanding these deficits could help specify the nature of auditory emotion recognition deficits in patients with schizophrenia and those at risk.

RevDate: 2020-07-02

Toutios A, Xu M, Byrd D, et al (2020)

How an aglossic speaker produces an alveolar-like percept without a functional tongue tip.

The Journal of the Acoustical Society of America, 147(6):EL460.

It has been previously observed [McMicken, Salles, Berg, Vento-Wilson, Rogers, Toutios, and Narayanan. (2017). J. Commun. Disorders, Deaf Stud. Hear. Aids 5(2), 1-6] using real-time magnetic resonance imaging that a speaker with severe congenital tongue hypoplasia (aglossia) had developed a compensatory articulatory strategy where she, in the absence of a functional tongue tip, produced a plosive consonant perceptually similar to /d/ using a bilabial constriction. The present paper provides an updated account of this strategy. It is suggested that the previously observed compensatory bilabial closing that occurs during this speaker's /d/ production is consistent with vocal tract shaping resulting from hyoid raising created with mylohyoid action, which may also be involved in typical /d/ production. Simulating this strategy in a dynamic articulatory synthesis experiment leads to the generation of /d/-like formant transitions.

RevDate: 2020-07-02

Harper S, Goldstein L, S Narayanan (2020)

Variability in individual constriction contributions to third formant values in American English /ɹ/.

The Journal of the Acoustical Society of America, 147(6):3905.

Although substantial variability is observed in the articulatory implementation of the constriction gestures involved in /ɹ/ production, studies of articulatory-acoustic relations in /ɹ/ have largely ignored the potential for subtle variation in the implementation of these gestures to affect salient acoustic dimensions. This study examines how variation in the articulation of American English /ɹ/ influences the relative sensitivity of the third formant to variation in palatal, pharyngeal, and labial constriction degree. Simultaneously recorded articulatory and acoustic data from six speakers in the USC-TIMIT corpus was analyzed to determine how variation in the implementation of each constriction across tokens of /ɹ/ relates to variation in third formant values. Results show that third formant values are differentially affected by constriction degree for the different constrictions used to produce /ɹ/. Additionally, interspeaker variation is observed in the relative effect of different constriction gestures on third formant values, most notably in a division between speakers exhibiting relatively equal effects of palatal and pharyngeal constriction degree on F3 and speakers exhibiting a stronger palatal effect. This division among speakers mirrors interspeaker differences in mean constriction length and location, suggesting that individual differences in /ɹ/ production lead to variation in articulatory-acoustic relations.

RevDate: 2020-06-25

Xu M, Tachibana RO, Okanoya K, et al (2020)

Unconscious and Distinctive Control of Vocal Pitch and Timbre During Altered Auditory Feedback.

Frontiers in psychology, 11:1224.

Vocal control plays a critical role in smooth social communication. Speakers constantly monitor auditory feedback (AF) and make adjustments when their voices deviate from their intentions. Previous studies have shown that when certain acoustic features of the AF are artificially altered, speakers compensate for this alteration in the opposite direction. However, little is known about how the vocal control system implements compensations for alterations of different acoustic features, and associates them with subjective consciousness. The present study investigated whether compensations for the fundamental frequency (F0), which corresponds to perceived pitch, and formants, which contribute to perceived timbre, can be performed unconsciously and independently. Forty native Japanese speakers received two types of altered AF during vowel production that involved shifts of either only the formant frequencies (formant modification; Fm) or both the pitch and formant frequencies (pitch + formant modification; PFm). For each type, three levels of shift (slight, medium, and severe) in both directions (increase or decrease) were used. After the experiment, participants were tested for whether they had perceived a change in the F0 and/or formants. The results showed that (i) only formants were compensated for in the Fm condition, while both the F0 and formants were compensated for in the PFm condition; (ii) the F0 compensation exhibited greater precision than the formant compensation in PFm; and (iii) compensation occurred even when participants misperceived or could not explicitly perceive the alteration in AF. These findings indicate that non-experts can compensate for both formant and F0 modifications in the AF during vocal production, even when the modifications are not explicitly or correctly perceived, which provides further evidence for a dissociation between conscious perception and action in vocal control. 
We propose that such unconscious control of voice production may enhance rapid adaptation to changing speech environments and facilitate mutual communication.

RevDate: 2020-06-19

White-Schwoch T, Magohe AK, Fellows AM, et al (2020)

Auditory neurophysiology reveals central nervous system dysfunction in HIV-infected individuals.

Clinical neurophysiology : official journal of the International Federation of Clinical Neurophysiology, 131(8):1827-1832 pii:S1388-2457(20)30327-8 [Epub ahead of print].

OBJECTIVE: To test the hypothesis that human immunodeficiency virus (HIV) affects auditory-neurophysiological functions.

METHODS: A convenience sample of 68 HIV+ and 59 HIV- normal-hearing adults was selected from a study set in Dar es Salaam, Tanzania. The speech-evoked frequency-following response (FFR), an objective measure of auditory function, was collected. Outcome measures were FFRs to the fundamental frequency (F0) and to harmonics corresponding to the first formant (F1), two behaviorally relevant cues for understanding speech.

RESULTS: The HIV+ group had weaker responses to the F1 than the HIV- group; this effect generalized across multiple stimuli (d = 0.59). Responses to the F0 were similar between groups.

CONCLUSIONS: Auditory-neurophysiological responses differ between HIV+ and HIV- adults despite normal hearing thresholds.

SIGNIFICANCE: The FFR may reflect HIV-associated central nervous system dysfunction that manifests as disrupted auditory processing of speech harmonics corresponding to the first formant.

RevDate: 2020-06-19

DiNino M, Arenberg JG, Duchen ALR, et al (2020)

Effects of Age and Cochlear Implantation on Spectrally Cued Speech Categorization.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose Weighting of acoustic cues for perceiving place-of-articulation speech contrasts was measured to determine the separate and interactive effects of age and use of cochlear implants (CIs). It has been found that adults with normal hearing (NH) show reliance on fine-grained spectral information (e.g., formants), whereas adults with CIs show reliance on broad spectral shape (e.g., spectral tilt). In question was whether children with NH and CIs would demonstrate the same patterns as adults, or show differences based on ongoing maturation of hearing and phonetic skills. Method Children and adults with NH and with CIs categorized a /b/-/d/ speech contrast based on two orthogonal spectral cues. Among CI users, phonetic cue weights were compared to vowel identification scores and Spectral-Temporally Modulated Ripple Test thresholds. Results NH children and adults both relied relatively more on the fine-grained formant cue and less on the broad spectral tilt cue compared to participants with CIs. However, early-implanted children with CIs better utilized the formant cue compared to adult CI users. Formant cue weights correlated with CI participants' vowel recognition and in children, also related to Spectral-Temporally Modulated Ripple Test thresholds. Adults and child CI users with very poor phonetic perception showed additive use of the two cues, whereas those with better and/or more mature cue usage showed a prioritized trading relationship, akin to NH listeners. Conclusions Age group and hearing modality can influence phonetic cue-weighting patterns. Results suggest that simple nonlexical categorization tests correlate with more general speech recognition skills of children and adults with CIs.

RevDate: 2020-06-16

Chiu YF, Neel A, T Loux (2020)

Acoustic characteristics in relation to intelligibility reduction in noise for speakers with Parkinson's disease.

Clinical linguistics & phonetics [Epub ahead of print].

Decreased speech intelligibility in noisy environments is frequently observed in speakers with Parkinson's disease (PD). This study investigated which acoustic characteristics across the speech subsystems contributed to poor intelligibility in noise for speakers with PD. Speech samples were obtained from 13 speakers with PD and five healthy controls reading 56 sentences. Intelligibility analysis was conducted in quiet and noisy listening conditions. Seventy-two young listeners transcribed the recorded sentences in quiet and another 72 listeners transcribed in noise. The acoustic characteristics of the speakers with PD who experienced large intelligibility reduction from quiet to noise were compared to those with smaller intelligibility reduction in noise and healthy controls. The acoustic measures in the study included second formant transitions, cepstral and spectral measures of voice (cepstral peak prominence and low/high spectral ratio), pitch variation, and articulation rate to represent speech components across speech subsystems of articulation, phonation, and prosody. The results show that speakers with PD who had larger intelligibility reduction in noise exhibited decreased second formant transition, limited cepstral and spectral variations, and faster articulation rate. These findings suggest that the adverse effect of noise on speech intelligibility in PD is related to speech changes in the articulatory and phonatory systems.

RevDate: 2020-06-15

Rankinen W, K de Jong (2020)

The Entanglement of Dialectal Variation and Speaker Normalization.

Language and speech [Epub ahead of print].

This paper explores the relationship between speaker normalization and dialectal identity in sociolinguistic data, examining a database of vowel formants collected from 88 monolingual American English speakers in Michigan's Upper Peninsula. Audio recordings of Finnish- and Italian-heritage American English speakers reading a passage and a word list were normalized using two normalization procedures. These algorithms are based on different concepts of normalization: Lobanov, which models normalization as based on experience with individual talkers, and Labov ANAE, which models normalization as based on experience with scale-factors inherent in acoustic resonators of all kinds. The two procedures yielded different results; while the Labov ANAE method reveals a cluster shifting of low and back vowels that correlated with heritage, the Lobanov procedure seems to eliminate this sociolinguistic variation. The difference between the two procedures lies in how they treat relations between formant changes, suggesting that dimensions of variation in the vowel space may be treated differently by different normalization procedures, raising the question of how anatomical variation and dialectal variation interact in the real world. The structure of the sociolinguistic effects found with the Labov ANAE normalized data, but not in the Lobanov normalized data, suggests that the Lobanov normalization does over-normalize formant measures and remove sociolinguistically relevant information.
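For readers unfamiliar with Lobanov normalization, it is commonly implemented as a per-speaker z-score applied separately to each formant. The sketch below assumes that common formulation and uses the sample standard deviation; the abstract does not specify the implementation used in the study, and the function name and data are hypothetical.

```python
from statistics import mean, stdev


def lobanov(values):
    """Z-score one speaker's measurements of a single formant.

    values: that speaker's formant values (Hz) across all vowel tokens.
    Uses the sample standard deviation, which is an assumption here;
    some implementations use the population standard deviation instead.
    """
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("formant values must vary within the speaker")
    return [(v - m) / s for v in values]


# Hypothetical F1 values (Hz) for three vowel tokens from one speaker
normalized_f1 = lobanov([300, 500, 700])
```

Because each formant is centered and scaled independently, any speaker-level shift that affects F1 and F2 differently is removed along with anatomical differences, which is consistent with the over-normalization concern raised in the abstract.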

RevDate: 2020-06-09

Ménard L, Prémont A, Trudeau-Fisette P, et al (2020)

Phonetic Implementation of Prosodic Emphasis in Preschool-Aged Children and Adults: Probing the Development of Sensorimotor Speech Goals.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Objective We aimed to investigate the production of contrastive emphasis in French-speaking 4-year-olds and adults. Based on previous work, we predicted that, due to their immature motor control abilities, preschool-aged children would produce smaller articulatory differences between emphasized and neutral syllables than adults. Method Ten 4-year-old children and 10 adult French speakers were recorded while repeating /bib/, /bub/, and /bab/ sequences in neutral and contrastive emphasis conditions. Synchronous recordings of tongue movements, lip and jaw positions, and speech signals were made. Lip positions and tongue shapes were analyzed; formant frequencies, amplitude, fundamental frequency, and duration were extracted from the acoustic signals; and between-vowel contrasts were calculated. Results Emphasized vowels were higher in pitch, intensity, and duration than their neutral counterparts in all participants. However, the effect of contrastive emphasis on lip position was smaller in children. Prosody did not affect tongue position in children, whereas it did in adults. As a result, children's productions were perceived less accurately than those of adults. Conclusion These findings suggest that 4-year-old children have not yet learned to produce hypoarticulated forms of phonemic goals to allow them to successfully contrast syllables and enhance prosodic saliency.

RevDate: 2020-05-07

Groll MD, McKenna VS, Hablani S, et al (2020)

Formant-Estimated Vocal Tract Length and Extrinsic Laryngeal Muscle Activation During Modulation of Vocal Effort in Healthy Speakers.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose The goal of this study was to explore the relationships among vocal effort, extrinsic laryngeal muscle activity, and vocal tract length (VTL) within healthy speakers. We hypothesized that increased vocal effort would result in increased suprahyoid muscle activation and decreased VTL, as previously observed in individuals with vocal hyperfunction. Method Twenty-eight healthy speakers of American English produced vowel-consonant-vowel utterances under varying levels of vocal effort. VTL was estimated from the vowel formants. Three surface electromyography sensors measured the activation of the suprahyoid and infrahyoid muscle groups. A general linear model was used to investigate the effects of vocal effort level and surface electromyography on VTL. Two additional general linear models were used to investigate the effects of vocal effort on suprahyoid and infrahyoid muscle activities. Results Neither vocal effort nor extrinsic muscle activity showed significant effects on VTL; however, the degree of extrinsic muscle activity of both suprahyoid and infrahyoid muscle groups increased with increases in vocal effort. Conclusion Increasing vocal effort resulted in increased activation of both suprahyoid and infrahyoid musculature in healthy adults, with no change to VTL.
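The abstract does not say how VTL was estimated from the formants. One textbook approach, shown here purely as an assumed illustration, treats the vocal tract as a uniform tube closed at the glottis and open at the lips, so that the n-th resonance is F_n = (2n - 1)c / (4L). Solving for L and averaging over the measured formants gives a length estimate. The function name, the speed-of-sound constant, and the example values are hypothetical.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed approximate value for warm, humid air


def estimate_vtl(formants_hz, c=SPEED_OF_SOUND_CM_S):
    """Estimate vocal tract length (cm) from formants F1, F2, F3, ...

    Uses the quarter-wavelength resonator model L = (2n - 1) * c / (4 * F_n)
    and averages the per-formant estimates. This is a simplification and is
    not necessarily the estimator used in the study above.
    """
    if not formants_hz:
        raise ValueError("at least one formant is required")
    estimates = [
        (2 * n - 1) * c / (4.0 * f)
        for n, f in enumerate(formants_hz, start=1)
    ]
    return sum(estimates) / len(estimates)


# Formants of an idealized neutral vowel (Hz)
vtl_cm = estimate_vtl([500.0, 1500.0, 2500.0])
```

Real vowels deviate from the uniform-tube pattern, so estimates based on higher formants or on neutral vowels are usually considered less sensitive to articulation than estimates based on F1 alone.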

RevDate: 2020-05-06

Zhou H, Lu J, Zhang C, et al (2020)

Abnormal Acoustic Features Following Pharyngeal Flap Surgery in Patients Aged Six Years and Older.

The Journal of craniofacial surgery [Epub ahead of print].

In our study, older velopharyngeal insufficiency (posterior velopharyngeal insufficiency) patients were defined as those older than 6 years of age. This study aimed to evaluate the abnormal acoustic features of older velopharyngeal insufficiency patients before and after posterior pharyngeal flap surgery. A retrospective medical record review was conducted for patients aged 6 years and older, who underwent posterior pharyngeal flap surgery between November 2011 and March 2015. The audio records of patients were evaluated before and after surgery. Spectral analysis was conducted by the Computer Speech Lab (CSL)-4150B acoustic system with the following input data: the vowel /i/, unaspirated plosive /b/, aspirated plosive /p/, aspirated fricatives /s/ and /x/, unaspirated affricates /j/ and /z/, and aspirated affricates /c/ and /q/. The patients were followed up for 3 months. Speech outcome was evaluated by comparing the postoperative phonetic data with preoperative data. Subjective and objective analyses showed significant differences in the sonogram, formant, and speech articulation before and after the posterior pharyngeal flap surgery. However, the sampled patients could not be considered to have a high speech articulation (<85%) as the normal value was above or equal to 96%. Our results showed that pharyngeal flap surgery could correct the speech function of older patients with posterior velopharyngeal insufficiency to some extent. Owing to the original errors in pronunciation patterns, pathological speech articulation still existed, and speech treatment is required in the future.

RevDate: 2020-05-03

Almurashi W, Al-Tamimi J, G Khattab (2020)

Static and dynamic cues in vowel production in Hijazi Arabic.

The Journal of the Acoustical Society of America, 147(4):2917.

Static cues such as formant measurements obtained at the vowel midpoint are usually taken as the main correlate for vowel identification. However, dynamic cues such as vowel-inherent spectral change have been shown to yield better classification of vowels using discriminant analysis. The aim of this study is to evaluate the role of static versus dynamic cues in Hijazi Arabic (HA) vowel classification, in addition to vowel duration and F3, which are not usually looked at. Data from 12 male HA speakers producing eight HA vowels in /hVd/ syllables were obtained, and classification accuracy was evaluated using discriminant analysis. Dynamic cues, particularly the three-point model, had higher classification rates (average 95.5%) than the remaining models (static model: 93.5%; other dynamic models: between 65.75% and 94.25%). Vowel duration had a significant role in classification accuracy (average +8%). These results are in line with dynamic approaches to vowel classification and highlight the relative importance of cues such as vowel duration across languages, particularly where it is prominent in the phonology.

RevDate: 2020-05-03

Egurtzegi A, C Carignan (2020)

An acoustic description of Mixean Basque.

The Journal of the Acoustical Society of America, 147(4):2791.

This paper presents an acoustic analysis of Mixean Low Navarrese, an endangered variety of Basque. The manuscript includes an overview of previous acoustic studies performed on different Basque varieties in order to synthesize the sparse acoustic descriptions of the language that are available. This synthesis serves as a basis for the acoustic analysis performed in the current study, in which the various acoustic analyses given in previous studies are replicated in a single, cohesive general acoustic description of Mixean Basque. The analyses include formant and duration measurements for the six-vowel system, voice onset time measurements for the three-way stop system, spectral center of gravity for the sibilants, and number of lingual contacts in the alveolar rhotic tap and trill. Important findings include: a centralized realization ([ʉ]) of the high-front rounded vowel usually described as /y/; a data-driven confirmation of the three-way laryngeal opposition in the stop system; evidence in support of an alveolo-palatal to apical sibilant merger; and the discovery of a possible incipient merger of rhotics. These results show how using experimental acoustic methods to study under-represented linguistic varieties can result in revelations of sound patterns otherwise undescribed in more commonly studied varieties of the same language.
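
One of the measures named above, the spectral center of gravity, is the power-weighted mean frequency of a spectrum. The following is a minimal sketch of that definition using a plain DFT from the standard library; the function name and the choice of power (squared magnitude) weighting are assumptions for illustration, and the abstract does not state which weighting or analysis window the authors used.

```python
import cmath
import math


def spectral_center_of_gravity(samples, sample_rate_hz):
    """Return the power-weighted mean frequency (Hz) of a real signal.

    Uses a direct DFT over bins 0..N/2, which is O(N^2) and only
    suitable for short frames such as this illustration.
    """
    n = len(samples)
    weighted_sum = 0.0
    total_power = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        power = abs(coeff) ** 2
        weighted_sum += (k * sample_rate_hz / n) * power
        total_power += power
    if total_power == 0.0:
        raise ValueError("signal has no energy")
    return weighted_sum / total_power


# A pure 1000 Hz tone sampled at 8 kHz: all energy sits in one bin.
tone = [math.sin(2 * math.pi * 1000 * i / 8000) for i in range(256)]
print(spectral_center_of_gravity(tone, 8000))
```

For sibilants, a higher center of gravity generally corresponds to a more anterior constriction, which is why the measure is used to separate sibilant places of articulation.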

RevDate: 2020-05-03

Mellesmoen G, M Babel (2020)

Acoustically distinct and perceptually ambiguous: ʔayʔaǰuθəm (Salish) fricatives.

The Journal of the Acoustical Society of America, 147(4):2959.

ʔayʔaǰuθəm (Comox-Sliammon) is a Central Salish language spoken in British Columbia with a large fricative inventory. Previous impressionistic descriptions of ʔayʔaǰuθəm have noted perceptual ambiguity of select anterior fricatives. This paper provides an auditory-acoustic description of the four anterior fricatives /θ s ʃ ɬ/ in the Mainland dialect of ʔayʔaǰuθəm. Peak ERBN trajectories, noise duration, and formant transitions are analysed in the fricative productions of five speakers. These analyses provide quantitative and qualitative descriptions of these fricative contrasts, indicating more robust acoustic differentiation for fricatives in onset versus coda position. In a perception task, English listeners categorized fricatives in CV and VC sequences from the natural productions. The results of the perception experiment are consistent with reported perceptual ambiguity between /s/ and /θ/, with listeners frequently misidentifying /θ/ as /s/. The production and perception data suggest that listener L1 categories play a role in the categorization and discrimination of ʔayʔaǰuθəm fricatives. These findings provide an empirical description of fricatives in an understudied language and have implications for L2 teaching and learning in language revitalization contexts.

RevDate: 2020-05-03

Rosen N, Stewart J, ON Sammons (2020)

How "mixed" is mixed language phonology? An acoustic analysis of the Michif vowel system.

The Journal of the Acoustical Society of America, 147(4):2989.

Michif, a severely endangered language still spoken today by an estimated 100-200 Métis people in Western Canada, is generally classified as a mixed language, meaning it cannot be traced back to a single language family [Bakker (1997). A Language of Our Own (Oxford University Press, Oxford); Thomason (2001). Language Contact: An Introduction (Edinburgh University Press and Georgetown University Press, Edinburgh and Washington, DC); Meakins (2013). Contact Languages: A Comprehensive Guide (Mouton De Gruyter, Berlin), pp. 159-228.]. It has been claimed to maintain the phonological grammar of both of its source languages, French and Plains Cree [Rhodes (1977). Actes du Huitieme congrès des Algonquinistes (Carleton University, Ottawa), pp. 6-25; Bakker (1997). A Language of Our Own (Oxford University Press, Oxford); Bakker and Papen (1997). Contact Languages: A Wider Perspective (John Benjamins, Amsterdam), pp. 295-363]. The goal of this paper is twofold: to offer an instrumental analysis of Michif vowels and to investigate this claim of a stratified grammar, based on this careful phonetic analysis. Using source language as a variable in the analysis, the authors argue the Michif vowel system does not appear to rely on historical information, and that historically similar French and Cree vowels pattern together within the Michif system with regard to formant frequencies and duration. The authors show that there are nine Michif oral vowels in this system, which has merged phonetically similar French- and Cree-source vowels.

RevDate: 2020-05-03

van Brenk F, H Terband (2020)

Compensatory and adaptive responses to real-time formant shifts in adults and children.

The Journal of the Acoustical Society of America, 147(4):2261.

Auditory feedback plays an important role in speech motor learning, yet little is known about the strength of motor learning and feedback control in speech development. This study investigated compensatory and adaptive responses to auditory feedback perturbation in children (aged 4-9 years old) and young adults (aged 18-29 years old). Auditory feedback was perturbed by near-real-time shifting of F1 and F2 of the vowel /ɪː/ during the production of consonant-vowel-consonant words. Children were able to compensate and adapt to a similar or larger degree compared to young adults. Higher token-to-token variability was found in children compared to adults but not disproportionately higher during the perturbation phases compared to the unperturbed baseline. The added challenge to auditory-motor integration did not influence production variability in children, and compensation and adaptation effects were found to be strong and sustainable. Significant group differences were absent in the proportions of speakers displaying a compensatory or adaptive response, an amplifying response, or no consistent response. Within these categories, children produced significantly stronger compensatory, adaptive, or amplifying responses, which could be explained by less-ingrained existing representations. The results are interpreted as showing that both auditory-motor integration and learning capacities are stronger in young children compared to adults.

RevDate: 2020-05-03

Chiu C, JT Sun (2020)

On pharyngealized vowels in Northern Horpa: An acoustic and ultrasound study.

The Journal of the Acoustical Society of America, 147(4):2928.

In the Northern Horpa (NH) language of Sichuan, vowels are divided between plain and pharyngealized sets, with the latter pronounced with auxiliary articulatory gestures involving more constriction in the vocal tract. The current study examines how the NH vocalic contrast is manifested in line with the process of pharyngealization both acoustically and articulatorily, based on freshly gathered data from two varieties of the language (i.e., Rtsangkhog and Yunasche). Along with formant analyses, ultrasound imaging was employed to capture the tongue postures and positions during vowel production. The results show that in contrast with plain vowels, pharyngealized vowels generally feature lower F2 values and higher F1 and F3 values. Mixed results for F2 and F3 suggest that the quality contrasts are vowel-dependent. Ultrasound images, on the other hand, reveal that the vocalic distinction is affected by different types of tongue movements, including retraction, backing, and double bunching, depending on the inherent tongue positions for each vowel. The two NH varieties investigated are found to display differential formant changes and different types of tongue displacements. The formant profiles along with ultrasound images support the view that the production of the NH phonologically marked vowels is characteristic of pharyngealization.

RevDate: 2020-05-03

Horo L, Sarmah P, GDS Anderson (2020)

Acoustic phonetic study of the Sora vowel system.

The Journal of the Acoustical Society of America, 147(4):3000.

This paper is an acoustic phonetic study of vowels in Sora, a Munda language of the Austroasiatic language family. Descriptions here illustrate that the Sora vowel system has six vowels and provide evidence that Sora disyllables have prominence on the second syllable. While the acoustic categorization of vowels is based on formant frequencies, the presence of prominence on the second syllable is shown through temporal features of vowels, including duration, intensity, and fundamental frequency. Additionally, this paper demonstrates that acoustic categorization of vowels in Sora is better in the prominent syllable than in the non-prominent syllable, providing evidence that syllable prominence and vowel quality are correlated in Sora. These acoustic properties of Sora vowels are discussed in relation to the existing debates on vowels and patterns of syllable prominence in Munda languages of India. In this regard, it is noteworthy that Munda languages, in general, lack instrumental studies, and therefore this paper presents significant findings that are undocumented in other Munda languages. These acoustic studies are supported by exploratory statistical modeling and statistical classification methods.

RevDate: 2020-05-03

Sarvasy H, Elvin J, Li W, et al (2020)

An acoustic phonetic description of Nungon vowels.

The Journal of the Acoustical Society of America, 147(4):2891.

This study is a comprehensive acoustic description and analysis of the six vowels /i e a u o ɔ/ in the Towet dialect of the Papuan language Nungon ⟨yuw⟩ of northeastern Papua New Guinea. Vowel tokens were extracted from a corpus of audio speech recordings created for general language documentation and grammatical description. To assess the phonetic correlates of a claimed phonological vowel length distinction, vowel duration was measured. Multi-point acoustic analyses enabled investigation of mean vowel F1, F2, and F3; vowel trajectories, and coarticulation effects. The three Nungon back vowels were of particular interest, as they contribute to an asymmetrical, back vowel-heavy array, and /o/ had previously been described as having an especially low F2. The authors found that duration of phonologically long and short vowels differed significantly. Mean vowel formant measurements confirmed that the six phonological vowels form six distinct acoustic groupings; trajectories show slightly more formant movement in some vowels than was previously known. Adjacent nasal consonants exerted significant effects on vowel formant measurements. The authors show that an uncontrolled, general documentation corpus for an under-described language can be mined for acoustic analysis, but coarticulation effects should be taken into account.

RevDate: 2020-05-03

Nance C, S Kirkham (2020)

The acoustics of three-way lateral and nasal palatalisation contrasts in Scottish Gaelic.

The Journal of the Acoustical Society of America, 147(4):2858.

This paper presents an acoustic description of laterals and nasals in an endangered minority language, Scottish Gaelic (known as "Gaelic"). Gaelic sonorants are reported to take part in a typologically unusual three-way palatalisation contrast. Here, the acoustic evidence for this contrast is considered, comparing lateral and nasal consonants in both word-initial and word-final position. Previous acoustic work has considered lateral consonants, but nasals are much less well-described. An acoustic analysis of twelve Gaelic-dominant speakers resident in a traditionally Gaelic-speaking community is reported. Sonorant quality is quantified via measurements of F2-F1 and F3-F2 and observation of the whole spectrum. Additionally, we quantify extensive devoicing in word-final laterals that has not been previously reported. Mixed-effects regression modelling suggests robust three-way acoustic differences in lateral consonants in all relevant vowel contexts. Nasal consonants, however, display lesser evidence of the three-way contrast in formant values and across the spectrum. Potential reasons for lesser evidence of contrast in the nasal system are discussed, including the nature of nasal acoustics, evidence from historical changes, and comparison to other Goidelic dialects. In doing so, contributions are made to accounts of the acoustics of the Celtic languages, and to typologies of contrastive palatalisation in the world's languages.

RevDate: 2020-05-03

Tabain M, Butcher A, Breen G, et al (2020)

A formant study of the alveolar versus retroflex contrast in three Central Australian languages: Stop, nasal, and lateral manners of articulation.

The Journal of the Acoustical Society of America, 147(4):2745.

This study presents formant transition data from 21 speakers for the apical alveolar∼retroflex contrast in three neighbouring Central Australian languages: Arrernte, Pitjantjatjara, and Warlpiri. The contrast is examined for three manners of articulation: stop, nasal, and lateral /t ∼ ʈ/ /n ∼ ɳ/, and /l ∼ ɭ/, and three vowel contexts /a i u/. As expected, results show that a lower F3 and F4 in the preceding vowel signal a retroflex consonant; and that the alveolar∼retroflex contrast is most clearly realized in the context of an /a/ vowel, and least clearly realized in the context of an /i/ vowel. Results also show that the contrast is most clearly realized for the stop manner of articulation. These results provide an acoustic basis for the greater typological rarity of retroflex nasals and laterals as compared to stops. It is suggested that possible nasalization of the preceding vowel accounts for the poorer nasal consonant results, and that articulatory constraints on lateral consonant production account for the poorer lateral consonant results. Importantly, differences are noticed between speakers, and it is suggested that literacy plays a major role in maintenance of this marginal phonemic contrast.

RevDate: 2020-04-27

Liepins R, Kaider A, Honeder C, et al (2020)

Formant frequency discrimination with a fine structure sound coding strategy for cochlear implants.

Hearing research, 392:107970 pii:S0378-5955(19)30207-2 [Epub ahead of print].

Recent sound coding strategies for cochlear implants (CI) have focused on the transmission of temporal fine structure to the CI recipient. To date, knowledge about the effects of fine structure coding in electrical hearing is poorly characterized. The aim of this study was to examine whether the presence of temporal fine structure coding affects how the CI recipient perceives sound. This was done by comparing two sound coding strategies with different temporal fine structure coverage in a longitudinal cross-over setting. The more recent FS4 coding strategy provides fine structure coding on typically four apical stimulation channels compared to FSP with usually one or two fine structure channels. 34 adult CI patients with a minimum CI experience of one year were included. All subjects were fitted according to clinical routine and used both coding strategies for three months in a randomized sequence. Formant frequency discrimination thresholds (FFDT) were measured to assess the ability to resolve timbre information. Further outcome measures included a monosyllables test in quiet and the speech reception threshold of an adaptive matrix sentence test in noise (Oldenburger sentence test). In addition, the subjective sound quality was assessed using visual analogue scales and a sound quality questionnaire after each three-month period. The extended fine structure range of FS4 yields FFDT similar to FSP for formants occurring in the frequency range only covered by FS4. There is a significant interaction (p = 0.048) between the extent of fine structure coverage in FSP and the improvement in FFDT in favour of FS4 for these stimuli. Speech perception in noise and quiet was similar with both coding strategies. Sound quality was rated heterogeneously showing that both strategies represent valuable options for CI fitting to allow for best possible individual optimization.

RevDate: 2020-04-22

Toyoda A, Maruhashi T, Malaivijitnond S, et al (2020)

Dominance status and copulatory vocalizations among male stump-tailed macaques in Thailand.

Primates; journal of primatology pii:10.1007/s10329-020-00820-7 [Epub ahead of print].

Male copulation calls sometimes play important roles in sexual strategies, attracting conspecific females or advertising their social status to conspecific males. These calls generally occur in sexually competitive societies such as harem groups and multi-male and multi-female societies. However, the call functions remain unclear because of limited availability of data sets that include a large number of male and female animals in naturalistic environments, particularly in primates. Here, we examined the possible function of male-specific copulation calls in wild stump-tailed macaques (Macaca arctoides) by analyzing the contexts and acoustic features of vocalizations. We observed 395 wild stump-tailed macaques inhabiting the Khao Krapuk Khao Taomor Non-Hunting Area in Thailand and recorded all occurrences of observed copulations. We counted 446 male-specific calls in 383 copulations recorded, and measured their acoustic characteristics. Data were categorized into three groups depending on their social status: dominant (alpha and coalition) males and non-dominant males. When comparing male status, alpha males most frequently produced copulation calls at ejaculation, coalition males produced less frequent calls than alpha males, and other non-dominant males rarely vocalized, maintaining silence even when mounting females. Acoustic analysis indicated no significant influence of status (alpha or coalition) on call number, bout duration, or further formant dispersion parameters. Our results suggest that male copulation calls of this species are social status-dependent signals. Furthermore, dominant males might actively transmit their social status and copulations to other male rivals to impede their challenging attacks, while other non-dominant males maintain silence to prevent the interference of dominants.
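
The abstract above refers to formant dispersion parameters without defining them. In the animal-vocalization literature, formant dispersion is commonly taken to be the average spacing between adjacent formants, which reduces to (F_N - F_1) / (N - 1). The sketch below implements that common definition; the function name is illustrative, and whether the authors used exactly this formula is an assumption.

```python
def formant_dispersion(formants_hz):
    """Average spacing (Hz) between adjacent formants.

    The mean of the successive differences F2-F1, F3-F2, ...
    telescopes to (F_N - F_1) / (N - 1).
    """
    if len(formants_hz) < 2:
        raise ValueError("at least two formants are required")
    ordered = sorted(formants_hz)
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)


# Evenly spaced formants 1000 Hz apart.
print(formant_dispersion([500.0, 1500.0, 2500.0, 3500.0]))
```

Because formant spacing shrinks as the vocal tract lengthens, this measure is often used as an acoustic proxy for body size.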

RevDate: 2020-04-19

Saldías M, Laukkanen AM, Guzmán M, et al (2020)

The Vocal Tract in Loud Twang-Like Singing While Producing High and Low Pitches.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30057-6 [Epub ahead of print].

Twang-like vocal qualities have been related to a megaphone-like shape of the vocal tract (epilaryngeal tube and pharyngeal narrowing, and a wider mouth opening), low-frequency spectral changes, and tighter and/or increased vocal fold adduction. Previous studies have focused mainly on loud and high-pitched singing, comfortable low-pitched spoken vowels, or are based on modeling and simulation. There is no data available related to twang-like voices in loud, low-pitched singing.

PURPOSE: This study investigates the possible contribution of the lower and upper vocal tract configurations during loud twang-like singing on high and low pitches in a real subject.

METHODS: One male contemporary commercial music singer produced a sustained vowel [a:] in his habitual speaking pitch (B2) and loudness. The same vowel was also produced in a loud twang-like singing voice on high (G4) and low pitches (B2). Computerized tomography, acoustic analysis, inverse filtering, and audio-perceptual assessments were performed.

RESULTS: Both loud twang-like voices showed a megaphone-like shape of the vocal tract, being more notable on the low pitch. Also, low-frequency spectral changes, a peak of sound energy around 3 kHz and increased vocal fold adduction were found. Results agreed with audio-perceptual evaluation.

CONCLUSIONS: Loud twang-like phonation seems to be mainly related to low-frequency spectral changes (under 2 kHz) and a more compact formant structure. Twang-like qualities seem to require different degrees of twang-related vocal tract adjustments while phonating in different pitches. A wider mouth opening, pharyngeal constriction, and epilaryngeal tube narrowing may be helpful strategies for maximum power transfer and improved vocal economy in loud contemporary commercial music singing and potentially in loud speech. Further studies should focus on vocal efficiency and vocal economy measurements using modeling and simulation, based on real-singers' data.

RevDate: 2020-04-17

Yaralı M (2020)

Varying effect of noise on sound onset and acoustic change evoked auditory cortical N1 responses evoked by a vowel-vowel stimulus.

International journal of psychophysiology : official journal of the International Organization of Psychophysiology pii:S0167-8760(20)30077-5 [Epub ahead of print].

INTRODUCTION: According to previous studies noise causes prolonged latencies and decreased amplitudes in acoustic change evoked cortical responses. Particularly for a consonant-vowel stimulus, speech shaped noise leads to more pronounced changes on onset evoked response than acoustic change evoked response. Reasoning that this may be related to the spectral characteristics of the stimuli and the noise, in the current study a vowel-vowel stimulus (/ui/) was presented in white noise during cortical response recordings. The hypothesis is that the effect of noise will be higher on acoustic change N1 compared to onset N1 due to the masking effects on formant transitions.

METHODS: Onset and acoustic change evoked auditory cortical N1-P2 responses were obtained from 21 young adults with normal hearing while presenting 1000 ms /ui/ stimuli in quiet and in white noise at +10 dB and 0 dB signal-to-noise ratio (SNR).

RESULTS: In the quiet and +10 dB SNR conditions, the N1-P2 responses to both onset and change were present. In the +10 dB SNR condition acoustic change N1-P2 peak-to-peak amplitudes were reduced and N1 latencies were prolonged compared to the quiet condition, whereas there was no significant change in onset N1 latencies and N1-P2 peak-to-peak amplitudes in the +10 dB SNR condition. In the 0 dB SNR condition change responses were not observed but onset N1-P2 peak-to-peak amplitudes were significantly lower, and onset N1 latencies were significantly higher compared to the quiet and the +10 dB SNR conditions. Onset and change responses were also compared with each other in each condition. N1 latencies and N1-P2 peak-to-peak amplitudes of onset and acoustic change were not significantly different in the quiet condition, whereas at +10 dB SNR, acoustic change N1 latencies were higher and N1-P2 amplitudes were lower than onset latencies and amplitudes. To summarize, presentation of white noise at +10 dB SNR resulted in the reduction of acoustic change evoked N1-P2 peak-to-peak amplitudes and the prolongation of N1 latencies compared to quiet. The same effect on onsets was only observed at 0 dB SNR, where acoustic change N1 was not observed. In the quiet condition, latencies and amplitudes of onsets and changes were not different, whereas at +10 dB SNR, acoustic change N1 latencies were higher and amplitudes were lower than onset N1.

DISCUSSION/CONCLUSIONS: The effect of noise was found to be higher on acoustic change evoked N1 response compared to onset N1. This may be related to the spectral characteristics of the utilized noise and the stimuli, possible differences in acoustic features of sound onsets and acoustic changes, or to the possible differences in the mechanisms for detecting acoustic changes and sound onsets. In order to investigate the possible reasons for more pronounced effect of noise on acoustic changes, future work with different vowel-vowel transitions in different noise types is suggested.

RevDate: 2020-04-04

Tykalova T, Skrabal D, Boril T, et al (2020)

Effect of Ageing on Acoustic Characteristics of Voice Pitch and Formants in Czech Vowels.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30086-2 [Epub ahead of print].

BACKGROUND: The relevance of formant-based measures has been noted across a spectrum of medical, technical, and linguistic applications. Therefore, the primary aim of the study was to evaluate the effect of ageing on vowel articulation, as the previous research revealed contradictory findings. The secondary aim was to provide normative acoustic data for all Czech monophthongs.

METHODS: The database consisted of 100 healthy speakers (50 men and 50 women) aged between 20 and 90. Acoustic characteristics, including vowel duration, vowel space area (VSA), fundamental frequency (fo), and the first to fourth formant frequencies (F1-F4) of 10 Czech vowels were extracted from a reading passage. In addition, the articulation rate was calculated from the entire duration of the reading passage.

RESULTS: Age-related changes in pitch were sex-dependent, while age-related alterations in F2/a/, F2/u/, VSA, and vowel duration seemed to be sex-independent. In particular, we observed a clear lowering of fo with age for women, but no change for men. With regard to formants, we found lowering of F2/a/ and F2/u/ with increased age, but no statistically significant changes in F1, F3, or F4 frequencies with advanced age. Although the alterations in F1 and F2 frequencies were rather small, they appeared to be in a direction against vowel centralization, resulting in a significantly greater VSA in the older population. The greater VSA was found to be related partly to longer vowel duration.

CONCLUSIONS: Alterations in vowel formant frequencies across several decades of adult life appear to be small or in a direction against vowel centralization, thus indicating the good preservation of articulatory precision in older speakers.
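
Vowel space area (VSA), one of the measures reported above, is typically computed as the area of the polygon whose vertices are the mean (F1, F2) values of selected vowels. A minimal sketch using the shoelace formula follows; the function name and the example formant values are hypothetical, and the abstract does not specify which vowels the authors used as vertices.

```python
def vowel_space_area(vertices_hz):
    """Area (Hz^2) of the polygon spanned by (F1, F2) vowel means.

    Vertices must be listed in order around the polygon (either
    direction). Uses the shoelace formula.
    """
    if len(vertices_hz) < 3:
        raise ValueError("at least three vowels are required")
    twice_area = 0.0
    for i, (f1_a, f2_a) in enumerate(vertices_hz):
        f1_b, f2_b = vertices_hz[(i + 1) % len(vertices_hz)]
        twice_area += f1_a * f2_b - f1_b * f2_a
    return abs(twice_area) / 2.0


# Hypothetical corner-vowel means for /i/, /a/, /u/ as (F1, F2) in Hz.
corners = [(300.0, 2300.0), (700.0, 1300.0), (300.0, 800.0)]
print(vowel_space_area(corners))
```

A larger area indicates more peripheral, less centralized vowels, which is the sense in which the abstract reports a greater VSA in older speakers.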

RevDate: 2020-04-02

Milenkovic PH, Wagner M, Kent RD, et al (2020)

Effects of sampling rate and type of anti-aliasing filter on linear-predictive estimates of formant frequencies in men, women, and children.

The Journal of the Acoustical Society of America, 147(3):EL221.

The purpose of this study was to assess the effect of downsampling the acoustic signal on the accuracy of linear-predictive (LPC) formant estimation. Based on speech produced by men, women, and children, the first four formant frequencies were estimated at sampling rates of 48, 16, and 10 kHz using different anti-alias filtering. With proper selection of number of LPC coefficients, anti-alias filter and between-frame averaging, results suggest that accuracy is not improved by rates substantially below 48 kHz. Any downsampling should not go below 16 kHz with a filter cut-off centered at 8 kHz.
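
For readers unfamiliar with linear-predictive formant estimation, the following is a deliberately reduced sketch: it fits a second-order predictor with the autocorrelation (Yule-Walker) method and converts the resulting complex pole pair to one formant frequency and bandwidth. It is not the authors' procedure; practical analyses use higher orders, windowing, pre-emphasis, and frame averaging, and all function names and test values here are assumptions for illustration.

```python
import math


def resonator_impulse_response(freq_hz, bandwidth_hz, sample_rate_hz, n_samples):
    """Impulse response of a two-pole resonator (one synthetic formant)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)
    theta = 2 * math.pi * freq_hz / sample_rate_hz
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y = []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(x + a1 * y1 + a2 * y2)
    return y


def autocorrelation(signal, lag):
    """Unnormalized autocorrelation of the signal at the given lag."""
    return sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))


def lpc2_formant(signal, sample_rate_hz):
    """Estimate one formant (frequency, bandwidth) with order-2 LPC."""
    r0, r1, r2 = (autocorrelation(signal, k) for k in range(3))
    # Solve the 2x2 Yule-Walker system for predictor coefficients a1, a2.
    det = r0 * r0 - r1 * r1
    a1 = r1 * (r0 - r2) / det
    a2 = (r0 * r2 - r1 * r1) / det
    # Poles are the roots of z^2 - a1*z - a2 = 0.
    disc = a1 * a1 + 4 * a2
    if disc >= 0:
        raise ValueError("no complex pole pair: no resonance found")
    angle = math.atan2(math.sqrt(-disc) / 2, a1 / 2)
    radius = math.sqrt(-a2)
    freq_hz = angle * sample_rate_hz / (2 * math.pi)
    bandwidth_hz = -math.log(radius) * sample_rate_hz / math.pi
    return freq_hz, bandwidth_hz


signal = resonator_impulse_response(700.0, 80.0, 10000.0, 4000)
print(lpc2_formant(signal, 10000.0))
```

Because the pole angle is scaled by the sampling rate, the choice of sampling rate and anti-aliasing filter determines which part of the spectrum the predictor has to model, which is the dependency the study examines.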

RevDate: 2020-03-20

Deloche F (2020)

Fine-grained statistical structure of speech.

PloS one, 15(3):e0230233 pii:PONE-D-19-01235.

In spite of its acoustic diversity, the speech signal presents statistical regularities that can be exploited by biological or artificial systems for efficient coding. Independent Component Analysis (ICA) revealed that on small time scales (∼ 10 ms), the overall structure of speech is well captured by a time-frequency representation whose frequency selectivity follows the same power law in the high frequency range 1-8 kHz as cochlear frequency selectivity in mammals. Variations in the power-law exponent, i.e. different time-frequency trade-offs, have been shown to provide additional adaptation to phonetic categories. Here, we adopt a parametric approach to investigate the variations of the exponent at a finer level of speech. The estimation procedure is based on a measure that reflects the sparsity of decompositions in a set of Gabor dictionaries whose atoms are Gaussian-modulated sinusoids. We examine the variations of the exponent associated with the best decomposition, first at the level of phonemes, then at an intra-phonemic level. We show that this analysis offers a rich interpretation of the fine-grained statistical structure of speech, and that the exponent values can be related to key acoustic properties. Two main results are: i) for plosives, the exponent is lowered by the release bursts, concealing higher values during the opening phases; ii) for vowels, the exponent is bound to formant bandwidths and decreases with the degree of acoustic radiation at the lips. This work further suggests that an efficient coding strategy is to reduce frequency selectivity with sound intensity level, congruent with the nonlinear behavior of cochlear filtering.

RevDate: 2020-03-20

Hardy TLD, Boliek CA, Aalto D, et al (2020)

Contributions of Voice and Nonverbal Communication to Perceived Masculinity-Femininity for Cisgender and Transgender Communicators.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose The purpose of this study was twofold: (a) to identify a set of communication-based predictors (including both acoustic and gestural variables) of masculinity-femininity ratings and (b) to explore differences in ratings between audio and audiovisual presentation modes for transgender and cisgender communicators. Method The voices and gestures of a group of cisgender men and women (n = 10 of each) and transgender women (n = 20) communicators were recorded while they recounted the story of a cartoon using acoustic and motion capture recording systems. A total of 17 acoustic and gestural variables were measured from these recordings. A group of observers (n = 20) rated each communicator's masculinity-femininity based on 30- to 45-s samples of the cartoon description presented in three modes: audio, visual, and audiovisual. Visual and audiovisual stimuli contained point light displays standardized for size. Ratings were made using a direct magnitude estimation scale without modulus. Communication-based predictors of masculinity-femininity ratings were identified using multiple regression, and analysis of variance was used to determine the effect of presentation mode on perceptual ratings. Results Fundamental frequency, average vowel formant, and sound pressure level were identified as significant predictors of masculinity-femininity ratings for these communicators. Communicators were rated significantly more feminine in the audio than the audiovisual mode and unreliably in the visual-only mode. Conclusions Both study purposes were met. Results support continued emphasis on fundamental frequency and vocal tract resonance in voice and communication modification training with transgender individuals and provide evidence for the potential benefit of modifying sound pressure level, especially when a masculine presentation is desired.

RevDate: 2020-03-11

Carl M, Kent RD, Levy ES, et al (2020)

Vowel Acoustics and Speech Intelligibility in Young Adults With Down Syndrome.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose Speech production deficits and reduced intelligibility are frequently noted in individuals with Down syndrome (DS) and are attributed to a combination of several factors. This study reports acoustic data on vowel production in young adults with DS and relates these findings to perceptual analysis of speech intelligibility. Method Participants were eight young adults with DS as well as eight age- and gender-matched typically developing (TD) controls. Several different acoustic measures of vowel centralization and variability were applied to tokens of corner vowels (/ɑ/, /æ/, /i/, /u/) produced in common English words. Intelligibility was assessed for single-word productions of speakers with DS, by means of transcriptions from 14 adult listeners. Results Group differentiation was found for some, but not all, of the acoustic measures. Low vowels were more acoustically centralized and variable in speakers with DS than TD controls. Acoustic findings were associated with overall intelligibility scores. Vowel formant dispersion was the most sensitive measure in distinguishing DS and TD formant data. Conclusion Corner vowels are differentially affected in speakers with DS. The acoustic characterization of vowel production and its association with speech intelligibility scores within the DS group support the conclusion of motor control deficits in the overall speech impairment. Implications are discussed for effective treatment planning.

RevDate: 2020-03-09

Zhang T, Shao Y, Wu Y, et al (2020)

Multiple Vowels Repair Based on Pitch Extraction and Line Spectrum Pair Feature for Voice Disorder.

IEEE journal of biomedical and health informatics [Epub ahead of print].

Individuals such as voice-related professionals, elderly people and smokers are increasingly suffering from voice disorders, which underlines the importance of pathological voice repair. Previous work on pathological voice repair addressed only the sustained vowel /a/; repair of multiple vowels remains challenging because pitch extraction is unstable and formant reconstruction is unsatisfactory. In this paper, a multiple-vowel repair method based on pitch extraction and the Line Spectrum Pair (LSP) feature is proposed for voice disorders. It broadens the scope of voice repair from the single vowel /a/ to the vowels /a/, /i/ and /u/ and repairs these vowels successfully. Using a deep neural network as a classifier, voice recognition is performed to classify normal and pathological voices. The Wavelet Transform and the Hilbert-Huang Transform are applied for pitch extraction. The formant is reconstructed from the LSP feature. The final repaired voice is obtained by synthesizing the pitch and the formant. The proposed method is validated on the Saarbrücken Voice Database (SVD). The improvements achieved on three metrics, Segmental Signal-to-Noise Ratio, LSP distance measure and Mel cepstral distance measure, are 45.87%, 50.37% and 15.56%, respectively. In addition, an intuitive spectrogram-based analysis shows a prominent repair effect.

RevDate: 2020-03-01

Allison KM, Salehi S, JR Green (2020)

Effect of prosodic manipulation on articulatory kinematics and second formant trajectories in children.

The Journal of the Acoustical Society of America, 147(2):769.

This study investigated effects of rate reduction and emphatic stress cues on second formant (F2) trajectories and articulatory movements during diphthong production in 11 typically developing school-aged children. F2 extent increased in slow and emphatic stress conditions, and tongue and jaw displacement increased in the emphatic stress condition compared to habitual speech. Tongue displacement significantly predicted F2 extent across speaking conditions. Results suggest that slow rate and emphatic stress cues induce articulatory and acoustic changes in children that may enhance clarity of the acoustic signal. Potential clinical implications for improving speech in children with dysarthria are discussed.

RevDate: 2020-03-01

Summers RJ, B Roberts (2020)

Informational masking of speech by acoustically similar intelligible and unintelligible interferers.

The Journal of the Acoustical Society of America, 147(2):1113.

Masking experienced when target speech is accompanied by a single interfering voice is often primarily informational masking (IM). IM is generally greater when the interferer is intelligible than when it is not (e.g., speech from an unfamiliar language), but the relative contributions of acoustic-phonetic and linguistic interference are often difficult to assess owing to acoustic differences between interferers (e.g., different talkers). Three-formant analogues (F1+F2+F3) of natural sentences were used as targets and interferers. Targets were presented monaurally either alone or accompanied contralaterally by interferers from another sentence (F0 = 4 semitones higher); a target-to-masker ratio (TMR) between ears of 0, 6, or 12 dB was used. Interferers were either intelligible or rendered unintelligible by delaying F2 and advancing F3 by 150 ms relative to F1, a manipulation designed to minimize spectro-temporal differences between corresponding interferers. Target-sentence intelligibility (keywords correct) was 67% when presented alone, but fell considerably when an unintelligible interferer was present (49%) and significantly further when the interferer was intelligible (41%). Changes in TMR produced neither a significant main effect nor an interaction with interferer type. Interference with acoustic-phonetic processing of the target can explain much of the impact on intelligibility, but linguistic factors, particularly interferer intrusions, also make an important contribution to IM.
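
Illustrative note (not part of the abstract): the 4-semitone F0 offset and the TMR values in dB can be read as plain ratios. The Python sketch below assumes equal-tempered semitones and treats TMR as a level difference between RMS amplitudes; both function names are hypothetical and are not taken from the paper.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for an interval in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def db_to_amplitude_ratio(level_db: float) -> float:
    """Amplitude (RMS) ratio for a level difference in decibels."""
    return 10.0 ** (level_db / 20.0)


if __name__ == "__main__":
    # Interferer F0 relative to target F0 for a 4-semitone upward shift.
    print("F0 ratio:", round(semitones_to_ratio(4), 3))
    # Target-to-masker amplitude ratios for the three TMR conditions.
    for tmr_db in (0, 6, 12):
        print(tmr_db, "dB ->", round(db_to_amplitude_ratio(tmr_db), 2))
```

Under these assumptions, a 4-semitone shift raises F0 by roughly a quarter, and each 6 dB step in TMR roughly doubles the target amplitude relative to the interferer.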

RevDate: 2020-03-01

Winn MB (2020)

Manipulation of voice onset time in speech stimuli: A tutorial and flexible Praat script.

The Journal of the Acoustical Society of America, 147(2):852.

Voice onset time (VOT) is an acoustic property of stop consonants that is commonly manipulated in studies of phonetic perception. This paper contains a thorough description of the "progressive cutback and replacement" method of VOT manipulation, and comparison with other VOT manipulation techniques. Other acoustic properties that covary with VOT, such as fundamental frequency and formant transitions, are also discussed, along with considerations for testing VOT perception and its relationship to various other measures of auditory temporal or spectral processing. An implementation of the progressive cutback and replacement method in the Praat scripting language is presented, which is suitable for modifying natural speech for perceptual experiments involving VOT and/or related covarying F0 and intensity cues. Justifications are provided for the stimulus design choices and constraints implemented in the script.

RevDate: 2020-02-29

Riggs WJ, Hiss MM, Skidmore J, et al (2020)

Utilizing Electrocochleography as a Microphone for Fully Implantable Cochlear Implants.

Scientific reports, 10(1):3714 pii:10.1038/s41598-020-60694-z.

Current cochlear implants (CIs) are semi-implantable devices with an externally worn sound processor that hosts the microphone and sound processor. A fully implantable device, however, would ultimately be desirable as it would be of great benefit to recipients. While some prototypes have been designed and used in a few select cases, one main stumbling block is the sound input. Specifically, subdermal implantable microphone technology has been poised with physiologic issues such as sound distortion and signal attenuation under the skin. Here we propose an alternative method that utilizes a physiologic response composed of an electrical field generated by the sensory cells of the inner ear to serve as a sound source microphone for fully implantable hearing technology such as CIs. Electrophysiological results obtained from 14 participants (adult and pediatric) document the feasibility of capturing speech properties within the electrocochleography (ECochG) response. Degradation of formant properties of the stimuli /da/ and /ba/ are evaluated across various degrees of hearing loss. Preliminary results suggest proof-of-concept of using the ECochG response as a microphone is feasible to capture vital properties of speech. However, further signal processing refinement is needed in addition to utilization of an intracochlear recording location to likely improve signal fidelity.

RevDate: 2020-02-27

Kim HT (2020)

Vocal Feminization for Transgender Women: Current Strategies and Patient Perspectives.

International journal of general medicine, 13:43-52 pii:205102.

Voice feminization for transgender women is a highly complicated comprehensive transition process. Voice feminization has been thought to be equal to pitch elevation. Thus, many surgical procedures have only focused on pitch raising for voice feminization. However, voice feminization should not only consider voice pitch but also consider gender differences in physical, neurophysiological, and acoustical characteristics of voice. That is why voice therapy has been the preferred choice for the feminization of the voice. Considering gender difference of phonatory system, the method for voice feminization consists of changing the following four critical elements: fundamental frequency, resonance frequency related to vocal tract volume and length, formant tuning, and phonatory pattern. Voice feminizing process can be generally divided into non-surgical feminization and surgical feminization. As a non-surgical procedure, feminization voice therapy consists of increasing fundamental frequency, improving oral and pharyngeal resonance, and behavioral therapy. Surgical feminization usually can be achieved by external approach or endoscopic approach. Based on three factors (length, tension and mass) of vocal fold for pitch modulation, surgical procedure can be classified as one-factor, two-factors and three-factors modification of vocal folds. Recent systematic reviews and meta-analysis studies have reported positive outcomes for both the voice therapy and voice feminization surgery. The benefits of voice therapy, as it is highly satisfactory, mostly increase vocal pitch, and are noninvasive. However, the surgical voice feminization of three-factors modification of vocal folds is also highly competent and provides a maximum absolute increase in vocal pitch. Voice feminization is a long transition journey for physical, neurophysiological, and psychosomatic changes that convert a male phonatory system to a female phonatory system. 
Therefore, strategies for voice feminization should be individualized according to the individual's physical condition, the desired change in voice pitch, economic conditions, and social roles.

RevDate: 2020-02-20

Levy ES, Moya-Galé G, Chang YM, et al (2020)

Effects of speech cues in French-speaking children with dysarthria.

International journal of language & communication disorders [Epub ahead of print].

BACKGROUND: Articulatory excursion and vocal intensity are reduced in many children with dysarthria due to cerebral palsy (CP), contributing to the children's intelligibility deficits and negatively affecting their social participation. However, the effects of speech-treatment strategies for improving intelligibility in this population are understudied, especially for children who speak languages other than English. In a cueing study on English-speaking children with dysarthria, acoustic variables and intelligibility improved when the children were provided with cues aimed to increase articulatory excursion and vocal intensity. While French is among the top 20 most spoken languages in the world, dysarthria and its management in French-speaking children are virtually unexplored areas of research. Information gleaned from such research is critical for providing an evidence base on which to provide treatment.

AIMS: To examine acoustic and perceptual changes in the speech of French-speaking children with dysarthria, who are provided with speech cues targeting greater articulatory excursion (French translation of 'speak with your big mouth') and vocal intensity (French translation of 'speak with your strong voice'). This study investigated whether, in response to the cues, the children would make acoustic changes and listeners would perceive the children's speech as more intelligible.

METHODS & PROCEDURES: Eleven children with dysarthria due to CP (six girls, five boys; ages 4;11-17;0 years; eight with spastic CP, three with dyskinetic CP) repeated pre-recorded speech stimuli across three speaking conditions (habitual, 'big mouth' and 'strong voice'). Stimuli were sentences and contrastive words in phrases. Acoustic analyses were conducted. A total of 66 Belgian-French listeners transcribed the children's utterances orthographically and rated their ease of understanding on a visual analogue scale at sentence and word levels.

OUTCOMES & RESULTS: Acoustic analyses revealed significantly longer duration in response to the big mouth cue at sentence level and in response to both the big mouth and strong voice cues at word level. Significantly higher vocal sound-pressure levels were found following both cues at sentence and word levels. Both cues elicited significantly higher first-formant vowel frequencies and listeners' greater ease-of-understanding ratings at word level. Increases in the percentage of words transcribed correctly and in sentence ease-of-understanding ratings, however, did not reach statistical significance. Considerable variability between children was observed.

CONCLUSIONS & IMPLICATIONS: Speech cues targeting greater articulatory excursion and vocal intensity yield significant acoustic changes in French-speaking children with dysarthria. However, the changes may only aid listeners' ease of understanding at word level. The significant findings and great inter-speaker variability are generally consistent with studies on English-speaking children with dysarthria, although changes appear more constrained in these French-speaking children.

WHAT THIS PAPER ADDS:

What is already known on the subject: According to the only study comparing effects of speech-cueing strategies on English-speaking children with dysarthria, intelligibility increases when the children are provided with cues aimed to increase articulatory excursion and vocal intensity. Little is known about speech characteristics in French-speaking children with dysarthria and no published research has explored effects of cueing strategies in this population.

What this paper adds to existing knowledge: This paper is the first study to examine the effects of speech cues on the acoustics and intelligibility of French-speaking children with CP. It provides evidence that the children can make use of cues to modify their speech, although the changes may only aid listeners' ease of understanding at word level.

What are the potential or actual clinical implications of this work? For clinicians, the findings suggest that speech cues emphasizing increasing articulatory excursion and vocal intensity show promise for improving the ease of understanding of words produced by francophone children with dysarthria, although improvements may be modest. The variability in the responses also suggests that this population may benefit from a combination of such cues to produce words that are easier to understand.

RevDate: 2020-02-20

Boë LJ, Sawallis TR, Fagot J, et al (2019)

Which way to the dawn of speech?: Reanalyzing half a century of debates and data in light of speech science.

Science advances, 5(12):eaaw3916 pii:aaw3916.

Recent articles on primate articulatory abilities are revolutionary regarding speech emergence, a crucial aspect of language evolution, by revealing a human-like system of proto-vowels in nonhuman primates and implicitly throughout our hominid ancestry. This article presents both a schematic history and the state of the art in primate vocalization research and its importance for speech emergence. Recent speech research advances allow more incisive comparison of phylogeny and ontogeny and also an illuminating reinterpretation of vintage primate vocalization data. This review produces three major findings. First, even among primates, laryngeal descent is not uniquely human. Second, laryngeal descent is not required to produce contrasting formant patterns in vocalizations. Third, living nonhuman primates produce vocalizations with contrasting formant patterns. Thus, evidence now overwhelmingly refutes the long-standing laryngeal descent theory, which pushes back "the dawn of speech" beyond ~200 ka ago to over ~20 Ma ago, a difference of two orders of magnitude.

RevDate: 2020-02-10

Kearney E, Nieto-Castañón A, Weerathunge HR, et al (2019)

A Simple 3-Parameter Model for Examining Adaptation in Speech and Voice Production.

Frontiers in psychology, 10:2995.

Sensorimotor adaptation experiments are commonly used to examine motor learning behavior and to uncover information about the underlying control mechanisms of many motor behaviors, including speech production. In the speech and voice domains, aspects of the acoustic signal are shifted/perturbed over time via auditory feedback manipulations. In response, speakers alter their production in the opposite direction of the shift so that their perceived production is closer to what they intended. This process relies on a combination of feedback and feedforward control mechanisms that are difficult to disentangle. The current study describes and tests a simple 3-parameter mathematical model that quantifies the relative contribution of feedback and feedforward control mechanisms to sensorimotor adaptation. The model is a simplified version of the DIVA model, an adaptive neural network model of speech motor control. The three fitting parameters of SimpleDIVA are associated with the three key subsystems involved in speech motor control, namely auditory feedback control, somatosensory feedback control, and feedforward control. The model is tested through computer simulations that identify optimal model fits to six existing sensorimotor adaptation datasets. We show its utility in (1) interpreting the results of adaptation experiments involving the first and second formant frequencies as well as fundamental frequency; (2) assessing the effects of masking noise in adaptation paradigms; (3) fitting more than one perturbation dimension simultaneously; (4) examining sensorimotor adaptation at different timepoints in the production signal; and (5) quantitatively predicting responses in one experiment using parameters derived from another experiment. The model simulations produce excellent fits to real data across different types of perturbations and experimental paradigms (mean correlation between data and model fits across all six studies = 0.95 ± 0.02). 
The model parameters provide a mechanistic explanation for the behavioral responses to the adaptation paradigm that are not readily available from the behavioral responses alone. Overall, SimpleDIVA offers new insights into speech and voice motor control and has the potential to inform future directions of speech rehabilitation research in disordered populations. Simulation software, including an easy-to-use graphical user interface, is publicly available to facilitate the use of the model in future studies.

RevDate: 2020-02-09

Viegas F, Viegas D, Serra Guimarães G, et al (2020)

Acoustic Analysis of Voice and Speech in Men with Skeletal Class III Malocclusion: A Pilot Study.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000505186 [Epub ahead of print].

OBJECTIVES: To assess the fundamental frequency (f0) and the first three formant frequencies (F1, F2, F3) of the 7 oral vowels of Brazilian Portuguese in men with skeletal class III malocclusion and to compare these measures with a control group of individuals with Angle's class I.

METHODS: Sixty men aged 18-40 years, 20 with Angle's class III skeletal malocclusion and 40 with Angle's class I malocclusion were selected by speech therapists and dentists. The speech signals were obtained from sustained vowels, and the values of f0 and frequencies of F1, F2 and F3 were estimated. The differences were verified through Student's t test, and the effect size calculation was performed.

RESULTS: In the class III group, higher f0 values were observed in all vowels, along with higher values of F1 in the vowels [a] and [ε] and of F2 in the vowels [a], [e] and [i], and lower F1 and F3 values in the vowel [u].

CONCLUSION: Higher f0 values were found in all vowels investigated in the class III group, which indicated a higher laryngeal position in the production of these sounds. The frequencies of the first 3 formants showed isolated differences, with higher values of F1 in the vowels [a] and [ε] and of F2 in [a], [e] and [i], and lower values of F1 and F3 in the vowel [u] in the experimental group. Thus, it is concluded that the fundamental frequency of the voice was the main parameter that differentiated the studied group from the control.

RevDate: 2020-02-02

Kelley MC, BV Tucker (2020)

A comparison of four vowel overlap measures.

The Journal of the Acoustical Society of America, 147(1):137.

Multiple measures of vowel overlap have been proposed that use F1, F2, and duration to calculate the degree of overlap between vowel categories. The present study assesses four of these measures: the spectral overlap assessment metric [SOAM; Wassink (2006). J. Acoust. Soc. Am. 119(4), 2334-2350], the a posteriori probability (APP)-based metric [Morrison (2008). J. Acoust. Soc. Am. 123(1), 37-40], the vowel overlap analysis with convex hulls method [VOACH; Haynes and Taylor, (2014). J. Acoust. Soc. Am. 136(2), 883-891], and the Pillai score as first used for vowel overlap by Hay, Warren, and Drager [(2006). J. Phonetics 34(4), 458-484]. Summaries of the measures are presented, and theoretical critiques of them are performed, concluding that the APP-based metric and Pillai score are theoretically preferable to SOAM and VOACH. The measures are empirically assessed using accuracy and precision criteria with Monte Carlo simulations. The Pillai score demonstrates the best overall performance in these tests. The potential applications of vowel overlap measures to research scenarios are discussed, including comparisons of vowel productions between different social groups, as well as acoustic investigations into vowel formant trajectories.
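
Illustrative note (not part of the abstract): the Pillai score named above is commonly computed as the Pillai-Bartlett trace of a one-way MANOVA with the formant measures as dependent variables and vowel category as the factor. The Python sketch below assumes two categories and F1/F2 only; the function name and the sample values are invented for illustration and do not reproduce the procedure or data of the paper.

```python
import numpy as np


def pillai_score(group_a, group_b):
    """Pillai-Bartlett trace for two groups of multivariate observations.

    Each group is an (n_tokens, n_measures) array, e.g. columns F1 and F2.
    Values near 0 suggest heavy overlap; values near 1 suggest separation.
    """
    groups = [np.asarray(group_a, dtype=float), np.asarray(group_b, dtype=float)]
    grand_mean = np.vstack(groups).mean(axis=0)
    n_measures = groups[0].shape[1]
    between = np.zeros((n_measures, n_measures))  # hypothesis SSCP matrix H
    within = np.zeros((n_measures, n_measures))   # error SSCP matrix E
    for g in groups:
        mean = g.mean(axis=0)
        offset = (mean - grand_mean)[:, None]
        between += g.shape[0] * (offset @ offset.T)
        centered = g - mean
        within += centered.T @ centered
    # trace(H (H + E)^-1) equals trace((H + E)^-1 H); solve avoids an explicit inverse.
    return float(np.trace(np.linalg.solve(between + within, between)))


if __name__ == "__main__":
    # Hypothetical (F1, F2) tokens in Hz for two well-separated vowel categories.
    high_front = [[300, 2300], [320, 2250], [280, 2350], [310, 2200]]
    low_back = [[700, 1200], [720, 1100], [680, 1250], [710, 1150]]
    print(pillai_score(high_front, low_back))
```

The sketch requires the summed scatter matrix to be invertible, so it will fail on degenerate input such as tokens that all lie on a single line in the F1/F2 plane.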

RevDate: 2020-02-02

Renwick MEL, JA Stanley (2020)

Modeling dynamic trajectories of front vowels in the American South.

The Journal of the Acoustical Society of America, 147(1):579.

Regional variation in American English speech is often described in terms of shifts, indicating which vowel sounds are converging or diverging. In the U.S. South, the Southern vowel shift (SVS) and African American vowel shift (AAVS) affect not only vowels' relative positions but also their formant dynamics. Static characterizations of shifting, with a single pair of first and second formant values taken near vowels' midpoint, fail to capture this vowel-inherent spectral change, which can indicate dialect-specific diphthongization or monophthongization. Vowel-inherent spectral change is directly modeled to investigate how trajectories of front vowels /i eɪ ɪ ɛ/ differ across social groups in the 64-speaker Digital Archive of Southern Speech. Generalized additive mixed models are used to test the effects of two social factors, sex and ethnicity, on trajectory shape. All vowels studied show significant differences between men, women, African American and European American speakers. Results show strong overlap between the trajectories of /eɪ, ɛ/ particularly among European American women, consistent with the SVS, and greater vowel-inherent raising of /ɪ/ among African American speakers, indicating how that lax vowel is affected by the AAVS. Model predictions of duration additionally indicate that across groups, trajectories become more peripheral as vowel duration increases.

RevDate: 2020-02-02

Chung H (2020)

Vowel acoustic characteristics of Southern American English variation in Louisiana.

The Journal of the Acoustical Society of America, 147(1):541.

This study examined acoustic characteristics of vowels produced by speakers from Louisiana, one of the states in the Southern English dialect region. First, how Louisiana vowels differ from or are similar to the reported patterns of Southern dialect were examined. Then, within-dialect differences across regions in Louisiana were examined. Thirty-four female adult monolingual speakers of American English from Louisiana, ranging in age from 18 to 23, produced English monosyllabic words containing 11 vowels /i, ɪ, e, ɛ, æ, ʌ, u, ʊ, o, ɔ, ɑ/. The first two formant frequencies at the midpoint of the vowel nucleus, direction, and amount of formant changes across three different time points (20, 50, and 80%), and vowel duration were compared to previously reported data on Southern vowels. Overall, Louisiana vowels showed patterns consistent with previously reported characteristics of Southern vowels that reflect ongoing changes in the Southern dialect (no evidence of acoustic reversal of tense-lax pairs, more specifically no peripheralization of front vowels). Some dialect-specific patterns were also observed (a relatively lesser degree of formant changes and slightly shorter vowel duration). These patterns were consistent across different regions within Louisiana.
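
Illustrative note (not part of the abstract): sampling a formant track at 20, 50, and 80% of vowel duration and taking differences gives one simple measure of the direction and amount of formant change. The Python sketch below assumes linear interpolation between measured frames; the function names and dictionary keys are hypothetical, and the study may have measured change differently.

```python
import numpy as np


def formant_at_proportions(times, values, proportions=(0.2, 0.5, 0.8)):
    """Interpolate a formant track (Hz) at proportional points of the vowel.

    `times` must be increasing and span the vowel from onset to offset.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    onset, offset = times[0], times[-1]
    targets = onset + np.asarray(proportions, dtype=float) * (offset - onset)
    return np.interp(targets, times, values)


def formant_change(times, values):
    """Signed change in Hz between the 20%, 50%, and 80% points."""
    f20, f50, f80 = formant_at_proportions(times, values)
    return {
        "20_to_50": float(f50 - f20),
        "50_to_80": float(f80 - f50),
        "20_to_80": float(f80 - f20),
    }


if __name__ == "__main__":
    # Hypothetical F2 track: times in seconds, frequencies in Hz.
    track_times = [0.00, 0.05, 0.10, 0.15, 0.20]
    track_f2 = [1800, 1900, 2050, 2150, 2200]
    print(formant_change(track_times, track_f2))
```

The sign of each difference gives the direction of movement and its magnitude gives the amount, so a positive "20_to_80" value means the formant rose over the central portion of the vowel.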

RevDate: 2020-01-20

Maebayashi H, Takiguchi T, S Takada (2019)

Study on the Language Formation Process of Very-Low-Birth-Weight Infants in Infancy Using a Formant Analysis.

The Kobe journal of medical sciences, 65(2):E59-E70.

Expressive language development depends on anatomical factors, such as motor control of the tongue and oral cavity needed for vocalization, as well as cognitive aspects for comprehension and speech. The purpose of this study was to examine the differences in expressive language development between normal-birth-weight (NBW) infants and very-low-birth-weight (VLBW) infants in infancy using a formant analysis. We also examined the presence of differences between infants with a normal development and those with a high risk of autism spectrum disorder who were expected to exist among VLBW infants. The participants were 10 NBW infants and 10 VLBW infants 12-15 months of age whose speech had been recorded at intervals of approximately once every 3 months. The recorded speech signal was analyzed using a formant analysis, and changes due to age were observed. One NBW and 3 VLBW infants failed to pass the screening tests (CBCL and M-CHAT) at 24 months of age. The formant frequencies (F1 and F2) of the three groups of infants (NBW, VLBW and CBCL·M-CHAT non-passing infants) were scatter-plotted by age. For the NBW and VLBW infants, the area of the plot increased with age, but there was no significant expansion of the plot area for the CBCL·M-CHAT non-passing infants. The results showed no significant differences in expressive language development between NBW infants at 24 months old and VLBW infants at the corrected age. However, different language developmental patterns were observed in CBCL·M-CHAT non-passing infants, regardless of birth weight, suggesting the importance of screening by acoustic analyses.
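
Illustrative note (not part of the abstract): the abstract does not say how the area of the F1/F2 scatter plot was measured. One common choice, assumed here, is the area of the convex hull around the plotted points. The Python sketch below uses the monotone-chain hull and the shoelace formula; all function names are hypothetical.

```python
def convex_hull(points):
    """Convex hull of 2-D points (Andrew's monotone chain), counter-clockwise."""
    pts = sorted(set(tuple(p) for p in points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def formant_plot_area(f1_f2_points):
    """Convex-hull area (Hz squared) of a set of (F1, F2) measurements."""
    return polygon_area(convex_hull(f1_f2_points))


if __name__ == "__main__":
    # Hypothetical (F1, F2) values in Hz from one recording session.
    session = [(400, 1500), (600, 1400), (500, 2000), (700, 1800), (550, 1700)]
    print(formant_plot_area(session))
```

Comparing this value across recording sessions would give one way to quantify whether the plotted area expands with age, under the stated convex-hull assumption.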

RevDate: 2020-01-16

Hosbach-Cannon CJ, Lowell SY, Colton RH, et al (2020)

Assessment of Tongue Position and Laryngeal Height in Two Professional Voice Populations.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose To advance our current knowledge of singer physiology by using ultrasonography in combination with acoustic measures to compare physiological differences between musical theater (MT) and opera (OP) singers under controlled phonation conditions. Primary objectives addressed in this study were (a) to determine if differences in hyolaryngeal and vocal fold contact dynamics occur between two professional voice populations (MT and OP) during singing tasks and (b) to determine if differences occur between MT and OP singers in oral configuration and associated acoustic resonance during singing tasks. Method Twenty-one singers (10 MT and 11 OP) were included. All participants were currently enrolled in a music program. Experimental procedures consisted of sustained phonation on the vowels /i/ and /ɑ/ during both a low-pitch task and a high-pitch task. Measures of hyolaryngeal elevation, tongue height, and tongue advancement were assessed using ultrasonography. Vocal fold contact dynamics were measured using electroglottography. Simultaneous acoustic recordings were obtained during all ultrasonography procedures for analysis of the first two formant frequencies. Results Significant oral configuration differences, reflected by measures of tongue height and tongue advancement, were seen between groups. Measures of acoustic resonance also showed significant differences between groups during specific tasks. Both singer groups significantly raised their hyoid position when singing high-pitched vowels, but hyoid elevation was not statistically different between groups. Likewise, vocal fold contact dynamics did not significantly differentiate the two singer groups. Conclusions These findings suggest that, under controlled phonation conditions, MT singers alter their oral configuration and achieve differing resultant formants as compared with OP singers. 
Because singers are at a high risk of developing a voice disorder, understanding how these two groups of singers adjust their vocal tract configuration during their specific singing genre may help to identify risky vocal behavior and provide a basis for prevention of voice disorders.

RevDate: 2020-01-15

Seyfarth RM, Cheney DL, Harcourt AH, et al (1994)

The acoustic features of gorilla double grunts and their relation to behavior.

American journal of primatology, 33(1):31-50.

Mountain gorillas (Gorilla gorilla beringei) give double-grunts to one another in a variety of situations, when feeding, resting, moving, or engaged in other kinds of social behavior. Some double-grunts elicit double-grunts in reply whereas others do not. Double-grunts are individually distinctive, and high-ranking animals give double-grunts at higher rates than others. There was no evidence, however, that the probability of eliciting a reply depended upon either the animals' behavior at the time a call was given or the social relationship between caller and respondent. The probability of eliciting a reply could be predicted from a double-grunt's acoustic features. Gorillas apparently produce at least two acoustically different subtypes of double-grunts, each of which conveys different information. Double-grunts with a low second formant (typically < 1600 Hz) are given by animals after a period of silence and frequently elicit vocal replies. Double-grunts with a high second formant (typically > 1600 Hz) are given by animals within 5 s of a call from another individual and rarely elicit replies. © 1994 Wiley-Liss, Inc.

RevDate: 2020-01-15

Souza P, Gallun F, R Wright (2020)

Contributions to Speech-Cue Weighting in Older Adults With Impaired Hearing.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose In a previous paper (Souza, Wright, Blackburn, Tatman, & Gallun, 2015), we explored the extent to which individuals with sensorineural hearing loss used different cues for speech identification when multiple cues were available. Specifically, some listeners placed the greatest weight on spectral cues (spectral shape and/or formant transition), whereas others relied on the temporal envelope. In the current study, we aimed to determine whether listeners who relied on temporal envelope did so because they were unable to discriminate the formant information at a level sufficient to use it for identification and the extent to which a brief discrimination test could predict cue weighting patterns. Method Participants were 30 older adults with bilateral sensorineural hearing loss. The first task was to label synthetic speech tokens based on the combined percept of temporal envelope rise time and formant transitions. An individual profile was derived from linear discriminant analysis of the identification responses. The second task was to discriminate differences in either temporal envelope rise time or formant transitions. The third task was to discriminate spectrotemporal modulation in a nonspeech stimulus. Results All listeners were able to discriminate temporal envelope rise time at levels sufficient for the identification task. There was wide variability in the ability to discriminate formant transitions, and that ability predicted approximately one third of the variance in the identification task. There was no relationship between performance in the identification task and either amount of hearing loss or ability to discriminate nonspeech spectrotemporal modulation. Conclusions The data suggest that listeners who rely to a greater extent on temporal cues lack the ability to discriminate fine-grained spectral information. 
The fact that the amount of hearing loss was not associated with the cue profile underscores the need to characterize individual abilities in a more nuanced way than can be captured by the pure-tone audiogram.

RevDate: 2020-01-17

Kamiloğlu RG, Fischer AH, DA Sauter (2020)

Good vibrations: A review of vocal expressions of positive emotions.

Psychonomic bulletin & review pii:10.3758/s13423-019-01701-x [Epub ahead of print].

Researchers examining nonverbal communication of emotions are becoming increasingly interested in differentiations between different positive emotional states like interest, relief, and pride. But despite the importance of the voice in communicating emotion in general and positive emotion in particular, there is to date no systematic review of what characterizes vocal expressions of different positive emotions. Furthermore, integration and synthesis of current findings are lacking. In this review, we comprehensively review studies (N = 108) investigating acoustic features relating to specific positive emotions in speech prosody and nonverbal vocalizations. We find that happy voices are generally loud with considerable variability in loudness, have high and variable pitch, and are high in the first two formant frequencies. When specific positive emotions are directly compared with each other, pitch mean, loudness mean, and speech rate differ across positive emotions, with patterns mapping onto clusters of emotions, so-called emotion families. For instance, pitch is higher for epistemological emotions (amusement, interest, relief), moderate for savouring emotions (contentment and pleasure), and lower for a prosocial emotion (admiration). Some, but not all, of the differences in acoustic patterns also map on to differences in arousal levels. We end by pointing to limitations in extant work and making concrete proposals for future research on positive emotions in the voice.

RevDate: 2020-01-08

Dubey AK, Prasanna SRM, S Dandapat (2019)

Detection and assessment of hypernasality in repaired cleft palate speech using vocal tract and residual features.

The Journal of the Acoustical Society of America, 146(6):4211.

The presence of hypernasality in repaired cleft palate (CP) speech is a consequence of velopharyngeal insufficiency. The coupling of the nasal tract with the oral tract adds nasal formant and antiformant pairs in the hypernasal speech spectrum. This addition deviates the spectral and linear prediction (LP) residual characteristics of hypernasal speech compared to normal speech. In this work, the vocal tract constriction feature, peak to side-lobe ratio feature, and spectral moment features augmented by low-order cepstral coefficients are used to capture the spectral and residual deviations for hypernasality detection. The first feature captures the lower-frequencies prominence in speech due to the presence of nasal formants, the second feature captures the undesirable signal components in the residual signal due to the nasal antiformants, and the third feature captures the information about formants and antiformants in the spectrum along with the spectral envelope. The combination of three features gives normal versus hypernasal speech detection accuracies of 87.76%, 91.13%, and 93.70% for /a/, /i/, and /u/ vowels, respectively, and hypernasality severity detection accuracies of 80.13% and 81.25% for /i/ and /u/ vowels, respectively. The speech data are collected from 30 control normal and 30 repaired CP children between the ages of 7 and 12.

RevDate: 2019-12-31

Shiraishi M, Mishima K, H Umeda (2019)

Development of an Acoustic Simulation Method during Phonation of the Japanese Vowel /a/ by the Boundary Element Method.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30445-X [Epub ahead of print].

OBJECTIVES: The purpose of the present study was to establish a method for acoustic simulation of a vocal tract created from CT data during phonation of the Japanese vowel /a/ and to verify the validity of the simulation.

MATERIAL AND METHODS: The subjects were 15 healthy adults (8 males, 7 females). The vocal tract model was created from CT data acquired during sustained phonation of the Japanese vowel /a/. After conversion to a mesh model for analysis, a wave acoustic analysis was performed with a boundary element method. The wall and the bottom of the vocal tract model were regarded as a rigid wall and a nonrigid wall, respectively. The acoustic medium was set to 37°C, and a point sound source was placed at the position corresponding to the vocal cords. The first and second formant frequencies (F1 and F2) were calculated. For 1 of the 15 subjects, the range from the upper end of the frontal sinus to the tracheal bifurcation was scanned, and 2 models were created: model 1 included the range from the frontal sinus to the tracheal bifurcation; and model 2 included the range from the frontal sinus to the glottis plus a virtual 12-cm cylindrical extension of the trachea. F1 and F2 calculated from models 1 and 2 were compared. To evaluate the validity of the present simulation, F1 and F2 calculated from the simulation were compared with those of the actual voice and the sound generated using a solid model and a whistle-type artificial larynx. To judge the validity, the vowel formant frequency discrimination threshold reported in the past was used as a criterion. Specifically, relative discrimination thresholds (%) were obtained by dividing ΔF by F, where F was the formant frequency calculated from the simulation and ΔF was the difference between F and the formant frequency of either the actual voice or the sound generated using the solid model and artificial larynx.

RESULTS: F1 and F2 calculated from models 1 and 2 were similar. Therefore, to reduce the exposure dose, the remaining 14 subjects were scanned from the upper end of the frontal sinus to the glottis, and model 2 with the trachea extended by 12 cm virtually was used for the simulation. The averages of the relative discrimination thresholds against F1 and F2 calculated from the actual voice were 5.9% and 4.6%, respectively. The averages of the relative discrimination thresholds against F1 and F2 calculated from the sound generated by using the solid model and the artificial larynx were 4.1% and 3.7%, respectively.

CONCLUSIONS: The Japanese vowel /a/ could be simulated with high validity for the vocal tract models created from the CT data during phonation of /a/ using the boundary element method.
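
The validity criterion in the abstract above is a relative difference, ΔF divided by F, expressed as a percentage. The arithmetic can be sketched as follows. This is a minimal illustration only: the function name and the sample frequencies are hypothetical, and taking the absolute value of the difference is an assumption, since the abstract does not say whether ΔF is signed.

```python
def relative_threshold_percent(f_simulated_hz: float, f_reference_hz: float) -> float:
    """Return 100 * |dF| / F, where F is the simulated formant frequency.

    f_reference_hz stands for the formant measured from the actual voice
    or from the solid model driven by an artificial larynx.
    """
    if f_simulated_hz <= 0:
        raise ValueError("simulated formant frequency must be positive")
    # Assumption: use the magnitude of the difference (unsigned dF).
    delta_f = abs(f_simulated_hz - f_reference_hz)
    return 100.0 * delta_f / f_simulated_hz


if __name__ == "__main__":
    # Hypothetical frequencies, not taken from the study.
    print(relative_threshold_percent(800.0, 760.0))
```

A smaller percentage means the simulated formant lies closer to the measured one; the study compared such values with a previously reported discrimination threshold.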

RevDate: 2019-12-31

Huang MY, Duan RY, Q Zhao (2019)

The influence of long-term cadmium exposure on the male advertisement call of Xenopus laevis.

Environmental science and pollution research international pii:10.1007/s11356-019-07525-5 [Epub ahead of print].

Cadmium (Cd) is a non-essential environmental endocrine-disrupting compound found in water and a potential threat to aquatic habitats. Cd has been shown to have various short-term effects on aquatic animals; however, evidence for long-term effects of Cd on vocal communications in amphibians is lacking. To better understand the long-term effects of low-dose Cd on acoustic communication in amphibians, male Xenopus laevis individuals were treated with low Cd concentrations (0.1, 1, and 10 μg/L) via aqueous exposure for 24 months. At the end of the exposure, the acoustic spectrum characteristics of male advertisement calls and male movement behaviors in response to female calls were recorded. The gene and protein expressions of the androgen receptor (AR) were determined using Western blot and RT-PCR. The results showed that long-term Cd treatment affected the spectrogram and formant of the advertisement call. Compared with the control group, 10 μg/L Cd significantly decreased the first and second formant frequency, and the fundamental and main frequency, and increased the third formant frequency. One and 10-μg/L Cd treatments significantly reduced the proportion of individuals responding to female calls and prolonged the time of first movement of the male. Long-term Cd treatment induced a downregulation in the AR protein. Treatments of 0.1, 1, and 10 μg/L Cd significantly decreased the expression of AR mRNA in the brain. These findings indicate that long-term exposure of Cd has negative effects on advertisement calls in male X. laevis.

RevDate: 2020-01-13

Park EJ, Yoo SD, Kim HS, et al (2019)

Correlations between swallowing function and acoustic vowel space in stroke patients with dysarthria.

NeuroRehabilitation, 45(4):463-469.

BACKGROUND: Dysphagia and dysarthria tend to coexist in stroke patients. Dysphagia can reduce patients' quality of life, cause aspiration pneumonia, and increase mortality.

OBJECTIVE: To evaluate correlations among swallowing function parameters and acoustic vowel space values in patients with stroke.

METHODS: Data from stroke patients with dysarthria and dysphagia were collected. The formant parameter representing the resonance frequency of the vocal tract as a two-dimensional coordinate point was measured for the /a/, /ae/, /i/, and /u/ vowels, and the quadrilateral vowel space area (VSA) and formant centralization ratio (FCR) were measured. Swallowing function was evaluated by a videofluoroscopic swallowing study (VFSS) using the videofluoroscopic dysphagia scale (VDS) and penetration aspiration scale (PAS). Pearson's correlation and linear regression analyses were used to assess the correlation of VSA and FCR to VDS and PAS scores.

RESULTS: Thirty-one stroke patients with dysphagia and dysarthria were analyzed. VSA showed a negative correlation to VDS and PAS scores, while FCR showed a positive correlation to VDS score, but not to PAS score. VSA and FCR were significant factors for assessing dysphagia severity.

CONCLUSIONS: VSA and FCR values were correlated with swallowing function and may be helpful in predicting dysphagia severity associated with stroke.
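
The two vowel-space measures named in the abstract above can be sketched in a few lines. This is a hedged illustration, not the study's procedure: it assumes the quadrilateral VSA is the polygon area (shoelace formula) over the four vowels in the F1 × F2 plane, and it uses the commonly cited FCR definition, (F2u + F2a + F1i + F1u) / (F2i + F1a). The function names and all formant values are hypothetical.

```python
def vowel_space_area(corners):
    """Polygon area via the shoelace formula.

    corners: (F1, F2) pairs in Hz, listed in order around the polygon.
    """
    twice_area = 0.0
    n = len(corners)
    for k in range(n):
        x1, y1 = corners[k]
        x2, y2 = corners[(k + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def formant_centralization_ratio(vowels):
    """FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a); assumed definition."""
    f1_a, f2_a = vowels["a"]
    f1_i, f2_i = vowels["i"]
    f1_u, f2_u = vowels["u"]
    return (f2_u + f2_a + f1_i + f1_u) / (f2_i + f1_a)


if __name__ == "__main__":
    # Hypothetical (F1, F2) values in Hz, for illustration only.
    vowels = {"i": (300.0, 2300.0), "ae": (700.0, 1800.0),
              "a": (800.0, 1300.0), "u": (350.0, 900.0)}
    order = ["i", "ae", "a", "u"]  # walk around the quadrilateral
    print(vowel_space_area([vowels[v] for v in order]))
    print(formant_centralization_ratio(vowels))
```

Under these definitions, vowel centralization shrinks the VSA and raises the FCR, so the two measures are expected to move in opposite directions, which matches the opposite signs of the correlations the abstract reports.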

RevDate: 2020-01-08

McCarthy KM, Skoruppa K, P Iverson (2019)

Development of neural perceptual vowel spaces during the first year of life.

Scientific reports, 9(1):19592.

This study measured infants' neural responses for spectral changes between all pairs of a set of English vowels. In contrast to previous methods that only allow for the assessment of a few phonetic contrasts, we present a new method that allows us to assess changes in spectral sensitivity across the entire vowel space and create two-dimensional perceptual maps of the infants' vowel development. Infants aged four to eleven months were played long series of concatenated vowels, and the neural response to each vowel change was assessed using the Acoustic Change Complex (ACC) from EEG recordings. The results demonstrated that the youngest infants' responses more closely reflected the acoustic differences between the vowel pairs and reflected higher weight to first-formant variation. Older infants had less acoustically driven responses that seemed a result of selective increases in sensitivity for phonetically similar vowels. The results suggest that phonetic development may involve a perceptual warping for confusable vowels rather than uniform learning, as well as an overall increasing sensitivity to higher-frequency acoustic information.

RevDate: 2019-12-18

Houle N, SV Levi (2019)

Effect of Phonation on Perception of Femininity/Masculinity in Transgender and Cisgender Speakers.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30302-9 [Epub ahead of print].

Many transwomen seek voice and communication therapy to support their transition from their gender assigned at birth to their gender identity. This has led to an increased need to examine the perception of gender and femininity/masculinity to develop evidence-based intervention practices. In this study, we explore the auditory perception of femininity/masculinity in normally phonated and whispered speech. Transwomen, ciswomen, and cismen were recorded producing /hVd/ words. Naïve listeners rated femininity/masculinity of a speaker's voice using a visual analog scale, rather than completing a binary gender identification task. The results revealed that listeners rated speakers more ambiguously in whispered speech than normally phonated speech. An analysis of speaker and token characteristics revealed that in the normally phonated condition listeners consistently use f0 to rate femininity/masculinity. In addition, some evidence was found for possible contributions of formant frequencies, particularly F2, and duration. Taken together, this provides additional evidence for the salience of f0 and F2 for voice and communication intervention among transwomen.

RevDate: 2019-12-18

Xu Y, S Prom-On (2019)

Economy of Effort or Maximum Rate of Information? Exploring Basic Principles of Articulatory Dynamics.

Frontiers in psychology, 10:2469.

Economy of effort, a popular notion in contemporary speech research, predicts that dynamic extremes such as the maximum speed of articulatory movement are avoided as much as possible and that approaching the dynamic extremes is necessary only when there is a need to enhance linguistic contrast, as in the case of stress or clear speech. Empirical data, however, do not always support these predictions. In the present study, we considered an alternative principle: maximum rate of information, which assumes that speech dynamics are ultimately driven by the pressure to transmit information as quickly and accurately as possible. For empirical data, we asked speakers of American English to produce repetitive syllable sequences such as wawawawawa as fast as possible by imitating recordings of the same sequences that had been artificially accelerated and to produce meaningful sentences containing the same syllables at normal and fast speaking rates. Analysis of formant trajectories shows that dynamic extremes in meaningful speech sometimes even exceeded those in the nonsense syllable sequences but that this happened more often in unstressed syllables than in stressed syllables. We then used a target approximation model based on a mass-spring system of varying orders to simulate the formant kinematics. The results show that the kind of formant kinematics found in the present study and in previous studies can only be generated by a dynamical system operating with maximal muscular force under strong time pressure and that the dynamics of this operation may hold the solution to the long-standing enigma of greater stiffness in unstressed than in stressed syllables. We conclude, therefore, that maximum rate of information can coherently explain both current and previous empirical data and could therefore be a fundamental principle of motor control in speech production.

RevDate: 2020-01-01
CmpDate: 2019-12-12

Root-Gutteridge H, Ratcliffe VF, Korzeniowska AT, et al (2019)

Dogs perceive and spontaneously normalize formant-related speaker and vowel differences in human speech sounds.

Biology letters, 15(12):20190555.

Domesticated animals have been shown to recognize basic phonemic information from human speech sounds and to recognize familiar speakers from their voices. However, whether animals can spontaneously identify words across unfamiliar speakers (speaker normalization) or spontaneously discriminate between unfamiliar speakers across words remains to be investigated. Here, we assessed these abilities in domestic dogs using the habituation-dishabituation paradigm. We found that while dogs habituated to the presentation of a series of different short words from the same unfamiliar speaker, they significantly dishabituated to the presentation of a novel word from a new speaker of the same gender. This suggests that dogs spontaneously categorized the initial speaker across different words. Conversely, dogs who habituated to the same short word produced by different speakers of the same gender significantly dishabituated to a novel word, suggesting that they had spontaneously categorized the word across different speakers. Our results indicate that the ability to spontaneously recognize both the same phonemes across different speakers, and cues to identity across speech utterances from unfamiliar speakers, is present in domestic dogs and thus not a uniquely human trait.

RevDate: 2020-01-08

Vorperian HK, Kent RD, Lee Y, et al (2019)

Corner vowels in males and females ages 4 to 20 years: Fundamental and F1-F4 formant frequencies.

The Journal of the Acoustical Society of America, 146(5):3255.

The purpose of this study was to determine the developmental trajectory of the four corner vowels' fundamental frequency (fo) and the first four formant frequencies (F1-F4), and to assess when speaker-sex differences emerge. Five words per vowel, two of which were produced twice, were analyzed for fo and estimates of the first four formant frequencies from 190 (97 female, 93 male) typically developing speakers ages 4-20 years old. Findings revealed developmental trajectories with decreasing values of fo and formant frequencies. Sex differences in fo emerged at age 7. The decrease of fo was larger in males than females with a marked drop during puberty. Sex differences in formant frequencies appeared at the earliest age under study and varied with vowel and formant. Generally, the higher formants (F3-F4) were sensitive to sex differences. Inter- and intra-speaker variability declined with age but had somewhat different patterns, likely reflective of maturing motor control that interacts with the changing anatomy. This study reports a source of developmental normative data on fo and the first four formants in both sexes. The different developmental patterns in the first four formants and vowel-formant interactions in sex differences likely point to anatomic factors, although speech-learning phenomena cannot be discounted.

RevDate: 2020-01-12

Gianakas SP, MB Winn (2019)

Lexical bias in word recognition by cochlear implant listeners.

The Journal of the Acoustical Society of America, 146(5):3373.

When hearing an ambiguous speech sound, listeners show a tendency to perceive it as a phoneme that would complete a real word, rather than completing a nonsense/fake word. For example, a sound that could be heard as either /b/ or /ɡ/ is perceived as /b/ when followed by "_ack" but perceived as /ɡ/ when followed by "_ap." Because the target sound is acoustically identical across both environments, this effect demonstrates the influence of top-down lexical processing in speech perception. Degradations in the auditory signal were hypothesized to render speech stimuli more ambiguous, and therefore promote increased lexical bias. Stimuli included three speech continua that varied by spectral cues of varying speeds, including stop formant transitions (fast), fricative spectra (medium), and vowel formants (slow). Stimuli were presented to listeners with cochlear implants (CIs), and also to listeners with normal hearing with clear spectral quality, or with varying amounts of spectral degradation using a noise vocoder. Results indicated an increased lexical bias effect with degraded speech and for CI listeners, for whom the effect size was related to segment duration. This method can probe an individual's reliance on top-down processing even at the level of simple lexical/phonetic perception.

RevDate: 2020-01-08

Perrachione TK, Furbeck KT, EJ Thurston (2019)

Acoustic and linguistic factors affecting perceptual dissimilarity judgments of voices.

The Journal of the Acoustical Society of America, 146(5):3384.

The human voice is a complex acoustic signal that conveys talker identity via individual differences in numerous features, including vocal source acoustics, vocal tract resonances, and dynamic articulations during speech. It remains poorly understood how differences in these features contribute to perceptual dissimilarity of voices and, moreover, whether linguistic differences between listeners and talkers interact during perceptual judgments of voices. Here, native English- and Mandarin-speaking listeners rated the perceptual dissimilarity of voices speaking English or Mandarin from either forward or time-reversed speech. The language spoken by talkers, but not listeners, principally influenced perceptual judgments of voices. Perceptual dissimilarity judgments of voices were always highly correlated between listener groups and forward/time-reversed speech. Representational similarity analyses that explored how acoustic features (fundamental frequency mean and variation, jitter, harmonics-to-noise ratio, speech rate, and formant dispersion) contributed to listeners' perceptual dissimilarity judgments, including how talker- and listener-language affected these relationships, found the largest effects relating to voice pitch. Overall, these data suggest that, while linguistic factors may influence perceptual judgments of voices, the magnitude of such effects tends to be very small. Perceptual judgments of voices by listeners of different native language backgrounds tend to be more alike than different.

RevDate: 2019-12-02

Lo JJH (2019)

Between Äh(m) and Euh(m): The Distribution and Realization of Filled Pauses in the Speech of German-French Simultaneous Bilinguals.

Language and speech [Epub ahead of print].

Filled pauses are well known for their speaker specificity, yet cross-linguistic research has also shown language-specific trends in their distribution and phonetic quality. To examine the extent to which speakers acquire filled pauses as language- or speaker-specific phenomena, this study investigates the use of filled pauses in the context of adult simultaneous bilinguals. Making use of both distributional and acoustic data, this study analyzed UH, consisting of only a vowel component, and UM, with a vowel followed by [m], in the speech of 15 female speakers who were simultaneously bilingual in French and German. Speakers were found to use UM more frequently in German than in French, but only German-dominant speakers had a preference for UM in German. Formant and durational analyses showed that while speakers maintained distinct vowel qualities in their filled pauses in different languages, filled pauses in their weaker language exhibited a shift towards those in their dominant language. These results suggest that, despite high levels of variability between speakers, there is a significant role for language in the acquisition of filled pauses in simultaneous bilingual speakers, which is further shaped by the linguistic environment they grow up in.

RevDate: 2020-01-08

Frey R, Volodin IA, Volodina EV, et al (2019)

Savannah roars: The vocal anatomy and the impressive rutting calls of male impala (Aepyceros melampus) - highlighting the acoustic correlates of a mobile larynx.

Journal of anatomy [Epub ahead of print].

A retractable larynx and adaptations of the vocal folds in the males of several polygynous ruminants serve for the production of rutting calls that acoustically announce larger than actual body size to both rival males and potential female mates. Here, such features of the vocal tract and of the sound source are documented in another species. We investigated the vocal anatomy and laryngeal mobility including its acoustical effects during the rutting vocal display of free-ranging male impala (Aepyceros melampus melampus) in Namibia. Male impala produced bouts of rutting calls (consisting of oral roars and interspersed explosive nasal snorts) in a low-stretch posture while guarding a rutting territory or harem. For the duration of the roars, male impala retracted the larynx from its high resting position to a low mid-neck position involving an extensible pharynx and a resilient connection between the hyoid apparatus and the larynx. Maximal larynx retraction was 108 mm based on estimates in video single frames. This was in good concordance with 91-mm vocal tract elongation calculated on the basis of differences in formant dispersion between roar portions produced with the larynx still ascended and those produced with maximally retracted larynx. Judged by their morphological traits, the larynx-retracting muscles of male impala are homologous to those of other larynx-retracting ruminants. In contrast, the large and massive vocal keels are evolutionary novelties arising by fusion and linear arrangement of the arytenoid cartilage and the canonical vocal fold. These bulky and histologically complex vocal keels produced a low fundamental frequency of 50 Hz. Impala is another ruminant species in which the males are capable of larynx retraction. In addition, male impala vocal folds are spectacularly specialized compared with domestic bovids, allowing the production of impressive, low-frequency roaring vocalizations as a significant part of their rutting behaviour. 
Our study expands knowledge on the evolutionary variation of vocal fold morphology in mammals, suggesting that the structure of the mammalian sound source is not always human-like and should be considered in acoustic analysis and modelling.
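
The abstract above derives vocal tract elongation from differences in formant dispersion. One common way to make that link is to model the vocal tract as a uniform tube closed at one end, for which adjacent formants are spaced by c / (2L), so that L = c / (2 × dispersion). The sketch below follows that model. It is an assumption that the authors used exactly this relation; the function names, the sound speed of 350 m/s, and all formant values are hypothetical.

```python
SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, humid air


def formant_dispersion_hz(formants_hz):
    """Mean spacing between successive formants: (F_N - F_1) / (N - 1)."""
    if len(formants_hz) < 2:
        raise ValueError("need at least two formants")
    return (formants_hz[-1] - formants_hz[0]) / (len(formants_hz) - 1)


def vocal_tract_length_m(dispersion_hz, c=SPEED_OF_SOUND_M_S):
    """Tube length for a uniform tube closed at one end: L = c / (2 * dF)."""
    return c / (2.0 * dispersion_hz)


if __name__ == "__main__":
    # Hypothetical formants (Hz) for a resting and a retracted larynx.
    resting = [500.0, 1500.0, 2500.0, 3500.0]
    retracted = [350.0, 1050.0, 1750.0, 2450.0]
    l_rest = vocal_tract_length_m(formant_dispersion_hz(resting))
    l_retr = vocal_tract_length_m(formant_dispersion_hz(retracted))
    print(f"elongation: {(l_retr - l_rest) * 1000:.0f} mm")
```

Because length is inversely proportional to dispersion, a lower formant dispersion during the retracted portion of a roar corresponds to a longer estimated vocal tract.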

RevDate: 2019-11-23

Hu G, Determan SC, Dong Y, et al (2019)

Spectral and Temporal Envelope Cues for Human and Automatic Speech Recognition in Noise.

Journal of the Association for Research in Otolaryngology : JARO pii:10.1007/s10162-019-00737-z [Epub ahead of print].

Acoustic features of speech include various spectral and temporal cues. It is known that temporal envelope plays a critical role for speech recognition by human listeners, while automated speech recognition (ASR) heavily relies on spectral analysis. This study compared sentence-recognition scores of humans and an ASR software, Dragon, when spectral and temporal-envelope cues were manipulated in background noise. Temporal fine structure of meaningful sentences was reduced by noise or tone vocoders. Three types of background noise were introduced: a white noise, a time-reversed multi-talker noise, and a fake-formant noise. Spectral information was manipulated by changing the number of frequency channels. With a 20-dB signal-to-noise ratio (SNR) and four vocoding channels, white noise had a stronger disruptive effect than the fake-formant noise. The same observation with 22 channels was made when SNR was lowered to 0 dB. In contrast, ASR was unable to function with four vocoding channels even with a 20-dB SNR. Its performance was least affected by white noise and most affected by the fake-formant noise. Increasing the number of channels, which improved the spectral resolution, generated non-monotonic behaviors for the ASR with white noise but not with colored noise. The ASR also showed highly improved performance with tone vocoders. It is possible that fake-formant noise affected the software's performance by disrupting spectral cues, whereas white noise affected performance by compromising speech segmentation. Overall, these results suggest that human listeners and ASR utilize different listening strategies in noise.

RevDate: 2019-12-18

Hu W, Tao S, Li M, et al (2019)

Distinctiveness and Assimilation in Vowel Perception in a Second Language.

Journal of speech, language, and hearing research : JSLHR, 62(12):4534-4543.

Purpose The purpose of this study was to investigate how the distinctive establishment of 2nd language (L2) vowel categories (e.g., how distinctively an L2 vowel is established from nearby L2 vowels and from the native language counterpart in the 1st formant [F1] × 2nd formant [F2] vowel space) affected L2 vowel perception. Method Identification of 12 natural English monophthongs, and categorization and rating of synthetic English vowels /i/ and /ɪ/ in the F1 × F2 space were measured for Chinese-native (CN) and English-native (EN) listeners. CN listeners were also examined with categorization and rating of Chinese vowels in the F1 × F2 space. Results As expected, EN listeners significantly outperformed CN listeners in English vowel identification. Whereas EN listeners showed distinctive establishment of 2 English vowels, CN listeners had multiple patterns of L2 vowel establishment: both, 1, or neither established. Moreover, CN listeners' English vowel perception was significantly related to the perceptual distance between the English vowel and its Chinese counterpart, and the perceptual distance between the adjacent English vowels. Conclusions L2 vowel perception relied on listeners' capacity to distinctively establish L2 vowel categories that were distant from the nearby L2 vowels.

RevDate: 2019-12-18

Mollaei F, Shiller DM, Baum SR, et al (2019)

The Relationship Between Speech Perceptual Discrimination and Speech Production in Parkinson's Disease.

Journal of speech, language, and hearing research : JSLHR, 62(12):4256-4268.

Purpose We recently demonstrated that individuals with Parkinson's disease (PD) respond differentially to specific altered auditory feedback parameters during speech production. Participants with PD respond more robustly to pitch and less robustly to formant manipulations compared to control participants. In this study, we investigated whether differences in perceptual processing may in part underlie these compensatory differences in speech production. Methods Pitch and formant feedback manipulations were presented under 2 conditions: production and listening. In the production condition, 15 participants with PD and 15 age- and gender-matched healthy control participants judged whether their own speech output was manipulated in real time. During the listening task, participants judged whether paired tokens of their previously recorded speech samples were the same or different. Results Under listening, 1st formant manipulation discrimination was significantly reduced for the PD group compared to the control group. There was a trend toward better discrimination of pitch in the PD group, but the group difference was not significant. Under the production condition, the ability of participants with PD to identify pitch manipulations was greater than that of the controls. Conclusion The findings suggest perceptual processing differences associated with acoustic parameters of fundamental frequency and 1st formant perturbations in PD. These findings extend our previous results, indicating that different patterns of compensation to pitch and 1st formant shifts may reflect a combination of sensory and motor mechanisms that are differentially influenced by basal ganglia dysfunction.

RevDate: 2019-11-27

Escudero P, M Kalashnikova (2020)

Infants use phonetic detail in speech perception and word learning when detail is easy to perceive.

Journal of experimental child psychology, 190:104714.

Infants successfully discriminate speech sound contrasts that belong to their native language's phonemic inventory in auditory-only paradigms, but they encounter difficulties in distinguishing the same contrasts in the context of word learning. These difficulties are usually attributed to the fact that infants' attention to the phonetic detail in novel words is attenuated when they must allocate additional cognitive resources demanded by word-learning tasks. The current study investigated 15-month-old infants' ability to distinguish novel words that differ by a single vowel in an auditory discrimination paradigm (Experiment 1) and a word-learning paradigm (Experiment 2). These experiments aimed to tease apart whether infants' performance is dependent solely on the specific acoustic properties of the target vowels or on the context of the task. Experiment 1 showed that infants were able to discriminate only a contrast marked by a large difference along a static dimension (the vowels' second formant), whereas they were not able to discriminate a contrast with a small phonetic distance between its vowels, due to the dynamic nature of the vowels. In Experiment 2, infants did not succeed at learning words containing the same contrast they were able to discriminate in Experiment 1. The current findings demonstrate that both the specific acoustic properties of vowels in infants' native language and the task presented continue to play a significant role in early speech perception well into the second year of life.

RevDate: 2019-12-30

Rosenthal MA (2020)

A systematic review of the voice-tagging hypothesis of speech-in-noise perception.

Neuropsychologia, 136:107256.

The voice-tagging hypothesis claims that individuals who better represent pitch information in a speaker's voice, as measured with the frequency following response (FFR), will be better at speech-in-noise perception. The hypothesis has been provided to explain how music training might improve speech-in-noise perception. This paper reviews studies that are relevant to the voice-tagging hypothesis, including studies on musicians and nonmusicians. Most studies on musicians show greater f0 amplitude compared to controls. Most studies on nonmusicians do not show group differences in f0 amplitude. Across all studies reviewed, f0 amplitude does not consistently predict accuracy in speech-in-noise perception. The evidence suggests that music training does not improve speech-in-noise perception via enhanced subcortical representation of the f0.

RevDate: 2019-11-11

Hakanpää T, Waaramaa T, AM Laukkanen (2019)

Comparing Contemporary Commercial and Classical Styles: Emotion Expression in Singing.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30209-7 [Epub ahead of print].

OBJECTIVE: This study examines the acoustic correlates of the vocal expression of emotions in contemporary commercial music (CCM) and classical styles of singing. This information may be useful in improving the training of interpretation in singing.

STUDY DESIGN: This is an experimental comparative study.

METHODS: Eleven female singers with a minimum of 3 years of professional-level singing training in CCM, classical, or both styles participated. They sang the vowel [ɑ:] at three pitches (A3 220 Hz, E4 330 Hz, and A4 440 Hz) expressing anger, sadness, joy, tenderness, and a neutral voice. Vowel samples were analyzed for fundamental frequency (fo), formant frequencies (F1-F5), sound pressure level (SPL), spectral structure (alpha ratio = SPL(1500-5000 Hz) - SPL(50-1500 Hz)), harmonics-to-noise ratio (HNR), perturbation (jitter, shimmer), onset and offset duration, sustain time, rate and extent of fo variation in vibrato, and rate and extent of amplitude vibrato.

RESULTS: The parameters that were statistically significantly (RM-ANOVA, P ≤ 0.05) related to emotion expression in both genres were SPL, alpha ratio, F1, and HNR. Additionally, for CCM, significance was found in sustain time, jitter, shimmer, F2, and F4. When fo and SPL were set as covariates in the variance analysis, jitter, HNR, and F4 did not show pure dependence on expression. The alpha ratio, F1, F2, shimmer apq5, amplitude vibrato rate, and sustain time of vocalizations had emotion-related variation also independent of fo and SPL in the CCM style, while these parameters were related to fo and SPL in the classical style.

CONCLUSIONS: The results differed somewhat for the CCM and classical styles. The alpha ratio showed less variation in the classical style, most likely reflecting the demand for a more stable voice source quality. The alpha ratio, F1, F2, shimmer, amplitude vibrato rate, and the sustain time of the vocalizations were related to fo and SPL control in the classical style. The only common independent sound parameter indicating emotional expression for both styles was SPL. The CCM style offers more freedom for expression-related changes in voice quality.
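The alpha ratio used in the abstract above is a difference of two band levels, SPL(1500-5000 Hz) minus SPL(50-1500 Hz). A minimal Python sketch of that arithmetic follows. The function names, the input format (a list of frequency bins with linear power values), and the half-open band edges are assumptions made for illustration; they are not taken from the paper, whose analysis procedure is not described here.

```python
import math


def band_level_db(freqs_hz, powers, lo_hz, hi_hz):
    """Level in dB of the summed linear power in the band [lo_hz, hi_hz).

    The dB reference is arbitrary because it cancels when two band
    levels are subtracted.
    """
    total = sum(p for f, p in zip(freqs_hz, powers) if lo_hz <= f < hi_hz)
    if total <= 0:
        raise ValueError("no energy found in the requested band")
    return 10.0 * math.log10(total)


def alpha_ratio_db(freqs_hz, powers):
    """Alpha ratio: level of 1500-5000 Hz minus level of 50-1500 Hz, in dB."""
    high = band_level_db(freqs_hz, powers, 1500.0, 5000.0)
    low = band_level_db(freqs_hz, powers, 50.0, 1500.0)
    return high - low


# Hypothetical four-bin spectrum: most energy lies below 1500 Hz.
example_freqs = [100.0, 1000.0, 2000.0, 4000.0]
example_powers = [50.0, 50.0, 0.5, 0.5]
example_alpha = alpha_ratio_db(example_freqs, example_powers)
```

With this made-up spectrum the low band carries 100 times the power of the high band, so the alpha ratio is about -20 dB. Values closer to zero indicate relatively more high-frequency energy.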

RevDate: 2019-11-22

Weirich M, A Simpson (2019)

Effects of Gender, Parental Role, and Time on Infant- and Adult-Directed Read and Spontaneous Speech.

Journal of speech, language, and hearing research : JSLHR, 62(11):4001-4014.

Purpose The study sets out to investigate inter- and intraspeaker variation in German infant-directed speech (IDS) and considers the potential impact that the factors gender, parental involvement, and speech material (read vs. spontaneous speech) may have. In addition, we analyze data from 3 time points prior to and after the birth of the child to examine potential changes in the features of IDS and, particularly also, of adult-directed speech (ADS). Here, the gender identity of a speaker is considered as an additional factor. Method IDS and ADS data from 34 participants (15 mothers, 19 fathers) is gathered by means of a reading and a picture description task. For IDS, 2 recordings were made when the baby was approximately 6 and 9 months old, respectively. For ADS, an additional recording was made before the baby was born. Phonetic analyses comprise mean fundamental frequency (f0), variation in f0, the 1st 2 formants measured in /i: ɛ a u:/, and the vowel space size. Moreover, social and behavioral data were gathered regarding parental involvement and gender identity. Results German IDS is characterized by an increase in mean f0, a larger variation in f0, vowel- and formant-specific differences, and a larger acoustic vowel space. No effect of gender or parental involvement was found. Also, the phonetic features of IDS were found in both spontaneous and read speech. Regarding ADS, changes in vowel space size in some of the fathers and in mean f0 in mothers were found. Conclusion Phonetic features of German IDS are robust with respect to the factors gender, parental involvement, speech material (read vs. spontaneous speech), and time. Some phonetic features of ADS changed within the child's first year depending on gender and parental involvement/gender identity. Thus, further research on IDS needs to address also potential changes in ADS.

RevDate: 2020-01-08

Howson PJ, MA Redford (2019)

Liquid coarticulation in child and adult speech.

Proceedings of the ... International Congress of Phonetic Sciences. International Congress of Phonetic Sciences, 2019:3100-3104.

Although liquids are mastered late, English-speaking children are said to have fully acquired these segments by age 8. The aim of this study was to test whether liquid coarticulation was also adult-like by this age. 8-year-old productions of /əLa/ and /əLu/ sequences were compared to 5-year-old and adult productions of these sequences. SSANOVA analyses of formant frequency trajectories indicated that, while adults contrasted rhotics and laterals from the onset of the vocalic sequence, F2 trajectories for rhotics and laterals were overlapped at the onset of the /əLa/ sequence in 8-year-old productions and across the entire /əLu/ sequence. The F2 trajectories for rhotics and laterals were even more overlapped in 5-year-olds' productions. Overall, the study suggests that whereas younger children have difficulty coordinating the tongue body/root gesture with the tongue tip gesture, older children still struggle with the intergestural timing associated with liquid production.

RevDate: 2019-10-29

Kim D, S Kim (2019)

Coarticulatory vowel nasalization in American English: Data of individual differences in acoustic realization of vowel nasalization as a function of prosodic prominence and boundary.

Data in brief, 27:104593 pii:104593.

This article provides acoustic measurement data for vowel nasalization, based on speech recorded from fifteen (8 female and 7 male) native speakers of American English in a laboratory setting. Each individual speaker's production patterns for the vowel nasalization in tautosyllabic CVN and NVC words are documented in terms of three acoustic parameters: the duration of nasal consonant (N-Duration), the duration of vowel (V-Duration) and the difference between the amplitude of the first formant (A1) and the first nasal peak (P0) obtained from the vowel (A1-P0) as an indication of the degree of vowel nasalization. The A1-P0 is measured at three different time points within the vowel, i.e., the near point (25%), midpoint (50%), and distant point (75%), either from the onset (CVN) or the offset (NVC) of the nasal consonant. These measures are taken from the target words in various prosodic prominence and boundary contexts: phonologically focused (PhonFOC) vs. lexically focused (LexFOC) vs. unfocused (NoFOC) conditions; phrase-edge (i.e., phrase-final for CVN and phrase-initial for NVC) vs. phrase-medial conditions. The data also contain a CSV file with each speaker's mean values of the N-Duration, V-Duration, and A1-P0 (z-scored) for each prosodic context along with the information about the speakers' gender. For further discussion of the data, please refer to the full-length article entitled "Prosodically-conditioned fine-tuning of coarticulatory vowel nasalization in English" (Cho et al., 2017).

RevDate: 2019-10-29

Goswami U, Nirmala SR, Vikram CM, et al (2019)

Analysis of Articulation Errors in Dysarthric Speech.

Journal of psycholinguistic research pii:10.1007/s10936-019-09676-5 [Epub ahead of print].

Imprecise articulation is the major issue reported in various types of dysarthria. Detection of articulation errors can help in diagnosis. The cues derived from both the burst and the formant transitions contribute to the discrimination of place of articulation of stops. It is believed that any acoustic deviations in stops due to articulation error can be analyzed by deriving features around the burst and the voicing onsets. The derived features can be used to discriminate normal from dysarthric speech. In this work, a method is proposed to differentiate the voiceless stops produced by normal speakers from those produced by dysarthric speakers by deriving spectral moments, the two-dimensional discrete cosine transform of the linear prediction spectrum, and Mel-frequency cepstral coefficient features. These features and a cosine-distance-based classifier are used to classify normal and dysarthric speech.

RevDate: 2020-01-02

Cartei V, Banerjee R, Garnham A, et al (2019)

Physiological and perceptual correlates of masculinity in children's voices.

Hormones and behavior, 117:104616 pii:S0018-506X(19)30277-6 [Epub ahead of print].

Low frequency components (i.e. a low pitch (F0) and low formant spacing (ΔF)) signal high salivary testosterone and height in adult male voices and are associated with high masculinity attributions by unfamiliar listeners (in both men and women). However, the relation between the physiological, acoustic and perceptual dimensions of speakers' masculinity prior to puberty remains unknown. In this study, 110 pre-pubertal children (58 girls), aged 3 to 10, were recorded as they described a cartoon picture. 315 adults (182 women) rated children's perceived masculinity from the voice only after listening to the speakers' audio recordings. On the basis of their voices alone, boys who had higher salivary testosterone levels were rated as more masculine and the relation between testosterone and perceived masculinity was partially mediated by F0. The voices of taller boys were also rated as more masculine, but the relation between height and perceived masculinity was not mediated by the considered acoustic parameters, indicating that acoustic cues other than F0 and ΔF may signal stature. Both boys and girls who had lower F0 were also rated as more masculine, while ΔF did not affect ratings. These findings highlight the interdependence of physiological, acoustic and perceptual dimensions, and suggest that inter-individual variation in male voices, particularly F0, may advertise hormonal masculinity from a very early age.

RevDate: 2019-10-17

Scheerer NE, Jacobson DS, JA Jones (2019)

Sensorimotor control of vocal production in early childhood.

Journal of experimental psychology. General pii:2019-62257-001 [Epub ahead of print].

Children maintain fluent speech despite dramatic changes to their articulators during development. Auditory feedback aids in the acquisition and maintenance of the sensorimotor mechanisms that underlie vocal motor control. MacDonald, Johnson, Forsythe, Plante, and Munhall (2012) reported that toddlers' speech motor control systems may "suppress" the influence of auditory feedback, since exposure to altered auditory feedback regarding their formant frequencies did not lead to modifications of their speech. This finding is not parsimonious with most theories of motor control. Here, we exposed toddlers to perturbations to the pitch of their auditory feedback as they vocalized. Toddlers compensated for the manipulations, producing significantly different responses to upward and downward perturbations. These data represent the first empirical demonstration that toddlers use auditory feedback for vocal motor control. Furthermore, our findings suggest toddlers are more sensitive to changes to the postural properties of their auditory feedback, such as fundamental frequency, relative to the phonemic properties, such as formant frequencies. (PsycINFO Database Record (c) 2019 APA, all rights reserved).

RevDate: 2019-10-08

Conklin JT, O Dmitrieva (2019)

Vowel-to-Vowel Coarticulation in Spanish Nonwords.

Phonetica pii:000502890 [Epub ahead of print].

The present study examined vowel-to-vowel (VV) coarticulation in backness affecting mid vowels /e/ and /o/ in 36 Spanish nonwords produced by 20 native speakers of Spanish, aged 19-50 years (mean = 30.7; SD = 8.2). Examination of second formant frequency showed substantial carryover coarticulation throughout the data set, while anticipatory coarticulation was minimal and of shorter duration. Furthermore, the effect of stress on vowel-to-vowel coarticulation was investigated and found to vary by direction. In the anticipatory direction, small coarticulatory changes were relatively stable regardless of stress, particularly for target /e/, while in the carryover direction, a hierarchy of stress emerged wherein the greatest coarticulation occurred between stressed triggers and unstressed targets, less coarticulation was observed between unstressed triggers and unstressed targets, and the least coarticulation occurred between unstressed triggers with stressed targets. The results of the study augment and refine previously available knowledge about vowel-to-vowel coarticulation in Spanish and expand cross-linguistic understanding of the effect of stress on the magnitude and direction of vowel-to-vowel coarticulation.

RevDate: 2019-12-20

Lee Y, Keating P, J Kreiman (2019)

Acoustic voice variation within and between speakers.

The Journal of the Acoustical Society of America, 146(3):1568.

Little is known about the nature or extent of everyday variability in voice quality. This paper describes a series of principal component analyses to explore within- and between-talker acoustic variation and the extent to which they conform to expectations derived from current models of voice perception. Based on studies of faces and cognitive models of speaker recognition, the authors hypothesized that a few measures would be important across speakers, but that much of within-speaker variability would be idiosyncratic. Analyses used multiple sentence productions from 50 female and 50 male speakers of English, recorded over three days. Twenty-six acoustic variables from a psychoacoustic model of voice quality were measured every 5 ms on vowels and approximants. Across speakers the balance between higher harmonic amplitudes and inharmonic energy in the voice accounted for the most variance (females = 20%, males = 22%). Formant frequencies and their variability accounted for an additional 12% of variance across speakers. Remaining variance appeared largely idiosyncratic, suggesting that the speaker-specific voice space is different for different people. Results further showed that voice spaces for individuals and for the population of talkers have very similar acoustic structures. Implications for prototype models of voice perception and recognition are discussed.

RevDate: 2020-01-08

Balaguer M, Pommée T, Farinas J, et al (2020)

Effects of oral and oropharyngeal cancer on speech intelligibility using acoustic analysis: Systematic review.

Head & neck, 42(1):111-130.

BACKGROUND: The development of automatic tools based on acoustic analysis makes it possible to overcome the limitations of perceptual assessment for patients with head and neck cancer. The aim of this study is to provide a systematic review of literature describing the effects of oral and oropharyngeal cancer on speech intelligibility using acoustic analysis.

METHODS: Two databases (PubMed and Embase) were surveyed. The selection process, according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) statement, led to a final set of 22 articles.

RESULTS: Nasalance is studied mainly in oropharyngeal patients. The vowels are mostly studied using formant analysis and vowel space area, the consonants by means of spectral moments with specific parameters according to their phonetic characteristic. Machine learning methods allow classifying "intelligible" or "unintelligible" speech for T3 or T4 tumors.

CONCLUSIONS: The development of comprehensive models combining different acoustic measures would allow a better consideration of the functional impact of the speech disorder.

RevDate: 2019-09-23

Zeng Q, Jiao Y, Huang X, et al (2019)

Effects of Angle of Epiglottis on Aerodynamic and Acoustic Parameters in Excised Canine Larynges.

Journal of voice : official journal of the Voice Foundation, 33(5):627-633.

OBJECTIVES: The aim of this study is to explore the effects of the angle of epiglottis (Aepi) on phonation and resonance in excised canine larynges.

METHODS: The anatomic Aepi was measured for 14 excised canine larynges as a control. Then, the Aepis were manually adjusted to 60° and 90° in each larynx. Aerodynamic and acoustic parameters, including mean flow rate, sound pressure level, jitter, shimmer, fundamental frequency (F0), and formants (F1'-F4'), were measured with a subglottal pressure of 1.5 kPa. Simple linear regression analysis between acoustic and aerodynamic parameters and the Aepi of the control was performed, and an analysis of variance comparing the acoustic and aerodynamic parameters of the three treatments was carried out.

RESULTS: The results of the study are as follows: (1) the larynges with larger anatomic Aepi had significantly lower jitter, shimmer, formant 1, and formant 2; (2) phonation threshold flow was significantly different for the three treatments; and (3) mean flow rate and sound pressure level were significantly different between the 60° and the 90° treatments of the 14 larynges.

CONCLUSIONS: The Aepi was proposed for the first time in this study. The Aepi plays an important role in phonation and resonance of excised canine larynges.


RJR Experience and Expertise


Robbins holds BS, MS, and PhD degrees in the life sciences. He served as a tenured faculty member in the Zoology and Biological Science departments at Michigan State University. He is currently exploring the intersection between genomics, microbial ecology, and biodiversity — an area that promises to transform our understanding of the biosphere.


Robbins has extensive experience in college-level education: At MSU he taught introductory biology, genetics, and population genetics. At JHU, he was an instructor for a special course on biological database design. At FHCRC, he team-taught a graduate-level course on the history of genetics. At Bellevue College he taught medical informatics.


Robbins has been involved in science administration at both the federal and the institutional levels. At NSF he was a program officer for database activities in the life sciences; at DOE he was a program officer for information infrastructure in the human genome project. At the Fred Hutchinson Cancer Research Center, he served as a vice president for fifteen years.


Robbins has been involved with information technology since writing his first Fortran program as a college student. At NSF he was the first program officer for database activities in the life sciences. At JHU he held an appointment in the CS department and served as director of the informatics core for the Genome Data Base. At the FHCRC he was VP for Information Technology.


While still at Michigan State, Robbins started his first publishing venture, founding a small company that addressed the short-run publishing needs of instructors in very large undergraduate classes. For more than 20 years, Robbins has been operating The Electronic Scholarly Publishing Project, a web site dedicated to the digital publishing of critical works in science, especially classical genetics.


Robbins is well-known for his speaking abilities and is often called upon to provide keynote or plenary addresses at international meetings. For example, in July, 2012, he gave a well-received keynote address at the Global Biodiversity Informatics Congress, sponsored by GBIF and held in Copenhagen. The slides from that talk can be seen HERE.


Robbins is a skilled meeting facilitator. He prefers a participatory approach, with part of the meeting involving dynamic breakout groups, created by the participants in real time: (1) individuals propose breakout groups; (2) everyone signs up for one (or more) groups; (3) the groups with the most interested parties then meet, with reports from each group presented and discussed in a subsequent plenary session.


Robbins has been engaged with photography and design since the 1960s, when he worked for a professional photography laboratory. He now prefers digital photography and tools for their precision and reproducibility. He designed his first web site more than 20 years ago and he personally designed and implemented this web site. He engages in graphic design as a hobby.


963 Red Tail Lane
Bellingham, WA 98226


E-mail: RJR8222@gmail.com

RJR Picks from Around the Web (updated 11 MAY 2018 )