About | BLOGS | Portfolio | Misc | Recommended | What's New | What's Hot

About | BLOGS | Portfolio | Misc | Recommended | What's New | What's Hot


Bibliography Options Menu

30 Nov 2020 at 01:40
Hide Abstracts   |   Hide Additional Links
Long bibliographies are displayed in blocks of 100 citations at a time. At the end of each block there is an option to load the next block.

Bibliography on: Formants: Modulators of Communication


Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology. More About:  RJR | OUR TEAM | OUR SERVICES | THIS WEBSITE

RJR: Recommended Bibliography 30 Nov 2020 at 01:40 Created: 

Formants: Modulators of Communication

Wikipedia: A formant, as defined by James Jeans, is a harmonic of a note that is augmented by a resonance. In speech science and phonetics, however, a formant is also sometimes used to mean acoustic resonance of the human vocal tract. Thus, in phonetics, formant can mean either a resonance or the spectral maximum that the resonance produces. Formants are often measured as amplitude peaks in the frequency spectrum of the sound, using a spectrogram (in the figure) or a spectrum analyzer and, in the case of the voice, this gives an estimate of the vocal tract resonances. In vowels spoken with a high fundamental frequency, as in a female or child voice, however, the frequency of the resonance may lie between the widely spaced harmonics and hence no corresponding peak is visible. Because formants are a product of resonance and resonance is affected by the shape and material of the resonating structure, and because all animals (humans included) have unqiue morphologies, formants can add additional generic (sounds big) and specific (that's Towser barking) information to animal vocalizations.

Created with PubMed® Query: formant NOT pmcbook NOT ispreviousversion

Citations The Papers (from PubMed®)


RevDate: 2020-11-04

Ishikawa K, J Webster (2020)

The Formant Bandwidth as a Measure of Vowel Intelligibility in Dysphonic Speech.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30403-3 [Epub ahead of print].

OBJECTIVE: The current paper examined the impact of dysphonia on the bandwidth of the first two formants of vowels, and the relationship between the formant bandwidth and vowel intelligibility.

METHODS: Speaker participants of the study were 10 adult females with healthy voice and 10 adult females with dysphonic voice. Eleven vowels in American English were recorded in /h/-vowel-/d/ format. The vowels were presented to 10 native speakers of American English with normal hearing, who were asked to select a vowel they heard from a list of /h/-vowel-/d/ words. The vowels were acoustically analyzed to measure the bandwidth of the first and second formants (B1 and B2). Separate Wilcoxon rank sum tests were conducted for each vowel for normal and dysphonic speech because the differences in B1 and B2 were found to not be normally distributed. Spearman correlation tests were conducted to evaluate the association between the difference in formant bandwidths and vowel intelligibility between the healthy and dysphonic speakers.

RESULTS: B1 was significantly greater in dysphonic vowels for seven of the eleven vowels, and lesser for only one of the vowels. There was no statistically significant difference in B2 between the normal and dysphonic vowels, except for the vowel /i/. The difference in B1 between normal and dysphonic vowels strongly predicted the intelligibility difference.

CONCLUSION: Dysphonia significantly affects B1, and the difference in B1 may serve as an acoustic marker for the intelligibility reduction in dysphonic vowels. This acoustic-perceptual relationship should be confirmed by a larger-scale study in the future.

RevDate: 2020-11-04

Burckardt ES, Hillman RE, Murton O, et al (2020)

The Impact of Tonsillectomy on the Adult Singing Voice: Acoustic and Aerodynamic Measures.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30373-8 [Epub ahead of print].

OBJECTIVE: Singers undergoing tonsillectomy are understandably concerned about possible sequelae to their voice. The surgical risks of laryngeal damage from intubation and upper airway scarring are valid reasons for singers to carefully consider their options for treatment of tonsil-related symptoms. No prior studies have statistically assessed objective voice outcomes in a group of adult singers undergoing tonsillectomy. This study determined the impact of tonsillectomy on the adult singing voice by determining if there were statistically significant changes in preoperative versus postoperative acoustic, aerodynamic, and Voice-Related Quality of Life (VRQOL) measures.

STUDY DESIGN: Prospective cohort study.

SETTING: Tertiary Referral Academic Hospital SUBJECTS: Thirty singers undergoing tonsillectomy from 2012 to 2019.

METHODS: Acoustic recordings were obtained with Computerized Speech Lab (CSL) (Pentax CSL 4500) and analyzed with the Multidimensional Voice Program (MDVP) (Pentax MDVP) and Pratt Acoustic Analysis Software. Estimates of aerodynamic vocal efficiency were obtained and analyzed using the Phonatory Aerodynamic System (Pentax PAS 6600). Preoperative VRQOL scores were recorded, and singers were instructed to refrain from singing for 3 weeks following tonsillectomy. Repeat acoustic and aerodynamic measures as well as VRQOL scores were obtained at the first postoperative visit.

RESULTS: Average postoperative acoustic (jitter, shimmer, HNR) and aerodynamic (sound pressure level divided by subglottal pressure) parameters related to laryngeal phonatory function did not differ significantly from preoperative measures. The only statistically significant change in postoperative measures of resonance was a decrease in the 3rd formant (F3) for the /a/ vowel. Average postoperative VRQOL scores (79.8, SD18.7) improved significantly from preoperative VRQOL scores (89, SD12.2) (P = 0.007).

CONCLUSIONS: Tonsillectomy does not appear to alter laryngeal voice production in adult singers as measured by standard acoustic and aerodynamic parameters. The observed decrease in F3 for the /a/ vowel is hypothetically related to increasing the pharyngeal cross-sectional area by removing tonsillar tissue, but this would not be expected to appreciably impact the perceptual characteristics of the vowel. Singers' self-assessment (VRQOL) improved after tonsillectomy.

RevDate: 2020-11-03

Roberts B, RJ Summers (2020)

Informational masking of speech depends on masker spectro-temporal variation but not on its coherence.

The Journal of the Acoustical Society of America, 148(4):2416.

The impact of an extraneous formant on intelligibility is affected by the extent (depth) of variation in its formant-frequency contour. Two experiments explored whether this impact also depends on masker spectro-temporal coherence, using a method ensuring that interference occurred only through informational masking. Targets were monaural three-formant analogues (F1+F2+F3) of natural sentences presented alone or accompanied by a contralateral competitor for F2 (F2C) that listeners must reject to optimize recognition. The standard F2C was created using the inverted F2 frequency contour and constant amplitude. Variants were derived by dividing F2C into abutting segments (100-200 ms, 10-ms rise/fall). Segments were presented either in the correct order (coherent) or in random order (incoherent), introducing abrupt discontinuities into the F2C frequency contour. F2C depth was also manipulated (0%, 50%, or 100%) prior to segmentation, and the frequency contour of each segment either remained time-varying or was set to constant at the geometric mean frequency of that segment. The extent to which F2C lowered keyword scores depended on segment type (frequency-varying vs constant) and depth, but not segment order. This outcome indicates that the impact on intelligibility depends critically on the overall amount of frequency variation in the competitor, but not its spectro-temporal coherence.

RevDate: 2020-11-03

Nenadić F, Coulter P, Nearey TM, et al (2020)

Perception of vowels with missing formant peaks.

The Journal of the Acoustical Society of America, 148(4):1911.

Although the first two or three formant frequencies are considered essential cues for vowel identification, certain limitations of this approach have been noted. Alternative explanations have suggested listeners rely on other aspects of the gross spectral shape. A study conducted by Ito, Tsuchida, and Yano [(2001). J. Acoust. Soc. Am. 110, 1141-1149] offered strong support for the latter, as attenuation of individual formant peaks left vowel identification largely unaffected. In the present study, these experiments are replicated in two dialects of English. Although the results were similar to those of Ito, Tsuchida, and Yano [(2001). J. Acoust. Soc. Am. 110, 1141-1149], quantitative analyses showed that when a formant is suppressed, participant response entropy increases due to increased listener uncertainty. In a subsequent experiment, using synthesized vowels with changing formant frequencies, suppressing individual formant peaks led to reliable changes in identification of certain vowels but not in others. These findings indicate that listeners can identify vowels with missing formant peaks. However, such formant-peak suppression may lead to decreased certainty in identification of steady-state vowels or even changes in vowel identification in certain dynamically specified vowels.

RevDate: 2020-11-02

Easwar V, Birstler J, Harrison A, et al (2020)

The Accuracy of Envelope Following Responses in Predicting Speech Audibility.

Ear and hearing, 41(6):1732-1746.

OBJECTIVES: The present study aimed to (1) evaluate the accuracy of envelope following responses (EFRs) in predicting speech audibility as a function of the statistical indicator used for objective response detection, stimulus phoneme, frequency, and level, and (2) quantify the minimum sensation level (SL; stimulus level above behavioral threshold) needed for detecting EFRs.

DESIGN: In 21 participants with normal hearing, EFRs were elicited by 8 band-limited phonemes in the male-spoken token /susa∫i/ (2.05 sec) presented between 20 and 65 dB SPL in 15 dB increments. Vowels in /susa∫i/ were modified to elicit two EFRs simultaneously by selectively lowering the fundamental frequency (f0) in the first formant (F1) region. The modified vowels elicited one EFR from the low-frequency F1 and another from the mid-frequency second and higher formants (F2+). Fricatives were amplitude-modulated at the average f0. EFRs were extracted from single-channel EEG recorded between the vertex (Cz) and the nape of the neck when /susa∫i/ was presented monaurally for 450 sweeps. The performance of the three statistical indicators, F-test, Hotelling's T, and phase coherence, was compared against behaviorally determined audibility (estimated SL, SL ≥0 dB = audible) using area under the receiver operating characteristics (AUROC) curve, sensitivity (the proportion of audible speech with a detectable EFR [true positive rate]), and specificity (the proportion of inaudible speech with an undetectable EFR [true negative rate]). The influence of stimulus phoneme, frequency, and level on the accuracy of EFRs in predicting speech audibility was assessed by comparing sensitivity, specificity, positive predictive value (PPV; the proportion of detected EFRs elicited by audible stimuli) and negative predictive value (NPV; the proportion of undetected EFRs elicited by inaudible stimuli). The minimum SL needed for detection was evaluated using a linear mixed-effects model with the predictor variables stimulus and EFR detection p value.

RESULTS: of the 3 statistical indicators were similar; however, at the type I error rate of 5%, the sensitivities of Hotelling's T (68.4%) and phase coherence (68.8%) were significantly higher than the F-test (59.5%). In contrast, the specificity of the F-test (97.3%) was significantly higher than the Hotelling's T (88.4%). When analyzed using Hotelling's T as a function of stimulus, fricatives offered higher sensitivity (88.6 to 90.6%) and NPV (57.9 to 76.0%) compared with most vowel stimuli (51.9 to 71.4% and 11.6 to 51.3%, respectively). When analyzed as a function of frequency band (F1, F2+, and fricatives aggregated as low-, mid- and high-frequencies, respectively), high-frequency stimuli offered the highest sensitivity (96.9%) and NPV (88.9%). When analyzed as a function of test level, sensitivity improved with increases in stimulus level (99.4% at 65 dB SPL). The minimum SL for EFR detection ranged between 13.4 and 21.7 dB for F1 stimuli, 7.8 to 12.2 dB for F2+ stimuli, and 2.3 to 3.9 dB for fricative stimuli.

CONCLUSIONS: EFR-based inference of speech audibility requires consideration of the statistical indicator used, phoneme, stimulus frequency, and stimulus level.

RevDate: 2020-10-31

Rakerd B, Hunter EJ, P Lapine (2019)

Resonance Effects and the Vocalization of Speech.

Perspectives of the ASHA special interest groups, 4(6):1637-1643.

Studies of the respiratory and laryngeal actions required for phonation are central to our understanding of both voice and voice disorders. The purpose of the present article is to highlight complementary insights about voice that have come from the study of vocal tract resonance effects.

RevDate: 2020-10-30

Jeanneteau M, Hanna N, Almeida A, et al (2020)

Using visual feedback to tune the second vocal tract resonance for singing in the high soprano range.

Logopedics, phoniatrics, vocology [Epub ahead of print].

PURPOSE: Over a range roughly C5-C6, sopranos usually tune their first vocal tract resonance (R1) to the fundamental frequency (fo) of the note sung: R1:fo tuning. Those who sing well above C6 usually adjust their second vocal tract resonance (R2) and use R2:fo tuning. This study investigated these questions: Can singers quickly learn R2:fo tuning when given suitable feedback? Can they subsequently use this tuning without feedback? And finally, if so, does this assist their singing in the high range?

METHODS: New computer software for the technique of resonance estimation by broadband excitation at the lips was used to provide real-time visual feedback on fo and vocal tract resonances. Eight sopranos participated. In a one-hour session, they practised adjusting R2 whilst miming (i.e. without phonating), and then during singing.

RESULTS: Six sopranos learned to tune R2 over a range of several semi-tones, when feedback was present. This achievement did not immediately extend their singing range. When the feedback was removed, two sopranos spontaneously used R2:fo tuning at the top of their range above C6.

CONCLUSIONS: With only one hour of training, singers can learn to adjust their vocal tract shape for R2:fo tuning when provided with visual feedback. One additional participant who spent considerable time with the software, acquired greater skill at R2:fo tuning and was able to extend her singing range. A simple version of the hardware used can be assembled using basic equipment and the software is available online.

RevDate: 2020-10-27

Ayres A, Winckler PB, Jacinto-Scudeiro LA, et al (2020)

Speech characteristics in individuals with myasthenia gravis: a case control study.

Logopedics, phoniatrics, vocology [Epub ahead of print].

INTRODUCTION: Myasthenia Gravis (MG) is an autoimmune disease. The characteristic symptoms of the disease are muscle weakness and fatigue. These symptoms affect de oral muscles causing dysarthria, affecting about 60% of patients with disease progression.

PURPOSE: Describe the speech pattern of patients with MG and comparing with healthy controls (HC).

MATERIAL AND METHODS: Case-control study. Participants were divided in MG group (MGG) with 38 patients MG diagnosed and HC with 18 individuals matched for age and sex. MGG was evaluated with clinical and motor scales and answered self-perceived questionnaires. Speech assessment of both groups included: recording of speech tasks, acoustic and auditory-perceptual analysis.

RESULTS: In the MGG, 68.24% of the patients were female, with average age of 50.21 years old (±16.47), 14.18 years (±9.52) of disease duration and a motor scale of 11.19 points (±8.79). The auditory-perceptual analysis verified that 47.36% (n = 18) participants in MGG presented mild dysarthria, 10.52% (n = 4) moderate dysarthria, with a high percentage of alterations in phonation (95.2%) and breathing (52.63%). The acoustic analysis verified a change in phonation, with significantly higher shimmer values in the MGG compared to the HC and articulation with a significant difference between the groups for the first formant of the /iu/ (p = <.001). No correlation was found between the diagnosis of speech disorder and the dysarthria self-perception questionnaire.

CONCLUSION: We found dysarthria mild in MG patients with changes in the motor bases phonation and breathing, with no correlation with severity and disease duration.

RevDate: 2020-10-22

Kim KS, Daliri A, Flanagan JR, et al (2020)

Dissociated development of speech and limb sensorimotor learning in stuttering: speech auditory-motor learning is impaired in both children and adults who stutter.

Neuroscience pii:S0306-4522(20)30665-5 [Epub ahead of print].

Stuttering is a neurodevelopmental disorder of speech fluency. Various experimental paradigms have demonstrated that affected individuals show limitations in sensorimotor control and learning. However, controversy exists regarding two core aspects of this perspective. First, it has been claimed that sensorimotor learning limitations are detectable only in adults who stutter (after years of coping with the disorder) but not during childhood close to the onset of stuttering. Second, it remains unclear whether stuttering individuals' sensorimotor learning limitations affect only speech movements or also unrelated effector systems involved in nonspeech movements. We report data from separate experiments investigating speech auditory-motor learning (N = 60) and limb visuomotor learning (N = 84) in both children and adults who stutter versus matched nonstuttering individuals. Both children and adults who stutter showed statistically significant limitations in speech auditory-motor adaptation with formant-shifted feedback. This limitation was more profound in children than in adults and in younger children versus older children. Between-group differences in the adaptation of reach movements performed with rotated visual feedback were subtle but statistically significant for adults. In children, even the nonstuttering groups showed limited visuomotor adaptation just like their stuttering peers. We conclude that sensorimotor learning is impaired in individuals who stutter, and that the ability for speech auditory-motor learning-which was already adult-like in 3-6 year-old typically developing children-is severely compromised in young children near the onset of stuttering. Thus, motor learning limitations may play an important role in the fundamental mechanisms contributing to the onset of this speech disorder.

RevDate: 2020-10-20

Lester-Smith RA, Daliri A, Enos N, et al (2020)

The Relation of Articulatory and Vocal Auditory-Motor Control in Typical Speakers.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose The purpose of this study was to explore the relationship between feedback and feedforward control of articulation and voice by measuring reflexive and adaptive responses to first formant (F1) and fundamental frequency (fo) perturbations. In addition, perception of F1 and fo perturbation was estimated using passive (listening) and active (speaking) just noticeable difference paradigms to assess the relation of auditory acuity to reflexive and adaptive responses. Method Twenty healthy women produced single words and sustained vowels while the F1 or fo of their auditory feedback was suddenly and unpredictably perturbed to assess reflexive responses or gradually and predictably perturbed to assess adaptive responses. Results Typical speakers' reflexive responses to sudden perturbation of F1 were related to their adaptive responses to gradual perturbation of F1. Specifically, speakers with larger reflexive responses to sudden perturbation of F1 had larger adaptive responses to gradual perturbation of F1. Furthermore, their reflexive responses to sudden perturbation of F1 were associated with their passive auditory acuity to F1 such that speakers with better auditory acuity to F1 produced larger reflexive responses to sudden perturbations of F1. Typical speakers' adaptive responses to gradual perturbation of F1 were not associated with their auditory acuity to F1. Speakers' reflexive and adaptive responses to perturbation of fo were not related, nor were their responses related to either measure of auditory acuity to fo. Conclusion These findings indicate that there may be disparate feedback and feedforward control mechanisms for articulatory and vocal error correction based on auditory feedback.

RevDate: 2020-10-18

Pawelec ŁP, Graja K, A Lipowicz (2020)

Vocal Indicators of Size, Shape and Body Composition in Polish Men.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30352-0 [Epub ahead of print].

OBJECTIVES: From a human evolution perspective, identifying a link between physique and vocal quality could demonstrate dual signaling in terms of the health and biological condition of an individual. In this regard, this study investigates the relationship between men's body size, shape, and composition, and their vocal characteristics.

MATERIALS AND METHODS: Eleven anthropometric measurements, using seven indices, were carried out with 80 adult Polish male participants, while the speech analysis adopted a voice recording procedure that involved phonetically recording vowels /ɑː/, /ɛː/, /iː/, /ɔː/, /uː/ to define the voice acoustic components used in Praat software.

RESULTS: The relationship between voice parameters and body size/shape/composition was found. The analysis indicated that the formants and their derivatives were useful parameters for prediction of height, weight, neck, shoulder, waist, and hip circumferences. Fundamental frequency (F0) was negatively correlated with neck circumference at Adam's apple level and body height. Moreover neck circumference and F0 association was observed for the first time in this paper. The association between waist circumference and formant component showed a net effect. In addition, the formant parameters showed significant correlations with body shape, indicating a lower vocal timbre in men with a larger relative waist circumference.

DISCUSSION: Men with lower vocal pitch had wider necks, probably a result of larynx size. Furthermore, a greater waist circumference, presumably resulting from abdominal fat distribution in men, correlated with a lower vocal timbre. While these results are inconclusive, they highlight new directions for further research.

RevDate: 2020-10-08

Auracher J, Menninghaus W, M Scharinger (2020)

Sound Predicts Meaning: Cross-Modal Associations Between Formant Frequency and Emotional Tone in Stanzas.

Cognitive science, 44(10):e12906.

Research on the relation between sound and meaning in language has reported substantial evidence for implicit associations between articulatory-acoustic characteristics of phonemes and emotions. In the present study, we specifically tested the relation between the acoustic properties of a text and its emotional tone as perceived by readers. To this end, we asked participants to assess the emotional tone of single stanzas extracted from a large variety of poems. The selected stanzas had either an extremely high, a neutral, or an extremely low average formant dispersion. To assess the average formant dispersion per stanza, all words were phonetically transcribed and the distance between the first and second formant per vowel was calculated. Building on a long tradition of research on associations between sound frequency on the one hand and non-acoustic concepts such as size, strength, or happiness on the other hand, we hypothesized that stanzas with an extremely high average formant dispersion would be rated lower on items referring to Potency (dominance) and higher on items referring to Activity (arousal) and Evaluation (emotional valence). The results confirmed our hypotheses for the dimensions of Potency and Evaluation, but not for the dimension of Activity. We conclude that, at least in poetic language, extreme values of acoustic features of vowels are a significant predictor for the emotional tone of a text.

RevDate: 2020-10-05

Song XY, Wang SJ, Xu ZX, et al (2020)

Preliminary study on phonetic characteristics of patients with pulmonary nodules.

Journal of integrative medicine pii:S2095-4964(20)30104-7 [Epub ahead of print].

OBJECTIVE: Pulmonary nodules (PNs) are one of the imaging manifestations of early lung cancer screening, which should receive more attention. Traditional Chinese medicine believes that voice changes occur in patients with pulmonary diseases. The purpose of this study is to explore the differences in phonetic characteristics between patients with PNs and able-bodied persons.

METHODS: This study explores the phonetic characteristics of patients with PNs in order to provide a simpler and cheaper method for PN screening. It is a case-control study to explore the differences in phonetic characteristics between individuals with and without PNs. This study performed non-parametric statistics on acoustic parameters of vocalizations, collected from January 2017 to March 2018 in Shanghai, China, from these two groups; it explores the differences in third and fourth acoustic parameters between patients with PNs and a normal control group. At the same time, computed tomography (CT) scans, course of disease, combined disease and other risk factors of the patients were collected in the form of questionnaire. According to the grouping of risk factors, the phonetic characteristics of the patients with PNs were analyzed.

RESULTS: This study was comprised of 200 patients with PNs, as confirmed by CT, and 86 healthy people that served as a control group. Among patients with PNs, 43% had ground glass opacity, 32% had nodules with a diameter ≥ 8 mm, 19% had a history of smoking and 31% had hyperlipidemia. Compared with the normal group, there were statistically significant differences in pitch, intensity and shimmer in patients with PNs. Among patients with PNs, patients with diameters ≥ 8 mm had a significantly higher third formant. There was a significant difference in intensity, fourth formant and harmonics-to-noise ratio (HNR) between smoking and non-smoking patients. Compared with non-hyperlipidemia patients, the pitch, jitter and shimmer of patients with PNs and hyperlipidemia were higher and the HNR was lower; these differences were statistically significant.

CONCLUSION: This measurable changes in vocalizations can be in patients with PNs. Patients with PNs had lower and weaker voices. The size of PNs had an effect on the phonetic formant. Smoking may contribute to damage to the voice and formant changes. Voice damage is more pronounced in individuals who have PNs accompanied by hyperlipidemia.

RevDate: 2020-10-03

Melton J, Bradford Z, J Lee (2020)

Acoustic Characteristics of Vocal Sounds Used by Professional Actors Performing Classical Material Without Microphones in Outdoor Theatre.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30336-2 [Epub ahead of print].

OBJECTIVE: Theatre actors use voice in virtually any physical position, moving or still, and perform in a wide range of venues. The present study investigated acoustic qualities required to perform classical material without electronic amplification in outdoor spaces.

DESIGN: Eight professional actors, four female, four male, from NY Classical Theatre performed one-minute monologues, first stationary, then moving, for audio recording in Central Park. Four subjects recorded two monologues each, from productions in which they played both male and female characters. Data were analyzed for fundamental frequency (F0), sound pressure level (SPL), and long-term average spectrum (LTAS).

RESULTS: Overall, F0 ranged between 75.38 and 530.33 Hz. Average F0 was 326 Hz stationary and 335.78 Hz moving for females, 248.54 Hz stationary, 252.82 Hz moving for males. SPL ranged from 28.54 to 110.51 dB for females, and 56.69 to 124.44 dB for males. Average SPL was 82 dB for females, 96.98 dB for males. On LTAS, females had a peak between 3 and 4 kHz ranging from 1.5 to 4.5 dB and another between 4 and 5 kHz ranging from 2 to 4.5 dB, while males had a peak between 3 and 4 kHz ranging from 1 to 8.5 dB.

CONCLUSION: Actors appear to use a similar F0 range across gender and performing conditions. Average F0 increased from stationary to moving. Males had greater SPL values than females, and the amplitude of peaks in the region of the Actor's Formant of LTAS curves was higher in male than female voices.

RevDate: 2020-10-02

Caverlé MWJ, AP Vogel (2020)

Stability, reliability, and sensitivity of acoustic measures of vowel space: A comparison of vowel space area, formant centralization ratio, and vowel articulation index.

The Journal of the Acoustical Society of America, 148(3):1436.

Vowel space (VS) measurements can provide objective information on formant distribution and act as a proxy for vowel production. There are a number of proposed ways to quantify vowel production clinically, including vowel space area, formant centralization ratio, and vowel articulation index (VAI). The stability, reliability, and sensitivity of three VS measurements were investigated in two experiments. Stability was explored across three inter-recording intervals and challenged in two sensitivity conditions. Data suggest that VAI is the most stable measure across 30 s, 2 h, and 4 h inter-recording intervals. VAI appears the most sensitive metric of the three measures in conditions of fatigue and noise. These analyses highlight the need for stability and sensitivity analysis when developing and validating acoustic metrics, and underscore the potential of the VAI for vowel analysis.

RevDate: 2020-10-02

Kaya Z, Soltanipour M, A Treves (2020)

Non-hexagonal neural dynamics in vowel space.

AIMS neuroscience, 7(3):275-298.

Are the grid cells discovered in rodents relevant to human cognition? Following up on two seminal studies by others, we aimed to check whether an approximate 6-fold, grid-like symmetry shows up in the cortical activity of humans who "navigate" between vowels, given that vowel space can be approximated with a continuous trapezoidal 2D manifold, spanned by the first and second formant frequencies. We created 30 vowel trajectories in the assumedly flat central portion of the trapezoid. Each of these trajectories had a duration of 240 milliseconds, with a steady start and end point on the perimeter of a "wheel". We hypothesized that if the neural representation of this "box" is similar to that of rodent grid units, there should be an at least partial hexagonal (6-fold) symmetry in the EEG response of participants who navigate it. We have not found any dominant n-fold symmetry, however, but instead, using PCAs, we find indications that the vowel representation may reflect phonetic features, as positioned on the vowel manifold. The suggestion, therefore, is that vowels are encoded in relation to their salient sensory-perceptual variables, and are not assigned to arbitrary grid-like abstract maps. Finally, we explored the relationship between the first PCA eigenvector and putative vowel attractors for native Italian speakers, who served as the subjects in our study.

RevDate: 2020-10-02

Moon IJ, Kang S, Boichenko N, et al (2020)

Meter enhances the subcortical processing of speech sounds at a strong beat.

Scientific reports, 10(1):15973.

The temporal structure of sound such as in music and speech increases the efficiency of auditory processing by providing listeners with a predictable context. Musical meter is a good example of a sound structure that is temporally organized in a hierarchical manner, with recent studies showing that meter optimizes neural processing, particularly for sounds located at a higher metrical position or strong beat. Whereas enhanced cortical auditory processing at times of high metric strength has been studied, there is to date no direct evidence showing metrical modulation of subcortical processing. In this work, we examined the effect of meter on the subcortical encoding of sounds by measuring human auditory frequency-following responses to speech presented at four different metrical positions. Results show that neural encoding of the fundamental frequency of the vowel was enhanced at the strong beat, and also that the neural consistency of the vowel was the highest at the strong beat. When comparing musicians to non-musicians, musicians were found, at the strong beat, to selectively enhance the behaviorally relevant component of the speech sound, namely the formant frequency of the transient part. Our findings indicate that the meter of sound influences subcortical processing, and this metrical modulation differs depending on musical expertise.

RevDate: 2020-10-02

Park EJ, Kim JH, Choi YH, et al (2020)

Association between phonation and the vowel quadrilateral in patients with stroke: A retrospective observational study.

Medicine, 99(39):e22236.

Articulation disorder is associated with impaired control of respiration and speech organ movement. There are many cases of dysarthria and dysphonia in stroke patients. Dysphonia adversely affects communication and social activities, and it can interfere with everyday life. The purpose of this study is to assess the association between phonation abilities and the vowel quadrilateral in stroke patients.The subjects were stroke patients with pronunciation and phonation disorders. The resonance frequency was measured for the 4 corner vowels to measure the vowel space area (VSA) and formant centralization ratio (FCR). Phonation ability was evaluated by the Dysphonia Severity Index (DSI) and maximal phonation time (MPT) through acoustic evaluation for each vowel. Pearsons correlation analysis was performed to confirm the association, and multiple linear regression analysis was performed between variables.The correlation coefficients of VSA and MPT/u/ were 0.420, VSA and MPT/i/ were 0.536, VSA and DSI/u/ were 0.392, VSA and DSI /i/ were 0.364, and FCR and DSI /i/ were -0.448. Multiple linear regression analysis showed that VSA was a factor significantly influencing MPT/u/ (β = 0.420, P = .021, R = 0.147), MPT/i/ (β = 0.536, P = .002, R = 0.262), DSI/u/ (β = 0.564, P = .045, R = 0.256), and DSI/i/ (β = 0.600, P = .03, R = 0.302).The vowel quadrilateral can be a useful tool for evaluating the phonation function of stroke patients.

RevDate: 2020-09-28

Ge S, Wan Q, Yin M, et al (2020)

Quantitative acoustic metrics of vowel production in mandarin-speakers with post-stroke spastic dysarthria.

Clinical linguistics & phonetics [Epub ahead of print].

Impairment of vowel production in dysarthria has been highly valued. This study aimed to explore the vowel production of Mandarin-speakers with post-stroke spastic dysarthria in connected speech and to explore the influence of gender and tone on the vowel production. Multiple vowel acoustic metrics, including F1 range, F2 range, vowel space area (VSA), vowel articulation index (VAI) and formant centralization ratio (FCR), were analyzed from vowel tokens embedded in connected speech produced. The participants included 25 clients with spastic dysarthria secondary to stroke (15 males, 10 females) and 25 speakers with no history of neurological disease (15 males, 10 females). Variance analyses were conducted and the results showed that the main effects of population, gender, and tone on F2 range, VSA, VAI, and FCR were all significant. Vowel production became centralized in the clients with post-stroke spastic dysarthria. Vowel production was found to be more centralized in males compared to females. Vowels in neutral tone (T0) were the most centralized among the other tones. The quantitative acoustic metrics of F2 range, VSA, VAI, and FCR were effective in predicting vowel production in Mandarin-speaking clients with post-stroke spastic dysarthria, and hence may be used as powerful tools to assess the speech performance for this population.

RevDate: 2020-09-25

Daliri A, Chao SC, LC Fitzgerald (2020)

Compensatory Responses to Formant Perturbations Proportionally Decrease as Perturbations Increase.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose We continuously monitor our speech output to detect potential errors in our productions. When we encounter errors, we rapidly change our speech output to compensate for the errors. However, it remains unclear whether we adjust the magnitude of our compensatory responses based on the characteristics of errors. Method Participants (N = 30 adults) produced monosyllabic words containing /ɛ/ (/hɛp/, /hɛd/, /hɛk/) while receiving perturbed or unperturbed auditory feedback. In the perturbed trials, we applied two different types of formant perturbations: (a) the F1 shift, in which the first formant of /ɛ/ was increased, and (b) the F1-F2 shift, in which the first formant was increased and the second formant was decreased to make a participant's /ɛ/ sound like his or her /æ/. In each perturbation condition, we applied three participant-specific perturbation magnitudes (0.5, 1.0, and 1.5 ɛ-æ distance). Results Compensatory responses to perturbations with the magnitude of 1.5 ɛ-æ were proportionally smaller than responses to perturbation magnitudes of 0.5 ɛ-æ. Responses to the F1-F2 shift were larger than responses to the F1 shift regardless of the perturbation magnitude. Additionally, compensatory responses for /hɛd/ were smaller than responses for /hɛp/ and /hɛk/. Conclusions Overall, these results suggest that the brain uses its error evaluation to determine the extent of compensatory responses. The brain may also consider categorical errors and phonemic environments (e.g., articulatory configurations of the following phoneme) to determine the magnitude of its compensatory responses to auditory errors.

RevDate: 2020-09-21

Nilsson T, Laukkanen AM, T Syrjä (2020)

Effects of Sixteen Month Voice Training of Student Actors Applying the Linklater Voice Method.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30301-5 [Epub ahead of print].

OBJECTIVE: This study investigates the perceptual and acoustic changes in student actors' voices after 16 months of Linklater Voice training, which is a holistic method to train actors' voices.

METHODS: Eleven (n = 11) actor students' text and Voice Range Profile (VRP) recordings were analyzed pretraining and 16 months posttraining. From text readings at comfortable performance loudness, both perceptual and acoustic analyses were made. Acoustic measures included sound pressure level (SPL), fundamental frequency (fo), and sound level differences between different frequency ranges derived from long-term-average spectrum. Sustained vowels [i:], [o], and [e] abstracted from the text sample were analyzed for formant frequencies F1-F4 and the frequency difference between F4 and F3. The VRP was registered to investigate SPL of the softest and loudest phonations throughout the voice range.

RESULTS: The perceived pitch range during text reading increased significantly. The acoustic result showed a strong trend toward decreasing in minimum fo, and increasing in maximum fo and fo range. The VRP showed a significant increase in the fo range and dynamics (SPL range). Perceived voice production showed a trend toward phonation balance (neither pressed-nor breathy) and darker voice color posttraining.

CONCLUSION: The perceptual and acoustic analysis of text reading and acoustic measures of VRP suggest that LV training has a positive impact on voice.

RevDate: 2020-09-18

Di Natale V, Cantarella G, Manfredi C, et al (2020)

Semioccluded Vocal Tract Exercises Improve Self-Perceived Voice Quality in Healthy Actors.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30273-3 [Epub ahead of print].

PURPOSE: Semi-occluded vocal tract exercises (SOVTE) have shown to lead to more effective and efficient vocal production for individuals with voice disorders and for singers. The aim of the present study is to investigate the effects of a 10-minute SOVTE warm-up protocol on the actors' voice.

METHODS: Twenty-seven professional theater actors (16 females) without voice complaints were audio-recorded while reading aloud, with their acting voice, a short dramatic passage at four time points. Recordings were made: the day before the show, just before and soon after the warm-up protocol which was performed prior to the show and soon after the show. The voice quality was acoustically and auditory-perceptually evaluated and quantified at each time point by blinded raters. Self-assessment parameters anonymously collected pre and post exercising were also analyzed.

RESULTS: No statistically significant differences on perceptual ratings and acoustic parameters were found between pre/post exercise sessions and males/females. A statistically significant improvement was detected in the self-assessment parameters concerning comfort of production, sonorousness, vocal clarity and power.

CONCLUSIONS: Vocal warm-up with the described SOVTE protocol was effective in determining a self-perceived improvement in comfort of production, voice quality and power, although objective evidence was missing. This straightforward protocol could thus be beneficial if routinely utilized by professional actors to facilitate the voice performance.

RevDate: 2020-09-16

Sugathan N, S Maruthy (2020)

Predictive factors for persistence and recovery of stuttering in children: A systematic review.

International journal of speech-language pathology [Epub ahead of print].

PURPOSE: The purpose of this study was to systematically review the available literature on various factors that can predict the persistence and recovery of stuttering in children.

METHOD: An electronic search yielded a total of 35 studies, which considered 44 variables that can be potential factors for predicting persistence and recovery.

RESULT: Among 44 factors studied, only four factors- phonological abilities, articulatory rate, change in the pattern of disfluencies, and trend in stuttering severity over one-year post-onset were identified to be replicated predictors of recovery of the stuttering. Several factors, such as differences in the second formant transition between fluent and disfluent speech, articulatory rate measured in phones/sec, etc., were observed to predict the future course of stuttering. However, these factors lack replicated evidence as predictors.

CONCLUSION: There is clear support only for limited factors as reliable predictors. Also, it is observed to be too early to conclude on several replicated factors due to differences in the age group of participants, participant sample size, and the differences in tools used in research that lead to mixed findings as a predictive factor. Hence there is a need for systematic and replicated testing of the factors identified before initiating their use for clinical purposes.

RevDate: 2020-09-28

Palaparthi A, IR Titze (2020)

Analysis of Glottal Inverse Filtering in the Presence of Source-Filter Interaction.

Speech communication, 123:98-108.

The validity of glottal inverse filtering (GIF) to obtain a glottal flow waveform from radiated pressure signal in the presence and absence of source-filter interaction was studied systematically. A driven vocal fold surface model of vocal fold vibration was used to generate source signals. A one-dimensional wave reflection algorithm was used to solve for acoustic pressures in the vocal tract. Several test signals were generated with and without source-filter interaction at various fundamental frequencies and vowels. Linear Predictive Coding (LPC), Quasi Closed Phase (QCP), and Quadratic Programming (QPR) based algorithms, along with supraglottal impulse response, were used to inverse filter the radiated pressure signals to obtain the glottal flow pulses. The accuracy of each algorithm was tested for its recovery of maximum flow declination rate (MFDR), peak glottal flow, open phase ripple factor, closed phase ripple factor, and mean squared error. The algorithms were also tested for their absolute relative errors of the Normalized Amplitude Quotient, the Quasi-Open Quotient, and the Harmonic Richness Factor. The results indicated that the mean squared error decreased with increase in source-filter interaction level suggesting that the inverse filtering algorithms perform better in the presence of source-filter interaction. All glottal inverse filtering algorithms predicted the open phase ripple factor better than the closed phase ripple factor of a glottal flow waveform, irrespective of the source-filter interaction level. Major prediction errors occurred in the estimation of the closed phase ripple factor, MFDR, peak glottal flow, normalized amplitude quotient, and Quasi-Open Quotient. Feedback-related nonlinearity (source-filter interaction) affected the recovered signal primarily when fo was well below the first formant frequency of a vowel. The prediction error increased when fo was close to the first formant frequency due to the difficulty of estimating the precise value of resonance frequencies, which was exacerbated by nonlinear kinetic losses in the vocal tract.

RevDate: 2020-09-12

Lopes LW, França FP, Evangelista DDS, et al (2020)

Does the Combination of Glottal and Supraglottic Acoustic Measures Improve Discrimination Between Women With and Without Voice Disorders?.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30300-3 [Epub ahead of print].

AIM: To analyze the accuracy of traditional acoustic measurements (F0, perturbation, and noise) and formant measurements in discriminating between women with and without voice disorders, and with different laryngeal disorders.

STUDY DESIGN: A descriptive, cross-sectional, and retrospective.

METHOD: Two hundred and sixty women participated. All participants recorded the spoken vowel /Ɛ/ and underwent laryngeal visual examination. Acoustic measures of the mean and standard deviation of the fundamental frequency (F0), jitter, shimmer, glottal-to-noise excitation ratio, and the values of the first three formants (F1, F2, and F3) were obtained.

RESULTS: Individual acoustic measurements did not demonstrate adequate (<70%) performance when discriminating between women with and without voice disorders. The combination of the standard deviation of the F0, shimmer, glottal-to-noise excitation ratio, F1, F2, and F3 showed acceptable (>70%) performance in classifying women with and without voice disorders. Individual measures of jitter as well as F1 and F3 demonstrated acceptable (>70%) performance when distinguishing women with different laryngeal diagnoses, including without voice disorders (healthy larynges), Reinke's edema, unilateral vocal fold paralysis, and sulcus vocalis. The combination of acoustic measurements showed excellent (>80%) performance when discriminating women without voice disorder from those with Reinke's edema (mean of F0, F1, and F3) and with sulcus vocalis (mean of F0, F1, and F2).

CONCLUSIONS: Individual formant and traditional acoustic measurements do not demonstrate adequate performance when discriminating between women with and without voice disorders. However, the combination of traditional and formant measurements improves the discrimination between the presence and absence of voice disorders and differentiates several laryngeal diagnoses.

RevDate: 2020-09-28

Kishimoto T, Takamiya A, Liang KC, et al (2020)

The project for objective measures using computational psychiatry technology (PROMPT): Rationale, design, and methodology.

Contemporary clinical trials communications, 19:100649.

Introduction: Depressive and neurocognitive disorders are debilitating conditions that account for the leading causes of years lived with disability worldwide. However, there are no biomarkers that are objective or easy-to-obtain in daily clinical practice, which leads to difficulties in assessing treatment response and developing new drugs. New technology allows quantification of features that clinicians perceive as reflective of disorder severity, such as facial expressions, phonic/speech information, body motion, daily activity, and sleep.

Methods: Major depressive disorder, bipolar disorder, and major and minor neurocognitive disorders as well as healthy controls are recruited for the study. A psychiatrist/psychologist conducts conversational 10-min interviews with participants ≤10 times within up to five years of follow-up. Interviews are recorded using RGB and infrared cameras, and an array microphone. As an option, participants are asked to wear wrist-band type devices during the observational period. Various software is used to process the raw video, voice, infrared, and wearable device data. A machine learning approach is used to predict the presence of symptoms, severity, and the improvement/deterioration of symptoms.

Discussion: The overall goal of this proposed study, the Project for Objective Measures Using Computational Psychiatry Technology (PROMPT), is to develop objective, noninvasive, and easy-to-use biomarkers for assessing the severity of depressive and neurocognitive disorders in the hopes of guiding decision-making in clinical settings as well as reducing the risk of clinical trial failure. Challenges may include the large variability of samples, which makes it difficult to extract the features that commonly reflect disorder severity.

Trial Registration: UMIN000021396, University Hospital Medical Information Network (UMIN).

RevDate: 2020-09-16

Skuk VG, Kirchen L, Oberhoffner T, et al (2020)

Parameter-Specific Morphing Reveals Contributions of Timbre and Fundamental Frequency Cues to the Perception of Voice Gender and Age in Cochlear Implant Users.

Journal of speech, language, and hearing research : JSLHR, 63(9):3155-3175.

Purpose Using naturalistic synthesized speech, we determined the relative importance of acoustic cues in voice gender and age perception in cochlear implant (CI) users. Method We investigated 28 CI users' abilities to utilize fundamental frequency (F0) and timbre in perceiving voice gender (Experiment 1) and vocal age (Experiment 2). Parameter-specific voice morphing was used to selectively control acoustic cues (F0; time; timbre, i.e., formant frequencies, spectral-level information, and aperiodicity, as defined in TANDEM-STRAIGHT) in voice stimuli. Individual differences in CI users' performance were quantified via deviations from the mean performance of 19 normal-hearing (NH) listeners. Results CI users' gender perception seemed exclusively based on F0, whereas NH listeners efficiently used timbre. For age perception, timbre was more informative than F0 for both groups, with minor contributions of temporal cues. While a few CI users performed comparable to NH listeners overall, others were at chance. Separate analyses confirmed that even high-performing CI users classified gender almost exclusively based on F0. While high performers could discriminate age in male and female voices, low performers were close to chance overall but used F0 as a misleading cue to age (classifying female voices as young and male voices as old). Satisfaction with CI generally correlated with performance in age perception. Conclusions We confirmed that CI users' gender classification is mainly based on F0. However, high performers could make reasonable usage of timbre cues in age perception. Overall, parameter-specific morphing can serve to objectively assess individual profiles of CI users' abilities to perceive nonverbal social-communicative vocal signals.

RevDate: 2020-09-09

Hansen JHL, Bokshi M, S Khorram (2020)

Speech variability: A cross-language study on acoustic variations of speaking versus untrained singing.

The Journal of the Acoustical Society of America, 148(2):829.

Speech production variability introduces significant challenges for existing speech technologies such as speaker identification (SID), speaker diarization, speech recognition, and language identification (ID). There has been limited research analyzing changes in acoustic characteristics for speech produced by untrained singing versus speaking. To better understand changes in speech production of the untrained singing voice, this study presents the first cross-language comparison between normal speaking and untrained karaoke singing of the same text content. Previous studies comparing professional singing versus speaking have shown deviations in both prosodic and spectral features. Some investigations also considered assigning the intrinsic activity of the singing. Motivated by these studies, a series of experiments to investigate both prosodic and spectral variations of untrained karaoke singers for three languages, American English, Hindi, and Farsi, are considered. A comprehensive comparison on common prosodic features, including phoneme duration, mean fundamental frequency (F0), and formant center frequencies of vowels was performed. Collective changes in the corresponding overall acoustic spaces based on the Kullback-Leibler distance using Gaussian probability distribution models trained on spectral features were analyzed. Finally, these models were used in a Gausian mixture model with universal background model SID evaluation to quantify speaker changes between speaking and singing when the audio text content is the same. The experiments showed that many acoustic characteristics of untrained singing are considerably different from speaking when the text content is the same. It is suggested that these results would help advance automatic speech production normalization/compensation to improve performance of speech processing applications (e.g., speaker ID, speech recognition, and language ID).

RevDate: 2020-09-09

Winn MB, AN Moore (2020)

Perceptual weighting of acoustic cues for accommodating gender-related talker differences heard by listeners with normal hearing and with cochlear implants.

The Journal of the Acoustical Society of America, 148(2):496.

Listeners must accommodate acoustic differences between vocal tracts and speaking styles of conversation partners-a process called normalization or accommodation. This study explores what acoustic cues are used to make this perceptual adjustment by listeners with normal hearing or with cochlear implants, when the acoustic variability is related to the talker's gender. A continuum between /ʃ/ and /s/ was paired with naturally spoken vocalic contexts that were parametrically manipulated to vary by numerous cues for talker gender including fundamental frequency (F0), vocal tract length (formant spacing), and direct spectral contrast with the fricative. The goal was to examine relative contributions of these cues toward the tendency to have a lower-frequency acoustic boundary for fricatives spoken by men (found in numerous previous studies). Normal hearing listeners relied primarily on formant spacing and much less on F0. The CI listeners were individually variable, with the F0 cue emerging as the strongest cue on average.

RevDate: 2020-08-10

Chung H (2020)

Acquisition and Acoustic Patterns of Southern American English /l/ in Young Children.

Journal of speech, language, and hearing research : JSLHR, 63(8):2609-2624.

Purpose The aim of the current study was to examine /l/ developmental patterns in young learners of Southern American English, especially in relation to the effect of word position and phonetic contexts. Method Eighteen children with typically developing speech, aged between 2 and 5 years, produced monosyllabic single words containing singleton /l/ in different word positions (pre- vs. postvocalic /l/) across different vowel contexts (high front vs. low back) and cluster /l/ in different consonant contexts (/pl, bl/ vs. /kl, gl/). Each production was analyzed for its accuracy and acoustic patterns as measured by the first two formant frequencies and their difference (F1, F2, and F2-F1). Results There was great individual variability in /l/ acquisition patterns, with some 2- and 3-year-olds reaching 100% accuracy for prevocalic /l/, while others were below 70%. Overall, accuracy of prevocalic /l/ was higher than that of postvocalic /l/. Acoustic patterns of pre- and postvocalic /l/ showed greater differences in younger children and less apparent differences in 5-year-olds. There were no statistically significant differences between the acoustic patterns of /l/ coded as perceptually acceptable and those coded as misarticulated. There was also no apparent effect of vowel and consonant contexts on /l/ patterns. Conclusion The accuracy patterns of this study suggest an earlier development of /l/, especially prevocalic /l/, than has been reported in previous studies. The differences in acoustic patterns between pre- and postvocalic /l/, which become less apparent with age, may suggest that children alter the way they articulate /l/ with age. No significant acoustic differences between acceptable and misarticulated /l/, especially postvocalic /l/, suggest a gradient nature of /l/ that is dialect specific. This suggests the need for careful consideration of a child's dialect/language background when studying /l/.

RevDate: 2020-08-10

Lee J, Kim H, Y Jung (2020)

Patterns of Misidentified Vowels in Individuals With Dysarthria Secondary to Amyotrophic Lateral Sclerosis.

Journal of speech, language, and hearing research : JSLHR, 63(8):2649-2666.

Purpose The current study examines the pattern of misidentified vowels produced by individuals with dysarthria secondary to amyotrophic lateral sclerosis (ALS). Method Twenty-three individuals with ALS and 22 typical individuals produced 10 monophthongs in an /h/-vowel-/d/ context. One hundred thirty-five listeners completed a forced-choice vowel identification test. Misidentified vowels were examined in terms of the target vowel categories (front-back; low-mid-high) and the direction of misidentification (the directional pattern when the target vowel was misidentified, e.g., misidentification "to a lower vowel"). In addition, acoustic predictors of vowel misidentifications were tested based on log first formant (F1), log second formant, log F1 vowel inherent spectral change, log second formant vowel inherent spectral change, and vowel duration. Results First, high and mid vowels were more frequently misidentified than low vowels for all speaker groups. Second, front and back vowels were misidentified at a similar rate for both the Mild and Severe groups, whereas back vowels were more frequently misidentified than front vowels in typical individuals. Regarding the direction of vowel misidentification, vowel errors were mostly made within the same backness (front-back) category for all groups. In addition, more errors were found toward a lower vowel category than toward a higher vowel category in the Severe group, but not in the Mild group. Overall, log F1 difference was identified as a consistent acoustic predictor of the main vowel misidentification pattern. Conclusion Frequent misidentifications in the vowel height dimension and the acoustic predictor, F1, suggest that limited tongue height control is the major articulatory dysfunction in individuals with ALS. Clinical implications regarding this finding are discussed.

RevDate: 2020-08-09

Easwar V, Birstler J, Harrison A, et al (2020)

The Accuracy of Envelope Following Responses in Predicting Speech Audibility.

Ear and hearing [Epub ahead of print].

OBJECTIVES: The present study aimed to (1) evaluate the accuracy of envelope following responses (EFRs) in predicting speech audibility as a function of the statistical indicator used for objective response detection, stimulus phoneme, frequency, and level, and (2) quantify the minimum sensation level (SL; stimulus level above behavioral threshold) needed for detecting EFRs.

DESIGN: In 21 participants with normal hearing, EFRs were elicited by 8 band-limited phonemes in the male-spoken token /susa∫i/ (2.05 sec) presented between 20 and 65 dB SPL in 15 dB increments. Vowels in /susa∫i/ were modified to elicit two EFRs simultaneously by selectively lowering the fundamental frequency (f0) in the first formant (F1) region. The modified vowels elicited one EFR from the low-frequency F1 and another from the mid-frequency second and higher formants (F2+). Fricatives were amplitude-modulated at the average f0. EFRs were extracted from single-channel EEG recorded between the vertex (Cz) and the nape of the neck when /susa∫i/ was presented monaurally for 450 sweeps. The performance of the three statistical indicators, F-test, Hotelling's T, and phase coherence, was compared against behaviorally determined audibility (estimated SL, SL ≥0 dB = audible) using area under the receiver operating characteristics (AUROC) curve, sensitivity (the proportion of audible speech with a detectable EFR [true positive rate]), and specificity (the proportion of inaudible speech with an undetectable EFR [true negative rate]). The influence of stimulus phoneme, frequency, and level on the accuracy of EFRs in predicting speech audibility was assessed by comparing sensitivity, specificity, positive predictive value (PPV; the proportion of detected EFRs elicited by audible stimuli) and negative predictive value (NPV; the proportion of undetected EFRs elicited by inaudible stimuli). The minimum SL needed for detection was evaluated using a linear mixed-effects model with the predictor variables stimulus and EFR detection p value.

RESULTS: AUROCs of the 3 statistical indicators were similar; however, at the type I error rate of 5%, the sensitivities of Hotelling's T (68.4%) and phase coherence (68.8%) were significantly higher than the F-test (59.5%). In contrast, the specificity of the F-test (97.3%) was significantly higher than the Hotelling's T (88.4%). When analyzed using Hotelling's T as a function of stimulus, fricatives offered higher sensitivity (88.6 to 90.6%) and NPV (57.9 to 76.0%) compared with most vowel stimuli (51.9 to 71.4% and 11.6 to 51.3%, respectively). When analyzed as a function of frequency band (F1, F2+, and fricatives aggregated as low-, mid- and high-frequencies, respectively), high-frequency stimuli offered the highest sensitivity (96.9%) and NPV (88.9%). When analyzed as a function of test level, sensitivity improved with increases in stimulus level (99.4% at 65 dB SPL). The minimum SL for EFR detection ranged between 13.4 and 21.7 dB for F1 stimuli, 7.8 to 12.2 dB for F2+ stimuli, and 2.3 to 3.9 dB for fricative stimuli.

CONCLUSIONS: EFR-based inference of speech audibility requires consideration of the statistical indicator used, phoneme, stimulus frequency, and stimulus level.

RevDate: 2020-08-05

Koo SK, Kwon SB, Koh TK, et al (2020)

Acoustic analyses of snoring sounds using a smartphone in patients undergoing septoplasty and turbinoplasty.

European archives of oto-rhino-laryngology : official journal of the European Federation of Oto-Rhino-Laryngological Societies (EUFOS) : affiliated with the German Society for Oto-Rhino-Laryngology - Head and Neck Surgery pii:10.1007/s00405-020-06268-1 [Epub ahead of print].

PURPOSE: Several studies have been performed using recently developed smartphone-based acoustic analysis techniques. We investigated the effects of septoplasty and turbinoplasty in patients with nasal septal deviation and turbinate hypertrophy accompanied by snoring by recording the sounds of snoring using a smartphone and performing acoustic analysis.

METHODS: A total of 15 male patients who underwent septoplasty with turbinoplasty for snoring and nasal obstruction were included in this prospective study. Preoperatively and 2 months after surgery, their bed partners or caregivers were instructed to record the snoring sounds. The intensity (dB), formant frequencies (F1, F2, F3, and F4), spectrogram pattern, and visual analog scale (VAS) score were analyzed for each subject.

RESULTS: Overall snoring sounds improved after surgery in 12/15 (80%) patients, and there was significant improvement in the intensity of snoring sounds after surgery (from 64.17 ± 12.18 dB to 55.62 ± 9.11 dB, p = 0.018). There was a significant difference in the F1 formant frequency before and after surgery (p = 0.031), but there were no significant differences in F2, F3, or F4. The change in F1 indicated that patients changed from mouth breathing to normal breathing. The degree of subjective snoring sounds improved significantly after surgery (VAS: from 5.40 ± 1.55 to 3.80 ± 1.26, p = 0.003).

CONCLUSION: Our results confirm that snoring is reduced when nasal congestion is improved, and they demonstrate that smartphone-based acoustic analysis of snoring sounds can be useful for diagnosis.

RevDate: 2020-09-13

Scott TL, Haenchen L, Daliri A, et al (2020)

Noninvasive neurostimulation of left ventral motor cortex enhances sensorimotor adaptation in speech production.

Brain and language, 209:104840.

Sensorimotor adaptation-enduring changes to motor commands due to sensory feedback-allows speakers to match their articulations to intended speech acoustics. How the brain integrates auditory feedback to modify speech motor commands and what limits the degree of these modifications remain unknown. Here, we investigated the role of speech motor cortex in modifying stored speech motor plans. In a within-subjects design, participants underwent separate sessions of sham and anodal transcranial direct current stimulation (tDCS) over speech motor cortex while speaking and receiving altered auditory feedback of the first formant. Anodal tDCS increased the rate of sensorimotor adaptation for feedback perturbation. Computational modeling of our results using the Directions Into Velocities of Articulators (DIVA) framework of speech production suggested that tDCS primarily affected behavior by increasing the feedforward learning rate. This study demonstrates how focal noninvasive neurostimulation can enhance the integration of auditory feedback into speech motor plans.

RevDate: 2020-07-28

Chung H, Munson B, J Edwards (2020)

Cross-Linguistic Perceptual Categorization of the Three Corner Vowels: Effects of Listener Language and Talker Age.

Language and speech [Epub ahead of print].

The present study examined the center and size of naïve adult listeners' vowel perceptual space (VPS) in relation to listener language (LL) and talker age (TA). Adult listeners of three different first languages, American English, Greek, and Korean, categorized and rated the goodness of different vowels produced by 2-year-olds and 5-year-olds and adult speakers of those languages, and speakers of Cantonese and Japanese. The center (i.e., mean first and second formant frequencies (F1 and F2)) and size (i.e., area in the F1/F2 space) of VPSs that were categorized either into /a/, /i/, or /u/ were calculated for each LL and TA group. All center and size calculations were weighted by the goodness rating of each stimulus. The F1 and F2 values of the vowel category (VC) centers differed significantly by LL and TA. These effects were qualitatively different for the three vowel categories: English listeners had different /a/ and /u/ centers than Greek and Korean listeners. The size of VPSs did not differ significantly by LL, but did differ by TA and VCs: Greek and Korean listeners had larger vowel spaces when perceiving vowels produced by 2-year-olds than by 5-year-olds or adults, and English listeners had larger vowel spaces for /a/ than /i/ or /u/. Findings indicate that vowel perceptual categories of listeners varied by the nature of their native vowel system, and were sensitive to TA.

RevDate: 2020-08-11

Mefferd AS, MS Dietrich (2020)

Tongue- and Jaw-Specific Articulatory Changes and Their Acoustic Consequences in Talkers With Dysarthria due to Amyotrophic Lateral Sclerosis: Effects of Loud, Clear, and Slow Speech.

Journal of speech, language, and hearing research : JSLHR, 63(8):2625-2636.

Purpose This study aimed to determine how tongue and jaw displacement changes impact acoustic vowel contrast in talkers with amyotrophic lateral sclerosis (ALS) and controls. Method Ten talkers with ALS and 14 controls participated in this study. Loud, clear, and slow speech cues were used to elicit tongue and jaw kinematic as well as acoustic changes. Speech kinematics was recorded using three-dimensional articulography. Independent tongue and jaw displacements were extracted during the diphthong /ai/ in kite. Acoustic distance between diphthong onset and offset in Formant 1-Formant 2 vowel space indexed acoustic vowel contrast. Results In both groups, all three speech modifications elicited increases in jaw displacement (typical < slow < loud < clear). By contrast, only slow speech elicited significantly increased independent tongue displacement in the ALS group (typical = loud = clear < slow), whereas all three speech modifications elicited significantly increased independent tongue displacement in controls (typical < loud < clear = slow). Furthermore, acoustic vowel contrast significantly increased in response to clear and slow speech in the ALS group, whereas all three speech modifications elicited significant increases in acoustic vowel contrast in controls (typical < loud < slow < clear). Finally, only jaw displacements accounted for acoustic vowel contrast gains in the ALS group. In controls, however, independent tongue displacements accounted for increases in vowel acoustic contrast during loud and slow speech, whereas jaw and independent tongue displacements accounted equally for acoustic vowel contrast change during clear speech. Conclusion Kinematic findings suggest that slow speech may be better suited to target independent tongue displacements in talkers with ALS than clear and loud speech. However, given that gains in acoustic vowel contrast were comparable for slow and clear speech cues in these talkers, future research is needed to determine potential differential impacts of slow and clear speech on perceptual measures, such as intelligibility. Finally, findings suggest that acoustic vowel contrast gains are predominantly jaw driven in talkers with ALS. Therefore, the acoustic and perceptual consequences of direct instructions of enhanced jaw movements should be compared to cued speech modification, such as clear and slow speech in these talkers.

RevDate: 2020-07-22

Laturnus R (2020)

Comparative Acoustic Analyses of L2 English: The Search for Systematic Variation.

Phonetica pii:000508387 [Epub ahead of print].

BACKGROUND/AIMS: Previous research has shown that exposure to multiple foreign accents facilitates adaptation to an untrained novel accent. One explanation is that L2 speech varies systematically such that there are commonalities in the productions of nonnative speakers, regardless of their language background.

METHODS: A systematic acoustic comparison was conducted between 3 native English speakers and 6 nonnative accents. Voice onset time, unstressed vowel duration, and formant values of stressed and unstressed vowels were analyzed, comparing each nonnative accent to the native English talkers. A subsequent perception experiment tests what effect training on regionally accented voices has on the participant's comprehension of nonnative accented speech to investigate the importance of within-speaker variation on attunement and generalization.

RESULTS: Data for each measure show substantial variability across speakers, reflecting phonetic transfer from individual L1s, as well as substantial inconsistency and variability in pronunciation, rather than commonalities in their productions. Training on native English varieties did not improve participants' accuracy in understanding nonnative speech.

CONCLUSION: These findings are more consistent with a hypothesis of accent attune-ment wherein listeners track general patterns of nonnative speech rather than relying on overlapping acoustic signals between speakers.

RevDate: 2020-09-04

Rishiq D, Harkrider A, Springer C, et al (2020)

Effects of Aging on the Subcortical Encoding of Stop Consonants.

American journal of audiology, 29(3):391-403.

Purpose The main purpose of this study was to evaluate aging effects on the predominantly subcortical (brainstem) encoding of the second-formant frequency transition, an essential acoustic cue for perceiving place of articulation. Method Synthetic consonant-vowel syllables varying in second-formant onset frequency (i.e., /ba/, /da/, and /ga/ stimuli) were used to elicit speech-evoked auditory brainstem responses (speech-ABRs) in 16 young adults (Mage = 21 years) and 11 older adults (Mage = 59 years). Repeated-measures mixed-model analyses of variance were performed on the latencies and amplitudes of the speech-ABR peaks. Fixed factors were phoneme (repeated measures on three levels: /b/ vs. /d/ vs. /g/) and age (two levels: young vs. older). Results Speech-ABR differences were observed between the two groups (young vs. older adults). Specifically, older listeners showed generalized amplitude reductions for onset and major peaks. Significant Phoneme × Group interactions were not observed. Conclusions Results showed aging effects in speech-ABR amplitudes that may reflect diminished subcortical encoding of consonants in older listeners. These aging effects were not phoneme dependent as observed using the statistical methods of this study.

RevDate: 2020-07-13

Al-Tamimi F, P Howell (2020)

Voice onset time and formant onset frequencies in Arabic stuttered speech.

Clinical linguistics & phonetics [Epub ahead of print].

Neuromuscular models of stuttering consider that making transitions between phones results in inappropriate temporal arrangements of articulators in people who stutter (PWS). Using this framework, the current study examined the acoustic productions of two fine-grained phonetic features: voice onset time (VOT) and second formant (F2). The hypotheses were that PWS should differ from fluent persons (FP) in VOT duration and F2 onset frequency as a result of the transition deficit for environments with complex phonetic features such as Arabic emphatics. Ten adolescent PWS and 10 adolescent FPs participated in the study. They read and memorized four monosyllabic plain-emphatic words silently. Data were analyzed by Repeated Measures ANOVAs. The positive and negative VOT durations of/t/vs./tˁ/and/d/vs./dˁ/and F2 onset frequency were measured acoustically. Results showed that stuttering was significantly affected by emphatic consonants. PWS had atypical VOT durations and F2 values. Findings are consistent with the atypicality of VOT and F2 reported for English-speaking PWS. This atypicality is realized differently in Arabic depending on the articulatory complexity and cognitive load of the sound.

RevDate: 2020-09-02

Levy-Lambert D, Grigos MI, LeBlanc É, et al (2020)

Communication Efficiency in a Face Transplant Recipient: Determinants and Therapeutic Implications.

The Journal of craniofacial surgery, 31(6):e528-e530.

We longitudinally assessed speech intelligibility (percent words correct/pwc), communication efficiency (intelligible words per minute/iwpm), temporal control markers (speech and pause coefficients of variation), and formant frequencies associated with lip motion in a 41-year-old face transplant recipient. Pwc and iwpm at 13 months post-transplantation were both higher than preoperative values. Multivariate regression demonstrated that temporal markers and all formant frequencies associated with lip motion were significant predictors (P < 0.05) of communication efficiency, highlighting the interplay of these variables in generating intelligible and effective speech. These findings can guide us in developing personalized rehabilitative approaches in face transplant recipients for optimal speech outcomes.

RevDate: 2020-08-22

Kim KS, Wang H, L Max (2020)

It's About Time: Minimizing Hardware and Software Latencies in Speech Research With Real-Time Auditory Feedback.

Journal of speech, language, and hearing research : JSLHR, 63(8):2522-2534.

Purpose Various aspects of speech production related to auditory-motor integration and learning have been examined through auditory feedback perturbation paradigms in which participants' acoustic speech output is experimentally altered and played back via earphones/headphones "in real time." Scientific rigor requires high precision in determining and reporting the involved hardware and software latencies. Many reports in the literature, however, are not consistent with the minimum achievable latency for a given experimental setup. Here, we focus specifically on this methodological issue associated with implementing real-time auditory feedback perturbations, and we offer concrete suggestions for increased reproducibility in this particular line of work. Method Hardware and software latencies as well as total feedback loop latency were measured for formant perturbation studies with the Audapter software. Measurements were conducted for various audio interfaces, desktop and laptop computers, and audio drivers. An approach for lowering Audapter's software latency through nondefault parameter specification was also tested. Results Oft-overlooked hardware-specific latencies were not negligible for some of the tested audio interfaces (adding up to 15 ms). Total feedback loop latencies (including both hardware and software latency) were also generally larger than claimed in the literature. Nondefault parameter values can improve Audapter's own processing latency without negative impact on formant tracking. Conclusions Audio interface selection and software parameter optimization substantially affect total feedback loop latency. Thus, the actual total latency (hardware plus software) needs to be correctly measured and described in all published reports. Future speech research with "real-time" auditory feedback perturbations should increase scientific rigor by minimizing this latency.

RevDate: 2020-07-06

Vurma A (2020)

Amplitude Effects of Vocal Tract Resonance Adjustments When Singing Louder.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30194-6 [Epub ahead of print].

In the literature on vocal pedagogy we may find suggestions to increase the mouth opening when singing louder. It is known that sopranos tend to sing loud high notes with a wider mouth opening which raises the frequency of the first resonance of the vocal tract (fR1) to tune it close to the fundamental. Our experiment with classically trained male singers revealed that they also tended to raise the fR1 with the dynamics at pitches where the formant tuning does not seem relevant. The analysis by synthesis showed that such behaviour may contribute to the strengthening of the singer's formant by several dB-s and to a rise in the centre of spectral gravity. The contribution of the fR1 raising to the overall sound level was less consistent. Changing the extent of the mouth opening with the dynamics may create several simultaneous semantic cues that signal how prominent the produced sound is and how great the physical effort by the singer is. The diminishing of the mouth opening when singing piano may also have an importance as it helps singers to produce a quieter sound by increasing the distance between the fR1 and higher resonances, which lowers the transfer function of the vocal tract at the relevant spectral regions.

RevDate: 2020-07-03

Chaturvedi R, Kraus M, RSE Keefe (2020)

A new measure of authentic auditory emotion recognition: Application to patients with schizophrenia.

Schizophrenia research pii:S0920-9964(19)30550-X [Epub ahead of print].

BACKGROUND: Many social processes such as emotion recognition are severely impaired in patients with schizophrenia. While basic auditory processing seems to play a key role in identifying emotions, research in this field is limited due to the lack of proper assessment batteries. Many of the widely accepted tests utilize actors to portray certain emotions-these batteries are less ecologically and face valid.

METHODS: This study utilized a newly developed auditory emotion recognition test that contained natural stimuli from spontaneous displays of emotions to assess 28 patients with schizophrenia and 16 healthy controls.

RESULTS: The results indicate that the newly developed test, referred to as the INTONATION Test, is more sensitive to the emotion recognition deficits in patients with schizophrenia than previously used measures. The correlations of the INTONATION Test measures with basic auditory processes were similar to established tests of auditory emotion. Particular emotion sub scores from the INTONTATION test, such as happiness, demonstrated the strongest correlations with specific auditory processing skills, such as formant discrimination and sinusoidal amplitude modulation detection (SAM60).

CONCLUSIONS: The results from this study indicate that auditory emotion recognition impairments are more pronounced in patients with schizophrenia when perceiving authentic displays of emotion. Understanding these deficits could help specify the nature of auditory emotion recognition deficits in patients with schizophrenia and those at risk.

RevDate: 2020-07-02

Toutios A, Xu M, Byrd D, et al (2020)

How an aglossic speaker produces an alveolar-like percept without a functional tongue tip.

The Journal of the Acoustical Society of America, 147(6):EL460.

It has been previously observed [McMicken, Salles, Berg, Vento-Wilson, Rogers, Toutios, and Narayanan. (2017). J. Commun. Disorders, Deaf Stud. Hear. Aids 5(2), 1-6] using real-time magnetic resonance imaging that a speaker with severe congenital tongue hypoplasia (aglossia) had developed a compensatory articulatory strategy where she, in the absence of a functional tongue tip, produced a plosive consonant perceptually similar to /d/ using a bilabial constriction. The present paper provides an updated account of this strategy. It is suggested that the previously observed compensatory bilabial closing that occurs during this speaker's /d/ production is consistent with vocal tract shaping resulting from hyoid raising created with mylohyoid action, which may also be involved in typical /d/ production. Simulating this strategy in a dynamic articulatory synthesis experiment leads to the generation of /d/-like formant transitions.

RevDate: 2020-07-05

Harper S, Goldstein L, S Narayanan (2020)

Variability in individual constriction contributions to third formant values in American English /ɹ/.

The Journal of the Acoustical Society of America, 147(6):3905.

Although substantial variability is observed in the articulatory implementation of the constriction gestures involved in /ɹ/ production, studies of articulatory-acoustic relations in /ɹ/ have largely ignored the potential for subtle variation in the implementation of these gestures to affect salient acoustic dimensions. This study examines how variation in the articulation of American English /ɹ/ influences the relative sensitivity of the third formant to variation in palatal, pharyngeal, and labial constriction degree. Simultaneously recorded articulatory and acoustic data from six speakers in the USC-TIMIT corpus was analyzed to determine how variation in the implementation of each constriction across tokens of /ɹ/ relates to variation in third formant values. Results show that third formant values are differentially affected by constriction degree for the different constrictions used to produce /ɹ/. Additionally, interspeaker variation is observed in the relative effect of different constriction gestures on third formant values, most notably in a division between speakers exhibiting relatively equal effects of palatal and pharyngeal constriction degree on F3 and speakers exhibiting a stronger palatal effect. This division among speakers mirrors interspeaker differences in mean constriction length and location, suggesting that individual differences in /ɹ/ production lead to variation in articulatory-acoustic relations.

RevDate: 2020-09-28

Xu M, Tachibana RO, Okanoya K, et al (2020)

Unconscious and Distinctive Control of Vocal Pitch and Timbre During Altered Auditory Feedback.

Frontiers in psychology, 11:1224.

Vocal control plays a critical role in smooth social communication. Speakers constantly monitor auditory feedback (AF) and make adjustments when their voices deviate from their intentions. Previous studies have shown that when certain acoustic features of the AF are artificially altered, speakers compensate for this alteration in the opposite direction. However, little is known about how the vocal control system implements compensations for alterations of different acoustic features, and associates them with subjective consciousness. The present study investigated whether compensations for the fundamental frequency (F0), which corresponds to perceived pitch, and formants, which contribute to perceived timbre, can be performed unconsciously and independently. Forty native Japanese speakers received two types of altered AF during vowel production that involved shifts of either only the formant frequencies (formant modification; Fm) or both the pitch and formant frequencies (pitch + formant modification; PFm). For each type, three levels of shift (slight, medium, and severe) in both directions (increase or decrease) were used. After the experiment, participants were tested for whether they had perceived a change in the F0 and/or formants. The results showed that (i) only formants were compensated for in the Fm condition, while both the F0 and formants were compensated for in the PFm condition; (ii) the F0 compensation exhibited greater precision than the formant compensation in PFm; and (iii) compensation occurred even when participants misperceived or could not explicitly perceive the alteration in AF. These findings indicate that non-experts can compensate for both formant and F0 modifications in the AF during vocal production, even when the modifications are not explicitly or correctly perceived, which provides further evidence for a dissociation between conscious perception and action in vocal control. We propose that such unconscious control of voice production may enhance rapid adaptation to changing speech environments and facilitate mutual communication.

RevDate: 2020-07-17

White-Schwoch T, Magohe AK, Fellows AM, et al (2020)

Auditory neurophysiology reveals central nervous system dysfunction in HIV-infected individuals.

Clinical neurophysiology : official journal of the International Federation of Clinical Neurophysiology, 131(8):1827-1832.

OBJECTIVE: To test the hypothesis that human immunodeficiency virus (HIV) affects auditory-neurophysiological functions.

METHODS: A convenience sample of 68 HIV+ and 59 HIV- normal-hearing adults was selected from a study set in Dar es Salaam, Tanzania. The speech-evoked frequency-following response (FFR), an objective measure of auditory function, was collected. Outcome measures were FFRs to the fundamental frequency (F0) and to harmonics corresponding to the first formant (F1), two behaviorally relevant cues for understanding speech.

RESULTS: The HIV+ group had weaker responses to the F1 than the HIV- group; this effect generalized across multiple stimuli (d = 0.59). Responses to the F0 were similar between groups.

CONCLUSIONS: Auditory-neurophysiological responses differ between HIV+ and HIV- adults despite normal hearing thresholds.

SIGNIFICANCE: The FFR may reflect HIV-associated central nervous system dysfunction that manifests as disrupted auditory processing of speech harmonics corresponding to the first formant.

RevDate: 2020-07-22

DiNino M, Arenberg JG, Duchen ALR, et al (2020)

Effects of Age and Cochlear Implantation on Spectrally Cued Speech Categorization.

Journal of speech, language, and hearing research : JSLHR, 63(7):2425-2440.

Purpose Weighting of acoustic cues for perceiving place-of-articulation speech contrasts was measured to determine the separate and interactive effects of age and use of cochlear implants (CIs). It has been found that adults with normal hearing (NH) show reliance on fine-grained spectral information (e.g., formants), whereas adults with CIs show reliance on broad spectral shape (e.g., spectral tilt). In question was whether children with NH and CIs would demonstrate the same patterns as adults, or show differences based on ongoing maturation of hearing and phonetic skills. Method Children and adults with NH and with CIs categorized a /b/-/d/ speech contrast based on two orthogonal spectral cues. Among CI users, phonetic cue weights were compared to vowel identification scores and Spectral-Temporally Modulated Ripple Test thresholds. Results NH children and adults both relied relatively more on the fine-grained formant cue and less on the broad spectral tilt cue compared to participants with CIs. However, early-implanted children with CIs better utilized the formant cue compared to adult CI users. Formant cue weights correlated with CI participants' vowel recognition and in children, also related to Spectral-Temporally Modulated Ripple Test thresholds. Adults and child CI users with very poor phonetic perception showed additive use of the two cues, whereas those with better and/or more mature cue usage showed a prioritized trading relationship, akin to NH listeners. Conclusions Age group and hearing modality can influence phonetic cue-weighting patterns. Results suggest that simple nonlexical categorization tests correlate with more general speech recognition skills of children and adults with CIs.

RevDate: 2020-06-16

Chiu YF, Neel A, T Loux (2020)

Acoustic characteristics in relation to intelligibility reduction in noise for speakers with Parkinson's disease.

Clinical linguistics & phonetics [Epub ahead of print].

Decreased speech intelligibility in noisy environments is frequently observed in speakers with Parkinson's disease (PD). This study investigated which acoustic characteristics across the speech subsystems contributed to poor intelligibility in noise for speakers with PD. Speech samples were obtained from 13 speakers with PD and five healthy controls reading 56 sentences. Intelligibility analysis was conducted in quiet and noisy listening conditions. Seventy-two young listeners transcribed the recorded sentences in quiet and another 72 listeners transcribed in noise. The acoustic characteristics of the speakers with PD who experienced large intelligibility reduction from quiet to noise were compared to those with smaller intelligibility reduction in noise and healthy controls. The acoustic measures in the study included second formant transitions, cepstral and spectral measures of voice (cepstral peak prominence and low/high spectral ratio), pitch variation, and articulation rate to represent speech components across speech subsystems of articulation, phonation, and prosody. The results show that speakers with PD who had larger intelligibility reduction in noise exhibited decreased second formant transition, limited cepstral and spectral variations, and faster articulation rate. These findings suggest that the adverse effect of noise on speech intelligibility in PD is related to speech changes in the articulatory and phonatory systems.

RevDate: 2020-06-15

Rankinen W, K de Jong (2020)

The Entanglement of Dialectal Variation and Speaker Normalization.

Language and speech [Epub ahead of print].

This paper explores the relationship between speaker normalization and dialectal identity in sociolinguistic data, examining a database of vowel formants collected from 88 monolingual American English speakers in Michigan's Upper Peninsula. Audio recordings of Finnish- and Italian-heritage American English speakers reading a passage and a word list were normalized using two normalization procedures. These algorithms are based on different concepts of normalization: Lobanov, which models normalization as based on experience with individual talkers, and Labov ANAE, which models normalization as based on experience with scale-factors inherent in acoustic resonators of all kinds. The two procedures yielded different results; while the Labov ANAE method reveals a cluster shifting of low and back vowels that correlated with heritage, the Lobanov procedure seems to eliminate this sociolinguistic variation. The difference between the two procedures lies in how they treat relations between formant changes, suggesting that dimensions of variation in the vowel space may be treated differently by different normalization procedures, raising the question of how anatomical variation and dialectal variation interact in the real world. The structure of the sociolinguistic effects found with the Labov ANAE normalized data, but not in the Lobanov normalized data, suggest that the Lobanov normalization does over-normalize formant measures and remove sociolinguistically relevant information.

RevDate: 2020-06-22

Ménard L, Prémont A, Trudeau-Fisette P, et al (2020)

Phonetic Implementation of Prosodic Emphasis in Preschool-Aged Children and Adults: Probing the Development of Sensorimotor Speech Goals.

Journal of speech, language, and hearing research : JSLHR, 63(6):1658-1674.

Objective We aimed to investigate the production of contrastive emphasis in French-speaking 4-year-olds and adults. Based on previous work, we predicted that, due to their immature motor control abilities, preschool-aged children would produce smaller articulatory differences between emphasized and neutral syllables than adults. Method Ten 4-year-old children and 10 adult French speakers were recorded while repeating /bib/, /bub/, and /bab/ sequences in neutral and contrastive emphasis conditions. Synchronous recordings of tongue movements, lip and jaw positions, and speech signals were made. Lip positions and tongue shapes were analyzed; formant frequencies, amplitude, fundamental frequency, and duration were extracted from the acoustic signals; and between-vowel contrasts were calculated. Results Emphasized vowels were higher in pitch, intensity, and duration than their neutral counterparts in all participants. However, the effect of contrastive emphasis on lip position was smaller in children. Prosody did not affect tongue position in children, whereas it did in adults. As a result, children's productions were perceived less accurately than those of adults. Conclusion These findings suggest that 4-year-old children have not yet learned to produce hypoarticulated forms of phonemic goals to allow them to successfully contrast syllables and enhance prosodic saliency.

RevDate: 2020-05-25

Groll MD, McKenna VS, Hablani S, et al (2020)

Formant-Estimated Vocal Tract Length and Extrinsic Laryngeal Muscle Activation During Modulation of Vocal Effort in Healthy Speakers.

Journal of speech, language, and hearing research : JSLHR, 63(5):1395-1403.

Purpose The goal of this study was to explore the relationships among vocal effort, extrinsic laryngeal muscle activity, and vocal tract length (VTL) within healthy speakers. We hypothesized that increased vocal effort would result in increased suprahyoid muscle activation and decreased VTL, as previously observed in individuals with vocal hyperfunction. Method Twenty-eight healthy speakers of American English produced vowel-consonant-vowel utterances under varying levels of vocal effort. VTL was estimated from the vowel formants. Three surface electromyography sensors measured the activation of the suprahyoid and infrahyoid muscle groups. A general linear model was used to investigate the effects of vocal effort level and surface electromyography on VTL. Two additional general linear models were used to investigate the effects of vocal effort on suprahyoid and infrahyoid muscle activities. Results Neither vocal effort nor extrinsic muscle activity showed significant effects on VTL; however, the degree of extrinsic muscle activity of both suprahyoid and infrahyoid muscle groups increased with increases in vocal effort. Conclusion Increasing vocal effort resulted in increased activation of both suprahyoid and infrahyoid musculature in healthy adults, with no change to VTL.

RevDate: 2020-07-07

Zhou H, Lu J, Zhang C, et al (2020)

Abnormal Acoustic Features Following Pharyngeal Flap Surgery in Patients Aged Six Years and Older.

The Journal of craniofacial surgery, 31(5):1395-1399.

In our study, older velopharyngeal insufficiency (posterior velopharyngeal insufficiency) patients were defined as those older than 6 years of age. This study aimed to evaluate the abnormal acoustic features of older velopharyngeal insufficiency patients before and after posterior pharyngeal flap surgery. A retrospective medical record review was conducted for patients aged 6 years and older, who underwent posterior pharyngeal flap surgery between November 2011 and March 2015. The audio records of patients were evaluated before and after surgery. Spectral analysis was conducted by the Computer Speech Lab (CSL)-4150B acoustic system with the following input data: The vowel /i/, unaspirated plosive /b/, aspirated plosives /p/, aspirated fricatives /s/ and /x/, unaspirated affricates /j/ and /z/, and aspirated affricates /c/ and /q/. The patients were followed up for 3 months. Speech outcome was evaluated by comparing the postoperatively phonetic data with preoperative data. Subjective and objective analyses showed significant differences in the sonogram, formant, and speech articulation before and after the posterior pharyngeal flap surgery. However, the sampled patients could not be considered to have a high speech articulation (<85%) as the normal value was above or equal to 96%. Our results showed that pharyngeal flap surgery could correct the speech function of older patients with posterior velopharyngeal insufficiency to some extent. Owing to the original errors in pronunciation patterns, pathological speech articulation still existed, and speech treatment is required in the future.

RevDate: 2020-05-03

Almurashi W, Al-Tamimi J, G Khattab (2020)

Static and dynamic cues in vowel production in Hijazi Arabic.

The Journal of the Acoustical Society of America, 147(4):2917.

Static cues such as formant measurements obtained at the vowel midpoint are usually taken as the main correlate for vowel identification. However, dynamic cues such as vowel-inherent spectral change have been shown to yield better classification of vowels using discriminant analysis. The aim of this study is to evaluate the role of static versus dynamic cues in Hijazi Arabic (HA) vowel classification, in addition to vowel duration and F3, which are not usually looked at. Data from 12 male HA speakers producing eight HA vowels in /hVd/ syllables were obtained, and classification accuracy was evaluated using discriminant analysis. Dynamic cues, particularly the three-point model, had higher classification rates (average 95.5%) than the remaining models (static model: 93.5%; other dynamic models: between 65.75% and 94.25%). Vowel duration had a significant role in classification accuracy (average +8%). These results are in line with dynamic approaches to vowel classification and highlight the relative importance of cues such as vowel duration across languages, particularly where it is prominent in the phonology.

RevDate: 2020-05-03

Egurtzegi A, C Carignan (2020)

An acoustic description of Mixean Basque.

The Journal of the Acoustical Society of America, 147(4):2791.

This paper presents an acoustic analysis of Mixean Low Navarrese, an endangered variety of Basque. The manuscript includes an overview of previous acoustic studies performed on different Basque varieties in order to synthesize the sparse acoustic descriptions of the language that are available. This synthesis serves as a basis for the acoustic analysis performed in the current study, in which the various acoustic analyses given in previous studies are replicated in a single, cohesive general acoustic description of Mixean Basque. The analyses include formant and duration measurements for the six-vowel system, voice onset time measurements for the three-way stop system, spectral center of gravity for the sibilants, and number of lingual contacts in the alveolar rhotic tap and trill. Important findings include: a centralized realization ([ʉ]) of the high-front rounded vowel usually described as /y/; a data-driven confirmation of the three-way laryngeal opposition in the stop system; evidence in support of an alveolo-palatal to apical sibilant merger; and the discovery of a possible incipient merger of rhotics. These results show how using experimental acoustic methods to study under-represented linguistic varieties can result in revelations of sound patterns otherwise undescribed in more commonly studied varieties of the same language.

RevDate: 2020-05-03

Mellesmoen G, M Babel (2020)

Acoustically distinct and perceptually ambiguous: ʔayʔaǰuθəm (Salish) fricatives.

The Journal of the Acoustical Society of America, 147(4):2959.

ʔayʔaǰuθəm (Comox-Sliammon) is a Central Salish language spoken in British Columbia with a large fricative inventory. Previous impressionistic descriptions of ʔayʔaǰuθəm have noted perceptual ambiguity of select anterior fricatives. This paper provides an auditory-acoustic description of the four anterior fricatives /θ s ʃ ɬ/ in the Mainland dialect of ʔayʔaǰuθəm. Peak ERBN trajectories, noise duration, and formant transitions are analysed in the fricative productions of five speakers. These analyses provide quantitative and qualitative descriptions of these fricative contrasts, indicating more robust acoustic differentiation for fricatives in onset versus coda position. In a perception task, English listeners categorized fricatives in CV and VC sequences from the natural productions. The results of the perception experiment are consistent with reported perceptual ambiguity between /s/ and /θ/, with listeners frequently misidentifying /θ/ as /s/. The production and perception data suggest that listener L1 categories play a role in the categorization and discrimination of ʔayʔaǰuθəm fricatives. These findings provide an empirical description of fricatives in an understudied language and have implications for L2 teaching and learning in language revitalization contexts.

RevDate: 2020-05-03

Rosen N, Stewart J, ON Sammons (2020)

How "mixed" is mixed language phonology? An acoustic analysis of the Michif vowel system.

The Journal of the Acoustical Society of America, 147(4):2989.

Michif, a severely endangered language still spoken today by an estimated 100-200 Métis people in Western Canada, is generally classified as a mixed language, meaning it cannot be traced back to a single language family [Bakker (1997). A Language of Our Own (Oxford University Press, Oxford); Thomason (2001). Language Contact: An Introduction (Edinburgh University Press and Georgetown University Press, Edinburgh and Washington, DC); Meakins (2013). Contact Languages: A Comprehensive Guide (Mouton De Gruyter, Berlin), pp. 159-228.]. It has been claimed to maintain the phonological grammar of both of its source languages, French and Plains Cree [Rhodes (1977). Actes du Huitieme congrès des Algonqunistes (Carleton University, Ottawa), pp. 6-25; Bakker (1997). A Language of Our Own (Oxford University Press, Oxford); Bakker and Papen (1997). Contact Languages: A Wider Perspective (John Benjamins, Amsterdam), pp. 295-363]. The goal of this paper is twofold: to offer an instrumental analysis of Michif vowels and to investigate this claim of a stratified grammar, based on this careful phonetic analysis. Using source language as a variable in the analysis, the authors argue the Michif vowel system does not appear to rely on historical information, and that historically similar French and Cree vowels pattern together within the Michif system with regards to formant frequencies and duration. The authors show that there are nine Michif oral vowels in this system, which has merged phonetically similar French- and Cree-source vowels.

RevDate: 2020-05-03

van Brenk F, H Terband (2020)

Compensatory and adaptive responses to real-time formant shifts in adults and children.

The Journal of the Acoustical Society of America, 147(4):2261.

Auditory feedback plays an important role in speech motor learning, yet, little is known about the strength of motor learning and feedback control in speech development. This study investigated compensatory and adaptive responses to auditory feedback perturbation in children (aged 4-9 years old) and young adults (aged 18-29 years old). Auditory feedback was perturbed by near-real-time shifting F1 and F2 of the vowel /ɪː/ during the production of consonant-vowel-consonant words. Children were able to compensate and adapt in a similar or larger degree compared to young adults. Higher token-to-token variability was found in children compared to adults but not disproportionately higher during the perturbation phases compared to the unperturbed baseline. The added challenge to auditory-motor integration did not influence production variability in children, and compensation and adaptation effects were found to be strong and sustainable. Significant group differences were absent in the proportions of speakers displaying a compensatory or adaptive response, an amplifying response, or no consistent response. Within these categories, children produced significantly stronger compensatory, adaptive, or amplifying responses, which could be explained by less-ingrained existing representations. The results are interpreted as both auditory-motor integration and learning capacities are stronger in young children compared to adults.

RevDate: 2020-05-03

Chiu C, JT Sun (2020)

On pharyngealized vowels in Northern Horpa: An acoustic and ultrasound study.

The Journal of the Acoustical Society of America, 147(4):2928.

In the Northern Horpa (NH) language of Sichuan, vowels are divided between plain and pharyngealized sets, with the latter pronounced with auxiliary articulatory gestures involving more constriction in the vocal tract. The current study examines how the NH vocalic contrast is manifested in line with the process of pharyngealization both acoustically and articulatorily, based on freshly gathered data from two varieties of the language (i.e., Rtsangkhog and Yunasche). Along with formant analyses, ultrasound imaging was employed to capture the tongue postures and positions during vowel production. The results show that in contrast with plain vowels, pharyngealized vowels generally feature lower F2 values and higher F1 and F3 values. Mixed results for F2 and F3 suggest that the quality contrasts are vowel-dependent. Ultrasound images, on the other hand, reveal that the vocalic distinction is affected by different types of tongue movements, including retraction, backing, and double bunching, depending on the inherent tongue positions for each vowel. The two NH varieties investigated are found to display differential formant changes and different types of tongue displacements. The formant profiles along with ultrasound images support the view that the production of the NH phonologically marked vowels is characteristic of pharyngealization.

RevDate: 2020-05-03

Horo L, Sarmah P, GDS Anderson (2020)

Acoustic phonetic study of the Sora vowel system.

The Journal of the Acoustical Society of America, 147(4):3000.

This paper is an acoustic phonetic study of vowels in Sora, a Munda language of the Austroasiatic language family. Descriptions here illustrate that the Sora vowel system has six vowels and provide evidence that Sora disyllables have prominence on the second syllable. While the acoustic categorization of vowels is based on formant frequencies, the presence of prominence on the second syllable is shown through temporal features of vowels, including duration, intensity, and fundamental frequency. Additionally, this paper demonstrates that acoustic categorization of vowels in Sora is better in the prominent syllable than in the non-prominent syllable, providing evidence that syllable prominence and vowel quality are correlated in Sora. These acoustic properties of Sora vowels are discussed in relation to the existing debates on vowels and patterns of syllable prominence in Munda languages of India. In this regard, it is noteworthy that Munda languages, in general, lack instrumental studies, and therefore this paper presents significant findings that are undocumented in other Munda languages. These acoustic studies are supported by exploratory statistical modeling and statistical classification methods.

RevDate: 2020-05-03

Sarvasy H, Elvin J, Li W, et al (2020)

An acoustic phonetic description of Nungon vowels.

The Journal of the Acoustical Society of America, 147(4):2891.

This study is a comprehensive acoustic description and analysis of the six vowels /i e a u o ɔ/ in the Towet dialect of the Papuan language Nungon ⟨yuw⟩ of northeastern Papua New Guinea. Vowel tokens were extracted from a corpus of audio speech recordings created for general language documentation and grammatical description. To assess the phonetic correlates of a claimed phonological vowel length distinction, vowel duration was measured. Multi-point acoustic analyses enabled investigation of mean vowel F1, F2, and F3; vowel trajectories, and coarticulation effects. The three Nungon back vowels were of particular interest, as they contribute to an asymmetrical, back vowel-heavy array, and /o/ had previously been described as having an especially low F2. The authors found that duration of phonologically long and short vowels differed significantly. Mean vowel formant measurements confirmed that the six phonological vowels form six distinct acoustic groupings; trajectories show slightly more formant movement in some vowels than was previously known. Adjacent nasal consonants exerted significant effects on vowel formant measurements. The authors show that an uncontrolled, general documentation corpus for an under-described language can be mined for acoustic analysis, but coarticulation effects should be taken into account.

RevDate: 2020-05-03

Nance C, S Kirkham (2020)

The acoustics of three-way lateral and nasal palatalisation contrasts in Scottish Gaelic.

The Journal of the Acoustical Society of America, 147(4):2858.

This paper presents an acoustic description of laterals and nasals in an endangered minority language, Scottish Gaelic (known as "Gaelic"). Gaelic sonorants are reported to take part in a typologically unusual three-way palatalisation contrast. Here, the acoustic evidence for this contrast is considered, comparing lateral and nasal consonants in both word-initial and word-final position. Previous acoustic work has considered lateral consonants, but nasals are much less well-described. An acoustic analysis of twelve Gaelic-dominant speakers resident in a traditionally Gaelic-speaking community is reported. Sonorant quality is quantified via measurements of F2-F1 and F3-F2 and observation of the whole spectrum. Additionally, we quantify extensive devoicing in word-final laterals that has not been previously reported. Mixed-effects regression modelling suggests robust three-way acoustic differences in lateral consonants in all relevant vowel contexts. Nasal consonants, however, display lesser evidence of the three-way contrast in formant values and across the spectrum. Potential reasons for lesser evidence of contrast in the nasal system are discussed, including the nature of nasal acoustics, evidence from historical changes, and comparison to other Goidelic dialects. In doing so, contributions are made to accounts of the acoustics of the Celtic languages, and to typologies of contrastive palatalisation in the world's languages.

RevDate: 2020-05-03

Tabain M, Butcher A, Breen G, et al (2020)

A formant study of the alveolar versus retroflex contrast in three Central Australian languages: Stop, nasal, and lateral manners of articulation.

The Journal of the Acoustical Society of America, 147(4):2745.

This study presents formant transition data from 21 speakers for the apical alveolar∼retroflex contrast in three neighbouring Central Australian languages: Arrernte, Pitjantjatjara, and Warlpiri. The contrast is examined for three manners of articulation: stop, nasal, and lateral /t ∼ ʈ/ /n ∼ ɳ/, and /l ∼ ɭ/, and three vowel contexts /a i u/. As expected, results show that a lower F3 and F4 in the preceding vowel signal a retroflex consonant; and that the alveolar∼retroflex contrast is most clearly realized in the context of an /a/ vowel, and least clearly realized in the context of an /i/ vowel. Results also show that the contrast is most clearly realized for the stop manner of articulation. These results provide an acoustic basis for the greater typological rarity of retroflex nasals and laterals as compared to stops. It is suggested that possible nasalization of the preceding vowel accounts for the poorer nasal consonant results, and that articulatory constraints on lateral consonant production account for the poorer lateral consonant results. Importantly, differences are noticed between speakers, and it is suggested that literacy plays a major role in maintenance of this marginal phonemic contrast.

RevDate: 2020-06-09

Liepins R, Kaider A, Honeder C, et al (2020)

Formant frequency discrimination with a fine structure sound coding strategy for cochlear implants.

Hearing research, 392:107970.

Recent sound coding strategies for cochlear implants (CI) have focused on the transmission of temporal fine structure to the CI recipient. To date, knowledge about the effects of fine structure coding in electrical hearing is poorly charactarized. The aim of this study was to examine whether the presence of temporal fine structure coding affects how the CI recipient perceives sound. This was done by comparing two sound coding strategies with different temporal fine structure coverage in a longitudinal cross-over setting. The more recent FS4 coding strategy provides fine structure coding on typically four apical stimulation channels compared to FSP with usually one or two fine structure channels. 34 adult CI patients with a minimum CI experience of one year were included. All subjects were fitted according to clinical routine and used both coding strategies for three months in a randomized sequence. Formant frequency discrimination thresholds (FFDT) were measured to assess the ability to resolve timbre information. Further outcome measures included a monosyllables test in quiet and the speech reception threshold of an adaptive matrix sentence test in noise (Oldenburger sentence test). In addition, the subjective sound quality was assessed using visual analogue scales and a sound quality questionnaire after each three months period. The extended fine structure range of FS4 yields FFDT similar to FSP for formants occurring in the frequency range only covered by FS4. There is a significant interaction (p = 0.048) between the extent of fine structure coverage in FSP and the improvement in FFDT in favour of FS4 for these stimuli. FS4 Speech perception in noise and quiet was similar with both coding strategies. Sound quality was rated heterogeneously showing that both strategies represent valuable options for CI fitting to allow for best possible individual optimization.

RevDate: 2020-09-03

Toyoda A, Maruhashi T, Malaivijitnond S, et al (2020)

Dominance status and copulatory vocalizations among male stump-tailed macaques in Thailand.

Primates; journal of primatology, 61(5):685-694.

Male copulation calls sometimes play important roles in sexual strategies, attracting conspecific females or advertising their social status to conspecific males. These calls generally occur in sexually competitive societies such as harem groups and multi-male and multi-female societies. However, the call functions remain unclear because of limited availability of data sets that include a large number of male and female animals in naturalistic environments, particularly in primates. Here, we examined the possible function of male-specific copulation calls in wild stump-tailed macaques (Macaca arctoides) by analyzing the contexts and acoustic features of vocalizations. We observed 395 wild stump-tailed macaques inhabiting the Khao Krapuk Khao Taomor Non-Hunting Area in Thailand and recorded all occurrences of observed copulations. We counted 446 male-specific calls in 383 copulations recorded, and measured their acoustic characteristics. Data were categorized into three groups depending on their social status: dominant (alpha and coalition) males and non-dominant males. When comparing male status, alpha males most frequently produced copulation calls at ejaculation, coalition males produced less frequent calls than alpha males, and other non-dominant males rarely vocalized, maintaining silence even when mounting females. Acoustic analysis indicated no significant influence of status (alpha or coalition) on call number, bout duration, or further formant dispersion parameters. Our results suggest that male copulation calls of this species are social status-dependent signals. Furthermore, dominant males might actively transmit their social status and copulations to other male rivals to impede their challenging attacks, while other non-dominant males maintain silence to prevent the interference of dominants.

RevDate: 2020-04-28

Saldías M, Laukkanen AM, Guzmán M, et al (2020)

The Vocal Tract in Loud Twang-Like Singing While Producing High and Low Pitches.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30057-6 [Epub ahead of print].

Twang-like vocal qualities have been related to a megaphone-like shape of the vocal tract (epilaryngeal tube and pharyngeal narrowing, and a wider mouth opening), low-frequency spectral changes, and tighter and/or increased vocal fold adduction. Previous studies have focused mainly on loud and high-pitched singing, comfortable low-pitched spoken vowels, or are based on modeling and simulation. There is no data available related to twang-like voices in loud, low-pitched singing.

PURPOSE: This study investigates the possible contribution of the lower and upper vocal tract configurations during loud twang-like singing on high and low pitches in a real subject.

METHODS: One male contemporary commercial music singer produced a sustained vowel [a:] in his habitual speaking pitch (B2) and loudness. The same vowel was also produced in a loud twang-like singing voice on high (G4) and low pitches (B2). Computerized tomography, acoustic analysis, inverse filtering, and audio-perceptual assessments were performed.

RESULTS: Both loud twang-like voices showed a megaphone-like shape of the vocal tract, being more notable on the low pitch. Also, low-frequency spectral changes, a peak of sound energy around 3 kHz and increased vocal fold adduction were found. Results agreed with audio-perceptual evaluation.

CONCLUSIONS: Loud twang-like phonation seems to be mainly related to low-frequency spectral changes (under 2 kHz) and a more compact formant structure. Twang-like qualities seem to require different degrees of twang-related vocal tract adjustments while phonating in different pitches. A wider mouth opening, pharyngeal constriction, and epilaryngeal tube narrowing may be helpful strategies for maximum power transfer and improved vocal economy in loud contemporary commercial music singing and potentially in loud speech. Further studies should focus on vocal efficiency and vocal economy measurements using modeling and simulation, based on real-singers' data.

RevDate: 2020-05-22

Yaralı M (2020)

Varying effect of noise on sound onset and acoustic change evoked auditory cortical N1 responses evoked by a vowel-vowel stimulus.

International journal of psychophysiology : official journal of the International Organization of Psychophysiology, 152:36-43.

INTRODUCTION: According to previous studies noise causes prolonged latencies and decreased amplitudes in acoustic change evoked cortical responses. Particularly for a consonant-vowel stimulus, speech shaped noise leads to more pronounced changes on onset evoked response than acoustic change evoked response. Reasoning that this may be related to the spectral characteristics of the stimuli and the noise, in the current study a vowel-vowel stimulus (/ui/) was presented in white noise during cortical response recordings. The hypothesis is that the effect of noise will be higher on acoustic change N1 compared to onset N1 due to the masking effects on formant transitions.

METHODS: Onset and acoustic change evoked auditory cortical N1-P2 responses were obtained from 21 young adults with normal hearing while presenting 1000 ms /ui/ stimuli in quiet and in white noise at +10 dB and 0 dB signal-to-noise ratio (SNR).

RESULTS: In the quiet and +10 dB SNR conditions, the N1-P2 responses to both onset and change were present. In the +10 dB SNR condition acoustic change N1-P2 peak-to-peak amplitudes were reduced and N1 latencies were prolonged compared to the quiet condition. Whereas there was not a significant change in onset N1 latencies and N1-P2 peak-to-peak amplitudes in the +10 dB SNR condition. In the 0 dB SNR condition change responses were not observed but onset N1-P2 peak-to-peak amplitudes were significantly lower, and onset N1 latencies were significantly higher compared to the quiet and the 10 dB SNR conditions. Onset and change responses were also compared with each other in each condition. N1 latencies and N1-P2 peak to peak amplitudes of onset and acoustic change were not significantly different in the quiet condition. Whereas at 10 dB SNR, acoustic change N1 latencies were higher and N1-P2 amplitudes were lower than onset latencies and amplitudes. To summarize, presentation of white noise at 10 dB SNR resulted in the reduction of acoustic change evoked N1-P2 peak-to-peak amplitudes and the prolongation of N1 latencies compared to quiet. Same effect on onsets were only observed at 0 dB SNR, where acoustic change N1 was not observed. In the quiet condition, latencies and amplitudes of onsets and changes were not different. Whereas at 10 dB SNR, acoustic change N1 latencies were higher, amplitudes were lower than onset N1.

DISCUSSION/CONCLUSIONS: The effect of noise was found to be higher on acoustic change evoked N1 response compared to onset N1. This may be related to the spectral characteristics of the utilized noise and the stimuli, possible differences in acoustic features of sound onsets and acoustic changes, or to the possible differences in the mechanisms for detecting acoustic changes and sound onsets. In order to investigate the possible reasons for more pronounced effect of noise on acoustic changes, future work with different vowel-vowel transitions in different noise types is suggested.

RevDate: 2020-04-04

Tykalova T, Skrabal D, Boril T, et al (2020)

Effect of Ageing on Acoustic Characteristics of Voice Pitch and Formants in Czech Vowels.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30086-2 [Epub ahead of print].

BACKGROUND: The relevance of formant-based measures has been noted across a spectrum of medical, technical, and linguistic applications. Therefore, the primary aim of the study was to evaluate the effect of ageing on vowel articulation, as the previous research revealed contradictory findings. The secondary aim was to provide normative acoustic data for all Czech monophthongs.

METHODS: The database consisted of 100 healthy speakers (50 men and 50 women) aged between 20 and 90. Acoustic characteristics, including vowel duration, vowel space area (VSA), fundamental frequency (fo), and the first to fourth formant frequencies (F1-F4) of 10 Czech vowels were extracted from a reading passage. In addition, the articulation rate was calculated from the entire duration of the reading passage.

RESULTS: Age-related changes in pitch were sex-dependent, while age-related alterations in F2/a/, F2/u/, VSA, and vowel duration seemed to be sex-independent. In particular, we observed a clear lowering of fo with age for women, but no change for men. With regard to formants, we found lowering of F2/a/ and F2/u/ with increased age, but no statistically significant changes in F1, F3, or F4 frequencies with advanced age. Although the alterations in F1 and F2 frequencies were rather small, they appeared to be in a direction against vowel centralization, resulting in a significantly greater VSA in the older population. The greater VSA was found to be related partly to longer vowel duration.

CONCLUSIONS: Alterations in vowel formant frequencies across several decades of adult life appear to be small or in a direction against vowel centralization, thus indicating the good preservation of articulatory precision in older speakers.

RevDate: 2020-10-06

Milenkovic PH, Wagner M, Kent RD, et al (2020)

Effects of sampling rate and type of anti-aliasing filter on linear-predictive estimates of formant frequencies in men, women, and children.

The Journal of the Acoustical Society of America, 147(3):EL221.

The purpose of this study was to assess the effect of downsampling the acoustic signal on the accuracy of linear-predictive (LPC) formant estimation. Based on speech produced by men, women, and children, the first four formant frequencies were estimated at sampling rates of 48, 16, and 10 kHz using different anti-alias filtering. With proper selection of number of LPC coefficients, anti-alias filter and between-frame averaging, results suggest that accuracy is not improved by rates substantially below 48 kHz. Any downsampling should not go below 16 kHz with a filter cut-off centered at 8 kHz.

RevDate: 2020-06-22
CmpDate: 2020-06-22

Deloche F (2020)

Fine-grained statistical structure of speech.

PloS one, 15(3):e0230233.

In spite of its acoustic diversity, the speech signal presents statistical regularities that can be exploited by biological or artificial systems for efficient coding. Independent Component Analysis (ICA) revealed that on small time scales (∼ 10 ms), the overall structure of speech is well captured by a time-frequency representation whose frequency selectivity follows the same power law in the high frequency range 1-8 kHz as cochlear frequency selectivity in mammals. Variations in the power-law exponent, i.e. different time-frequency trade-offs, have been shown to provide additional adaptation to phonetic categories. Here, we adopt a parametric approach to investigate the variations of the exponent at a finer level of speech. The estimation procedure is based on a measure that reflects the sparsity of decompositions in a set of Gabor dictionaries whose atoms are Gaussian-modulated sinusoids. We examine the variations of the exponent associated with the best decomposition, first at the level of phonemes, then at an intra-phonemic level. We show that this analysis offers a rich interpretation of the fine-grained statistical structure of speech, and that the exponent values can be related to key acoustic properties. Two main results are: i) for plosives, the exponent is lowered by the release bursts, concealing higher values during the opening phases; ii) for vowels, the exponent is bound to formant bandwidths and decreases with the degree of acoustic radiation at the lips. This work further suggests that an efficient coding strategy is to reduce frequency selectivity with sound intensity level, congruent with the nonlinear behavior of cochlear filtering.

RevDate: 2020-08-05

Hardy TLD, Boliek CA, Aalto D, et al (2020)

Contributions of Voice and Nonverbal Communication to Perceived Masculinity-Femininity for Cisgender and Transgender Communicators.

Journal of speech, language, and hearing research : JSLHR, 63(4):931-947.

Purpose The purpose of this study was twofold: (a) to identify a set of communication-based predictors (including both acoustic and gestural variables) of masculinity-femininity ratings and (b) to explore differences in ratings between audio and audiovisual presentation modes for transgender and cisgender communicators. Method The voices and gestures of a group of cisgender men and women (n = 10 of each) and transgender women (n = 20) communicators were recorded while they recounted the story of a cartoon using acoustic and motion capture recording systems. A total of 17 acoustic and gestural variables were measured from these recordings. A group of observers (n = 20) rated each communicator's masculinity-femininity based on 30- to 45-s samples of the cartoon description presented in three modes: audio, visual, and audio visual. Visual and audiovisual stimuli contained point light displays standardized for size. Ratings were made using a direct magnitude estimation scale without modulus. Communication-based predictors of masculinity-femininity ratings were identified using multiple regression, and analysis of variance was used to determine the effect of presentation mode on perceptual ratings. Results Fundamental frequency, average vowel formant, and sound pressure level were identified as significant predictors of masculinity-femininity ratings for these communicators. Communicators were rated significantly more feminine in the audio than the audiovisual mode and unreliably in the visual-only mode. Conclusions Both study purposes were met. Results support continued emphasis on fundamental frequency and vocal tract resonance in voice and communication modification training with transgender individuals and provide evidence for the potential benefit of modifying sound pressure level, especially when a masculine presentation is desired.

RevDate: 2020-08-05

Carl M, Kent RD, Levy ES, et al (2020)

Vowel Acoustics and Speech Intelligibility in Young Adults With Down Syndrome.

Journal of speech, language, and hearing research : JSLHR, 63(3):674-687.

Purpose Speech production deficits and reduced intelligibility are frequently noted in individuals with Down syndrome (DS) and are attributed to a combination of several factors. This study reports acoustic data on vowel production in young adults with DS and relates these findings to perceptual analysis of speech intelligibility. Method Participants were eight young adults with DS as well as eight age- and gender-matched typically developing (TD) controls. Several different acoustic measures of vowel centralization and variability were applied to tokens of corner vowels (/ɑ/, /æ/, /i/, /u/) produced in common English words. Intelligibility was assessed for single-word productions of speakers with DS, by means of transcriptions from 14 adult listeners. Results Group differentiation was found for some, but not all, of the acoustic measures. Low vowels were more acoustically centralized and variable in speakers with DS than TD controls. Acoustic findings were associated with overall intelligibility scores. Vowel formant dispersion was the most sensitive measure in distinguishing DS and TD formant data. Conclusion Corner vowels are differentially affected in speakers with DS. The acoustic characterization of vowel production and its association with speech intelligibility scores within the DS group support the conclusion of motor control deficits in the overall speech impairment. Implications are discussed for effective treatment planning.

RevDate: 2020-07-02

Zhang T, Shao Y, Wu Y, et al (2020)

Multiple Vowels Repair Based on Pitch Extraction and Line Spectrum Pair Feature for Voice Disorder.

IEEE journal of biomedical and health informatics, 24(7):1940-1951.

Individuals, such as voice-related professionals, elderly people and smokers, are increasingly suffering from voice disorder, which implies the importance of pathological voice repair. Previous work on pathological voice repair only concerned about sustained vowel /a/, but multiple vowels repair is still challenging due to the unstable extraction of pitch and the unsatisfactory reconstruction of formant. In this paper, a multiple vowels repair based on pitch extraction and Line Spectrum Pair feature for voice disorder is proposed, which broadened the research subjects of voice repair from only single vowel /a/ to multiple vowels /a/, /i/ and /u/ and achieved the repair of these vowels successfully. Considering deep neural network as a classifier, a voice recognition is performed to classify the normal and pathological voices. Wavelet Transform and Hilbert-Huang Transform are applied for pitch extraction. Based on Line Spectrum Pair (LSP) feature, the formant is reconstructed. The final repaired voice is obtained by synthesizing the pitch and the formant. The proposed method is validated on Saarbrücken Voice Database (SVD) database. The achieved improvements of three metrics, Segmental Signal-to-Noise Ratio, LSP distance measure and Mel cepstral distance measure, are respectively 45.87%, 50.37% and 15.56%. Besides, an intuitive analysis based on spectrogram has been done and a prominent repair effect has been achieved.

RevDate: 2020-08-21

Allison KM, Salehi S, JR Green (2020)

Effect of prosodic manipulation on articulatory kinematics and second formant trajectories in children.

The Journal of the Acoustical Society of America, 147(2):769.

This study investigated effects of rate reduction and emphatic stress cues on second formant (F2) trajectories and articulatory movements during diphthong production in 11 typically developing school-aged children. F2 extent increased in slow and emphatic stress conditions, and tongue and jaw displacement increased in the emphatic stress condition compared to habitual speech. Tongue displacement significantly predicted F2 extent across speaking conditions. Results suggest that slow rate and emphatic stress cues induce articulatory and acoustic changes in children that may enhance clarity of the acoustic signal. Potential clinical implications for improving speech in children with dysarthria are discussed.

RevDate: 2020-08-26
CmpDate: 2020-08-26

Summers RJ, B Roberts (2020)

Informational masking of speech by acoustically similar intelligible and unintelligible interferers.

The Journal of the Acoustical Society of America, 147(2):1113.

Masking experienced when target speech is accompanied by a single interfering voice is often primarily informational masking (IM). IM is generally greater when the interferer is intelligible than when it is not (e.g., speech from an unfamiliar language), but the relative contributions of acoustic-phonetic and linguistic interference are often difficult to assess owing to acoustic differences between interferers (e.g., different talkers). Three-formant analogues (F1+F2+F3) of natural sentences were used as targets and interferers. Targets were presented monaurally either alone or accompanied contralaterally by interferers from another sentence (F0 = 4 semitones higher); a target-to-masker ratio (TMR) between ears of 0, 6, or 12 dB was used. Interferers were either intelligible or rendered unintelligible by delaying F2 and advancing F3 by 150 ms relative to F1, a manipulation designed to minimize spectro-temporal differences between corresponding interferers. Target-sentence intelligibility (keywords correct) was 67% when presented alone, but fell considerably when an unintelligible interferer was present (49%) and significantly further when the interferer was intelligible (41%). Changes in TMR produced neither a significant main effect nor an interaction with interferer type. Interference with acoustic-phonetic processing of the target can explain much of the impact on intelligibility, but linguistic factors-particularly interferer intrusions-also make an important contribution to IM.

RevDate: 2020-08-26
CmpDate: 2020-08-26

Winn MB (2020)

Manipulation of voice onset time in speech stimuli: A tutorial and flexible Praat script.

The Journal of the Acoustical Society of America, 147(2):852.

Voice onset time (VOT) is an acoustic property of stop consonants that is commonly manipulated in studies of phonetic perception. This paper contains a thorough description of the "progressive cutback and replacement" method of VOT manipulation, and comparison with other VOT manipulation techniques. Other acoustic properties that covary with VOT-such as fundamental frequency and formant transitions-are also discussed, along with considerations for testing VOT perception and its relationship to various other measures of auditory temporal or spectral processing. An implementation of the progressive cutback and replacement method in the Praat scripting language is presented, which is suitable for modifying natural speech for perceptual experiments involving VOT and/or related covarying F0 and intensity cues. Justifications are provided for the stimulus design choices and constraints implemented in the script.

RevDate: 2020-08-06

Riggs WJ, Hiss MM, Skidmore J, et al (2020)

Utilizing Electrocochleography as a Microphone for Fully Implantable Cochlear Implants.

Scientific reports, 10(1):3714.

Current cochlear implants (CIs) are semi-implantable devices with an externally worn sound processor that hosts the microphone and sound processor. A fully implantable device, however, would ultimately be desirable as it would be of great benefit to recipients. While some prototypes have been designed and used in a few select cases, one main stumbling block is the sound input. Specifically, subdermal implantable microphone technology has been poised with physiologic issues such as sound distortion and signal attenuation under the skin. Here we propose an alternative method that utilizes a physiologic response composed of an electrical field generated by the sensory cells of the inner ear to serve as a sound source microphone for fully implantable hearing technology such as CIs. Electrophysiological results obtained from 14 participants (adult and pediatric) document the feasibility of capturing speech properties within the electrocochleography (ECochG) response. Degradation of formant properties of the stimuli /da/ and /ba/ are evaluated across various degrees of hearing loss. Preliminary results suggest proof-of-concept of using the ECochG response as a microphone is feasible to capture vital properties of speech. However, further signal processing refinement is needed in addition to utilization of an intracochlear recording location to likely improve signal fidelity.

RevDate: 2020-09-28

Kim HT (2020)

Vocal Feminization for Transgender Women: Current Strategies and Patient Perspectives.

International journal of general medicine, 13:43-52.

Voice feminization for transgender women is a highly complicated comprehensive transition process. Voice feminization has been thought to be equal to pitch elevation. Thus, many surgical procedures have only focused on pitch raising for voice feminization. However, voice feminization should not only consider voice pitch but also consider gender differences in physical, neurophysiological, and acoustical characteristics of voice. That is why voice therapy has been the preferred choice for the feminization of the voice. Considering gender difference of phonatory system, the method for voice feminization consists of changing the following four critical elements: fundamental frequency, resonance frequency related to vocal tract volume and length, formant tuning, and phonatory pattern. Voice feminizing process can be generally divided into non-surgical feminization and surgical feminization. As a non-surgical procedure, feminization voice therapy consists of increasing fundamental frequency, improving oral and pharyngeal resonance, and behavioral therapy. Surgical feminization usually can be achieved by external approach or endoscopic approach. Based on three factors (length, tension and mass) of vocal fold for pitch modulation, surgical procedure can be classified as one-factor, two-factors and three-factors modification of vocal folds. Recent systematic reviews and meta-analysis studies have reported positive outcomes for both the voice therapy and voice feminization surgery. The benefits of voice therapy, as it is highly satisfactory, mostly increase vocal pitch, and are noninvasive. However, the surgical voice feminization of three-factors modification of vocal folds is also highly competent and provides a maximum absolute increase in vocal pitch. Voice feminization is a long transition journey for physical, neurophysiological, and psychosomatic changes that convert a male phonatory system to a female phonatory system. Therefore, strategies for voice feminization should be individualized according to the individual's physical condition, the desired change in voice pitch, economic conditions, and social roles.

RevDate: 2020-05-04

Levy ES, Moya-Galé G, Chang YM, et al (2020)

Effects of speech cues in French-speaking children with dysarthria.

International journal of language & communication disorders, 55(3):401-416.

BACKGROUND: Articulatory excursion and vocal intensity are reduced in many children with dysarthria due to cerebral palsy (CP), contributing to the children's intelligibility deficits and negatively affecting their social participation. However, the effects of speech-treatment strategies for improving intelligibility in this population are understudied, especially for children who speak languages other than English. In a cueing study on English-speaking children with dysarthria, acoustic variables and intelligibility improved when the children were provided with cues aimed to increase articulatory excursion and vocal intensity. While French is among the top 20 most spoken languages in the world, dysarthria and its management in French-speaking children are virtually unexplored areas of research. Information gleaned from such research is critical for providing an evidence base on which to provide treatment.

AIMS: To examine acoustic and perceptual changes in the speech of French-speaking children with dysarthria, who are provided with speech cues targeting greater articulatory excursion (French translation of 'speak with your big mouth') and vocal intensity (French translation of 'speak with your strong voice'). This study investigated whether, in response to the cues, the children would make acoustic changes and listeners would perceive the children's speech as more intelligible.

METHODS & PROCEDURES: Eleven children with dysarthria due to CP (six girls, five boys; ages 4;11-17;0 years; eight with spastic CP, three with dyskinetic CP) repeated pre-recorded speech stimuli across three speaking conditions (habitual, 'big mouth' and 'strong voice'). Stimuli were sentences and contrastive words in phrases. Acoustic analyses were conducted. A total of 66 Belgian-French listeners transcribed the children's utterances orthographically and rated their ease of understanding on a visual analogue scale at sentence and word levels.

OUTCOMES & RESULTS: Acoustic analyses revealed significantly longer duration in response to the big mouth cue at sentence level and in response to both the big mouth and strong voice cues at word level. Significantly higher vocal sound-pressure levels were found following both cues at sentence and word levels. Both cues elicited significantly higher first-formant vowel frequencies and listeners' greater ease-of-understanding ratings at word level. Increases in the percentage of words transcribed correctly and in sentence ease-of-understanding ratings, however, did not reach statistical significance. Considerable variability between children was observed.

Speech cues targeting greater articulatory excursion and vocal intensity yield significant acoustic changes in French-speaking children with dysarthria. However, the changes may only aid listeners' ease of understanding at word level. The significant findings and great inter-speaker variability are generally consistent with studies on English-speaking children with dysarthria, although changes appear more constrained in these French-speaking children. What this paper adds What is already known on the subject According to the only study comparing effects of speech-cueing strategies on English-speaking children with dysarthria, intelligibility increases when the children are provided with cues aimed to increase articulatory excursion and vocal intensity. Little is known about speech characteristics in French-speaking children with dysarthria and no published research has explored effects of cueing strategies in this population. What this paper adds to existing knowledge This paper is the first study to examine the effects of speech cues on the acoustics and intelligibility of French-speaking children with CP. It provides evidence that the children can make use of cues to modify their speech, although the changes may only aid listeners' ease of understanding at word level. What are the potential or actual clinical implications of this work? For clinicians, the findings suggest that speech cues emphasizing increasing articulatory excursion and vocal intensity show promise for improving the ease of understanding of words produced by francophone children with dysarthria, although improvements may be modest. The variability in the responses also suggests that this population may benefit from a combination of such cues to produce words that are easier to understand.

RevDate: 2020-06-05
CmpDate: 2020-06-05

Boë LJ, Sawallis TR, Fagot J, et al (2019)

Which way to the dawn of speech?: Reanalyzing half a century of debates and data in light of speech science.

Science advances, 5(12):eaaw3916.

Recent articles on primate articulatory abilities are revolutionary regarding speech emergence, a crucial aspect of language evolution, by revealing a human-like system of proto-vowels in nonhuman primates and implicitly throughout our hominid ancestry. This article presents both a schematic history and the state of the art in primate vocalization research and its importance for speech emergence. Recent speech research advances allow more incisive comparison of phylogeny and ontogeny and also an illuminating reinterpretation of vintage primate vocalization data. This review produces three major findings. First, even among primates, laryngeal descent is not uniquely human. Second, laryngeal descent is not required to produce contrasting formant patterns in vocalizations. Third, living nonhuman primates produce vocalizations with contrasting formant patterns. Thus, evidence now overwhelmingly refutes the long-standing laryngeal descent theory, which pushes back "the dawn of speech" beyond ~200 ka ago to over ~20 Ma ago, a difference of two orders of magnitude.

RevDate: 2020-09-28

Kearney E, Nieto-Castañón A, Weerathunge HR, et al (2019)

A Simple 3-Parameter Model for Examining Adaptation in Speech and Voice Production.

Frontiers in psychology, 10:2995.

Sensorimotor adaptation experiments are commonly used to examine motor learning behavior and to uncover information about the underlying control mechanisms of many motor behaviors, including speech production. In the speech and voice domains, aspects of the acoustic signal are shifted/perturbed over time via auditory feedback manipulations. In response, speakers alter their production in the opposite direction of the shift so that their perceived production is closer to what they intended. This process relies on a combination of feedback and feedforward control mechanisms that are difficult to disentangle. The current study describes and tests a simple 3-parameter mathematical model that quantifies the relative contribution of feedback and feedforward control mechanisms to sensorimotor adaptation. The model is a simplified version of the DIVA model, an adaptive neural network model of speech motor control. The three fitting parameters of SimpleDIVA are associated with the three key subsystems involved in speech motor control, namely auditory feedback control, somatosensory feedback control, and feedforward control. The model is tested through computer simulations that identify optimal model fits to six existing sensorimotor adaptation datasets. We show its utility in (1) interpreting the results of adaptation experiments involving the first and second formant frequencies as well as fundamental frequency; (2) assessing the effects of masking noise in adaptation paradigms; (3) fitting more than one perturbation dimension simultaneously; (4) examining sensorimotor adaptation at different timepoints in the production signal; and (5) quantitatively predicting responses in one experiment using parameters derived from another experiment. The model simulations produce excellent fits to real data across different types of perturbations and experimental paradigms (mean correlation between data and model fits across all six studies = 0.95 ± 0.02). The model parameters provide a mechanistic explanation for the behavioral responses to the adaptation paradigm that are not readily available from the behavioral responses alone. Overall, SimpleDIVA offers new insights into speech and voice motor control and has the potential to inform future directions of speech rehabilitation research in disordered populations. Simulation software, including an easy-to-use graphical user interface, is publicly available to facilitate the use of the model in future studies.

RevDate: 2020-02-09

Viegas F, Viegas D, Serra Guimarães G, et al (2020)

Acoustic Analysis of Voice and Speech in Men with Skeletal Class III Malocclusion: A Pilot Study.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000505186 [Epub ahead of print].

OBJECTIVES: To assess the fundamental (f0) and first third formant (F1, F2, F3) frequencies of the 7 oral vowels of Brazilian Portuguese in men with skeletal class III malocclusion and to compare these measures with a control group of individuals with Angle's class I.

METHODS: Sixty men aged 18-40 years, 20 with Angle's class III skeletal malocclusion and 40 with Angle's class I malocclusion were selected by speech therapists and dentists. The speech signals were obtained from sustained vowels, and the values of f0 and frequencies of F1, F2 and F3 were estimated. The differences were verified through Student's t test, and the effect size calculation was performed.

RESULTS: In the class III group, more acute f0 values were observed in all vowels, higher values of F1 in the vowels [a] and [ε] and in F2 in the vowels [a], [e] and [i] and lower F1 and F3 values of the vowel [u].

CONCLUSION: More acute f0 values were found in all vowels investigated in the class III group, which showed a higher laryngeal position in the production of these sounds. The frequencies of the first 3 formants showed punctual differences, with higher values of F1 in the vowels [a] and [ε] and of F2 in [a], [e] and [i], and lower values of F1 and F3 in the vowel [u] in the experimental group. Thus, it is concluded that the fundamental frequency of the voice was the main parameter that differentiated the studied group from the control.

RevDate: 2020-06-17
CmpDate: 2020-06-17

Kelley MC, BV Tucker (2020)

A comparison of four vowel overlap measures.

The Journal of the Acoustical Society of America, 147(1):137.

Multiple measures of vowel overlap have been proposed that use F1, F2, and duration to calculate the degree of overlap between vowel categories. The present study assesses four of these measures: the spectral overlap assessment metric [SOAM; Wassink (2006). J. Acoust. Soc. Am. 119(4), 2334-2350], the a posteriori probability (APP)-based metric [Morrison (2008). J. Acoust. Soc. Am. 123(1), 37-40], the vowel overlap analysis with convex hulls method [VOACH; Haynes and Taylor, (2014). J. Acoust. Soc. Am. 136(2), 883-891], and the Pillai score as first used for vowel overlap by Hay, Warren, and Drager [(2006). J. Phonetics 34(4), 458-484]. Summaries of the measures are presented, and theoretical critiques of them are performed, concluding that the APP-based metric and Pillai score are theoretically preferable to SOAM and VOACH. The measures are empirically assessed using accuracy and precision criteria with Monte Carlo simulations. The Pillai score demonstrates the best overall performance in these tests. The potential applications of vowel overlap measures to research scenarios are discussed, including comparisons of vowel productions between different social groups, as well as acoustic investigations into vowel formant trajectories.

RevDate: 2020-06-18
CmpDate: 2020-06-18

Renwick MEL, JA Stanley (2020)

Modeling dynamic trajectories of front vowels in the American South.

The Journal of the Acoustical Society of America, 147(1):579.

Regional variation in American English speech is often described in terms of shifts, indicating which vowel sounds are converging or diverging. In the U.S. South, the Southern vowel shift (SVS) and African American vowel shift (AAVS) affect not only vowels' relative positions but also their formant dynamics. Static characterizations of shifting, with a single pair of first and second formant values taken near vowels' midpoint, fail to capture this vowel-inherent spectral change, which can indicate dialect-specific diphthongization or monophthongization. Vowel-inherent spectral change is directly modeled to investigate how trajectories of front vowels /i eɪ ɪ ɛ/ differ across social groups in the 64-speaker Digital Archive of Southern Speech. Generalized additive mixed models are used to test the effects of two social factors, sex and ethnicity, on trajectory shape. All vowels studied show significant differences between men, women, African American and European American speakers. Results show strong overlap between the trajectories of /eɪ, ɛ/ particularly among European American women, consistent with the SVS, and greater vowel-inherent raising of /ɪ/ among African American speakers, indicating how that lax vowel is affected by the AAVS. Model predictions of duration additionally indicate that across groups, trajectories become more peripheral as vowel duration increases.

RevDate: 2020-06-18
CmpDate: 2020-06-18

Chung H (2020)

Vowel acoustic characteristics of Southern American English variation in Louisiana.

The Journal of the Acoustical Society of America, 147(1):541.

This study examined acoustic characteristics of vowels produced by speakers from Louisiana, one of the states in the Southern English dialect region. First, how Louisiana vowels differ from or are similar to the reported patterns of Southern dialect were examined. Then, within-dialect differences across regions in Louisiana were examined. Thirty-four female adult monolingual speakers of American English from Louisiana, ranging in age from 18 to 23, produced English monosyllabic words containing 11 vowels /i, ɪ, e, ɛ, æ, ʌ, u, ʊ, o, ɔ, ɑ/. The first two formant frequencies at the midpoint of the vowel nucleus, direction, and amount of formant changes across three different time points (20, 50, and 80%), and vowel duration were compared to previously reported data on Southern vowels. Overall, Louisiana vowels showed patterns consistent with previously reported characteristics of Southern vowels that reflect ongoing changes in the Southern dialect (no evidence of acoustic reversal of tense-lax pairs, more specifically no peripheralization of front vowels). Some dialect-specific patterns were also observed (a relatively lesser degree of formant changes and slightly shorter vowel duration). These patterns were consistent across different regions within Louisiana.

RevDate: 2020-06-15
CmpDate: 2020-06-15

Maebayashi H, Takiguchi T, S Takada (2019)

Study on the Language Formation Process of Very-Low-Birth-Weight Infants in Infancy Using a Formant Analysis.

The Kobe journal of medical sciences, 65(2):E59-E70.

Expressive language development depends on anatomical factors, such as motor control of the tongue and oral cavity needed for vocalization, as well as cognitive aspects for comprehension and speech. The purpose of this study was to examine the differences in expressive language development between normal-birth-weight (NBW) infants and very-low-birth-weight (VLBW) infants in infancy using a formant analysis. We also examined the presence of differences between infants with a normal development and those with a high risk of autism spectrum disorder who were expected to exist among VLBW infants. The participants were 10 NBW infants and 10 VLBW infants 12-15 months of age whose speech had been recorded at intervals of approximately once every 3 months. The recorded speech signal was analyzed using a formant analysis, and changes due to age were observed. One NBW and 3 VLBW infants failed to pass the screening tests (CBCL and M-CHAT) at 24 months of age. The formant frequencies (F1 and F2) of the three groups of infants (NBW, VLBW and CBCL·M-CHAT non-passing infants) were scatter-plotted by age. For the NBW and VLBW infants, the area of the plot increased with age, but there was no significant expansion of the plot area for the CBCL·M-CHAT non-passing infants. The results showed no significant differences in expressive language development between NBW infants at 24 months old and VLBW infants at the corrected age. However, different language developmental patterns were observed in CBCL·M-CHAT non-passing infants, regardless of birth weight, suggesting the importance of screening by acoustic analyses.

RevDate: 2020-06-22

Hosbach-Cannon CJ, Lowell SY, Colton RH, et al (2020)

Assessment of Tongue Position and Laryngeal Height in Two Professional Voice Populations.

Journal of speech, language, and hearing research : JSLHR, 63(1):109-124.

Purpose To advance our current knowledge of singer physiology by using ultrasonography in combination with acoustic measures to compare physiological differences between musical theater (MT) and opera (OP) singers under controlled phonation conditions. Primary objectives addressed in this study were (a) to determine if differences in hyolaryngeal and vocal fold contact dynamics occur between two professional voice populations (MT and OP) during singing tasks and (b) to determine if differences occur between MT and OP singers in oral configuration and associated acoustic resonance during singing tasks. Method Twenty-one singers (10 MT and 11 OP) were included. All participants were currently enrolled in a music program. Experimental procedures consisted of sustained phonation on the vowels /i/ and /ɑ/ during both a low-pitch task and a high-pitch task. Measures of hyolaryngeal elevation, tongue height, and tongue advancement were assessed using ultrasonography. Vocal fold contact dynamics were measured using electroglottography. Simultaneous acoustic recordings were obtained during all ultrasonography procedures for analysis of the first two formant frequencies. Results Significant oral configuration differences, reflected by measures of tongue height and tongue advancement, were seen between groups. Measures of acoustic resonance also showed significant differences between groups during specific tasks. Both singer groups significantly raised their hyoid position when singing high-pitched vowels, but hyoid elevation was not statistically different between groups. Likewise, vocal fold contact dynamics did not significantly differentiate the two singer groups. Conclusions These findings suggest that, under controlled phonation conditions, MT singers alter their oral configuration and achieve differing resultant formants as compared with OP singers. Because singers are at a high risk of developing a voice disorder, understanding how these two groups of singers adjust their vocal tract configuration during their specific singing genre may help to identify risky vocal behavior and provide a basis for prevention of voice disorders.

RevDate: 2020-07-01

Souza P, Gallun F, R Wright (2020)

Contributions to Speech-Cue Weighting in Older Adults With Impaired Hearing.

Journal of speech, language, and hearing research : JSLHR, 63(1):334-344.

Purpose In a previous paper (Souza, Wright, Blackburn, Tatman, & Gallun, 2015), we explored the extent to which individuals with sensorineural hearing loss used different cues for speech identification when multiple cues were available. Specifically, some listeners placed the greatest weight on spectral cues (spectral shape and/or formant transition), whereas others relied on the temporal envelope. In the current study, we aimed to determine whether listeners who relied on temporal envelope did so because they were unable to discriminate the formant information at a level sufficient to use it for identification and the extent to which a brief discrimination test could predict cue weighting patterns. Method Participants were 30 older adults with bilateral sensorineural hearing loss. The first task was to label synthetic speech tokens based on the combined percept of temporal envelope rise time and formant transitions. An individual profile was derived from linear discriminant analysis of the identification responses. The second task was to discriminate differences in either temporal envelope rise time or formant transitions. The third task was to discriminate spectrotemporal modulation in a nonspeech stimulus. Results All listeners were able to discriminate temporal envelope rise time at levels sufficient for the identification task. There was wide variability in the ability to discriminate formant transitions, and that ability predicted approximately one third of the variance in the identification task. There was no relationship between performance in the identification task and either amount of hearing loss or ability to discriminate nonspeech spectrotemporal modulation. Conclusions The data suggest that listeners who rely to a greater extent on temporal cues lack the ability to discriminate fine-grained spectral information. The fact that the amount of hearing loss was not associated with the cue profile underscores the need to characterize individual abilities in a more nuanced way than can be captured by the pure-tone audiogram.

RevDate: 2020-10-05
CmpDate: 2020-10-05

Kamiloğlu RG, Fischer AH, DA Sauter (2020)

Good vibrations: A review of vocal expressions of positive emotions.

Psychonomic bulletin & review, 27(2):237-265.

Researchers examining nonverbal communication of emotions are becoming increasingly interested in differentiations between different positive emotional states like interest, relief, and pride. But despite the importance of the voice in communicating emotion in general and positive emotion in particular, there is to date no systematic review of what characterizes vocal expressions of different positive emotions. Furthermore, integration and synthesis of current findings are lacking. In this review, we comprehensively review studies (N = 108) investigating acoustic features relating to specific positive emotions in speech prosody and nonverbal vocalizations. We find that happy voices are generally loud with considerable variability in loudness, have high and variable pitch, and are high in the first two formant frequencies. When specific positive emotions are directly compared with each other, pitch mean, loudness mean, and speech rate differ across positive emotions, with patterns mapping onto clusters of emotions, so-called emotion families. For instance, pitch is higher for epistemological emotions (amusement, interest, relief), moderate for savouring emotions (contentment and pleasure), and lower for a prosocial emotion (admiration). Some, but not all, of the differences in acoustic patterns also map on to differences in arousal levels. We end by pointing to limitations in extant work and making concrete proposals for future research on positive emotions in the voice.

RevDate: 2020-09-24
CmpDate: 2020-09-24

Dubey AK, Prasanna SRM, S Dandapat (2019)

Detection and assessment of hypernasality in repaired cleft palate speech using vocal tract and residual features.

The Journal of the Acoustical Society of America, 146(6):4211.

The presence of hypernasality in repaired cleft palate (CP) speech is a consequence of velopharyngeal insufficiency. The coupling of the nasal tract with the oral tract adds nasal formant and antiformant pairs in the hypernasal speech spectrum. This addition deviates the spectral and linear prediction (LP) residual characteristics of hypernasal speech compared to normal speech. In this work, the vocal tract constriction feature, peak to side-lobe ratio feature, and spectral moment features augmented by low-order cepstral coefficients are used to capture the spectral and residual deviations for hypernasality detection. The first feature captures the lower-frequencies prominence in speech due to the presence of nasal formants, the second feature captures the undesirable signal components in the residual signal due to the nasal antiformants, and the third feature captures the information about formants and antiformants in the spectrum along with the spectral envelope. The combination of three features gives normal versus hypernasal speech detection accuracies of 87.76%, 91.13%, and 93.70% for /a/, /i/, and /u/ vowels, respectively, and hypernasality severity detection accuracies of 80.13% and 81.25% for /i/ and /u/ vowels, respectively. The speech data are collected from 30 control normal and 30 repaired CP children between the ages of 7 and 12.

RevDate: 2019-12-31

Shiraishi M, Mishima K, H Umeda (2019)

Development of an Acoustic Simulation Method during Phonation of the Japanese Vowel /a/ by the Boundary Element Method.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30445-X [Epub ahead of print].

OBJECTIVES: The purpose of the present study was to establish the method for an acoustic simulation of a vocal tract created from CT data during phonation of the Japanese vowel /a/ and to verify the validity of the simulation.

MATERIAL AND METHODS: The subjects were 15 healthy adults (8 males, 7 females). The vocal tract model was created from CT data acquired during sustained phonation of the Japanese vowel /a/. After conversion to a mesh model for analysis, a wave acoustic analysis was performed with a boundary element method. The wall and the bottom of the vocal tract model were regarded as a rigid wall and a nonrigid wall, respectively. The acoustic medium was set to 37°C, and a point sound source was set in the place corresponding to the vocal cord as a sound source. The first and second formant frequencies (F1 and F2) were calculated. For 1 of the 15 subjects, the range from the upper end of the frontal sinus to the tracheal bifurcation was scanned, and 2 models were created: model 1 included the range from the frontal sinus to the tracheal bifurcation; and model 2 included the range from the frontal sinus to the glottis and added a virtually extended trachea by 12 cm cylindrically. F1 and F2 calculated from models 1 and 2 were compared. To evaluate the validity of the present simulation, F1 and F2 calculated from the simulation were compared with those of the actual voice and the sound generated using a solid model and a whistle-type artificial larynx. To judge the validity, the vowel formant frequency discrimination threshold reported in the past was used as a criterion. Namely, the relative discrimination thresholds (%), dividing ▵F by F, where F was the formant frequency calculated from the simulation, and ▵F was the difference between F and the formant frequency of the actual voice and the sound generated using the solid model and artificial larynx, were obtained.

RESULTS: F1 and F2 calculated from models 1 and 2 were similar. Therefore, to reduce the exposure dose, the remaining 14 subjects were scanned from the upper end of the frontal sinus to the glottis, and model 2 with the trachea extended by 12 cm virtually was used for the simulation. The averages of the relative discrimination thresholds against F1 and F2 calculated from the actual voice were 5.9% and 4.6%, respectively. The averages of the relative discrimination thresholds against F1 and F2 calculated from the sound generated by using the solid model and the artificial larynx were 4.1% and 3.7%, respectively.

CONCLUSIONS: The Japanese vowel /a/ could be simulated with high validity for the vocal tract models created from the CT data during phonation of /a/ using the boundary element method.

RevDate: 2020-04-29
CmpDate: 2020-04-29

Huang MY, Duan RY, Q Zhao (2020)

The influence of long-term cadmium exposure on the male advertisement call of Xenopus laevis.

Environmental science and pollution research international, 27(8):7996-8002.

Cadmium (Cd) is a non-essential environmental endocrine-disrupting compound found in water and a potential threat to aquatic habitats. Cd has been shown to have various short-term effects on aquatic animals; however, evidence for long-term effects of Cd on vocal communications in amphibians is lacking. To better understand the long-term effects of low-dose Cd on acoustic communication in amphibians, male Xenopus laevis individuals were treated with low Cd concentrations (0.1, 1, and 10 μg/L) via aqueous exposure for 24 months. At the end of the exposure, the acoustic spectrum characteristics of male advertisement calls and male movement behaviors in response to female calls were recorded. The gene and protein expressions of the androgen receptor (AR) were determined using Western blot and RT-PCR. The results showed that long-term Cd treatment affected the spectrogram and formant of the advertisement call. Compared with the control group, 10 μg/L Cd significantly decreased the first and second formant frequency, and the fundamental and main frequency, and increased the third formant frequency. One and 10-μg/L Cd treatments significantly reduced the proportion of individuals responding to female calls and prolonged the time of first movement of the male. Long-term Cd treatment induced a downregulation in the AR protein. Treatments of 0.1, 1, and 10 μg/L Cd significantly decreased the expression of AR mRNA in the brain. These findings indicate that long-term exposure of Cd has negative effects on advertisement calls in male X. laevis.

RevDate: 2020-04-03
CmpDate: 2020-04-03

Park EJ, Yoo SD, Kim HS, et al (2019)

Correlations between swallowing function and acoustic vowel space in stroke patients with dysarthria.

NeuroRehabilitation, 45(4):463-469.

BACKGROUND: Dysphagia and dysarthria tend to coexist in stroke patients. Dysphagia can reduce patients' quality of life, cause aspiration pneumonia and increased mortality.

OBJECTIVE: To evaluate correlations among swallowing function parameters and acoustic vowel space values in patients with stroke.

METHODS: Data from stroke patients with dysarthria and dysphagia were collected. The formant parameter representing the resonance frequency of the vocal tract as a two-dimensional coordinate point was measured for the /a/, /ae/, /i/, and /u/vowels, and the quadrilateral vowel space area (VSA) and formant centralization ratio (FCR) were measured. Swallowing function was evaluated by a videofluoroscopic swallowing study (VFSS) using the videofluoroscopic dysphagia scale (VDS) and penetration aspiration scale (PAS). Pearson's correlation and linear regression analyses were used to assess the correlation of VSA and FCR to VDS and PAS scores.

RESULTS: Thirty-one stroke patients with dysphagia and dysarthria were analyzed. VSA showed a negative correlation to VDS and PAS scores, while FCR showed a positive correlation to VDS score, but not to PAS score. VSA and FCR were significant factors for assessing dysphagia severity.

CONCLUSIONS: VSA and FCR values were correlated with swallowing function and may be helpful in predicting dysphagia severity associated with stroke.

RevDate: 2020-05-25

McCarthy KM, Skoruppa K, P Iverson (2019)

Development of neural perceptual vowel spaces during the first year of life.

Scientific reports, 9(1):19592.

This study measured infants' neural responses for spectral changes between all pairs of a set of English vowels. In contrast to previous methods that only allow for the assessment of a few phonetic contrasts, we present a new method that allows us to assess changes in spectral sensitivity across the entire vowel space and create two-dimensional perceptual maps of the infants' vowel development. Infants aged four to eleven months were played long series of concatenated vowels, and the neural response to each vowel change was assessed using the Acoustic Change Complex (ACC) from EEG recordings. The results demonstrated that the youngest infants' responses more closely reflected the acoustic differences between the vowel pairs and reflected higher weight to first-formant variation. Older infants had less acoustically driven responses that seemed a result of selective increases in sensitivity for phonetically similar vowels. The results suggest that phonetic development may involve a perceptual warping for confusable vowels rather than uniform learning, as well as an overall increasing sensitivity to higher-frequency acoustic information.

RevDate: 2019-12-18

Houle N, SV Levi (2019)

Effect of Phonation on Perception of Femininity/Masculinity in Transgender and Cisgender Speakers.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30302-9 [Epub ahead of print].

Many transwomen seek voice and communication therapy to support their transition from their gender assigned at birth to their gender identity. This has led to an increased need to examine the perception of gender and femininity/masculinity to develop evidence-based intervention practices. In this study, we explore the auditory perception of femininity/masculinity in normally phonated and whispered speech. Transwomen, ciswomen, and cismen were recorded producing /hVd/ words. Naïve listeners rated femininity/masculinity of a speaker's voice using a visual analog scale, rather than completing a binary gender identification task. The results revealed that listeners rated speakers more ambiguously in whispered speech than normally phonated speech. An analysis of speaker and token characteristics revealed that in the normally phonated condition listeners consistently use f0 to rate femininity/masculinity. In addition, some evidence was found for possible contributions of formant frequencies, particularly F2, and duration. Taken together, this provides additional evidence for the salience of f0 and F2 for voice and communication intervention among transwomen.

RevDate: 2020-09-28

Xu Y, S Prom-On (2019)

Economy of Effort or Maximum Rate of Information? Exploring Basic Principles of Articulatory Dynamics.

Frontiers in psychology, 10:2469.

Economy of effort, a popular notion in contemporary speech research, predicts that dynamic extremes such as the maximum speed of articulatory movement are avoided as much as possible and that approaching the dynamic extremes is necessary only when there is a need to enhance linguistic contrast, as in the case of stress or clear speech. Empirical data, however, do not always support these predictions. In the present study, we considered an alternative principle: maximum rate of information, which assumes that speech dynamics are ultimately driven by the pressure to transmit information as quickly and accurately as possible. For empirical data, we asked speakers of American English to produce repetitive syllable sequences such as wawawawawa as fast as possible by imitating recordings of the same sequences that had been artificially accelerated and to produce meaningful sentences containing the same syllables at normal and fast speaking rates. Analysis of formant trajectories shows that dynamic extremes in meaningful speech sometimes even exceeded those in the nonsense syllable sequences but that this happened more often in unstressed syllables than in stressed syllables. We then used a target approximation model based on a mass-spring system of varying orders to simulate the formant kinematics. The results show that the kind of formant kinematics found in the present study and in previous studies can only be generated by a dynamical system operating with maximal muscular force under strong time pressure and that the dynamics of this operation may hold the solution to the long-standing enigma of greater stiffness in unstressed than in stressed syllables. We conclude, therefore, that maximum rate of information can coherently explain both current and previous empirical data and could therefore be a fundamental principle of motor control in speech production.

RevDate: 2020-04-16
CmpDate: 2019-12-12

Root-Gutteridge H, Ratcliffe VF, Korzeniowska AT, et al (2019)

Dogs perceive and spontaneously normalize formant-related speaker and vowel differences in human speech sounds.

Biology letters, 15(12):20190555.

Domesticated animals have been shown to recognize basic phonemic information from human speech sounds and to recognize familiar speakers from their voices. However, whether animals can spontaneously identify words across unfamiliar speakers (speaker normalization) or spontaneously discriminate between unfamiliar speakers across words remains to be investigated. Here, we assessed these abilities in domestic dogs using the habituation-dishabituation paradigm. We found that while dogs habituated to the presentation of a series of different short words from the same unfamiliar speaker, they significantly dishabituated to the presentation of a novel word from a new speaker of the same gender. This suggests that dogs spontaneously categorized the initial speaker across different words. Conversely, dogs who habituated to the same short word produced by different speakers of the same gender significantly dishabituated to a novel word, suggesting that they had spontaneously categorized the word across different speakers. Our results indicate that the ability to spontaneously recognize both the same phonemes across different speakers, and cues to identity across speech utterances from unfamiliar speakers, is present in domestic dogs and thus not a uniquely human trait.

RevDate: 2020-09-10
CmpDate: 2020-09-10

Vorperian HK, Kent RD, Lee Y, et al (2019)

Corner vowels in males and females ages 4 to 20 years: Fundamental and F1-F4 formant frequencies.

The Journal of the Acoustical Society of America, 146(5):3255.

The purpose of this study was to determine the developmental trajectory of the four corner vowels' fundamental frequency (fo) and the first four formant frequencies (F1-F4), and to assess when speaker-sex differences emerge. Five words per vowel, two of which were produced twice, were analyzed for fo and estimates of the first four formants frequencies from 190 (97 female, 93 male) typically developing speakers ages 4-20 years old. Findings revealed developmental trajectories with decreasing values of fo and formant frequencies. Sex differences in fo emerged at age 7. The decrease of fo was larger in males than females with a marked drop during puberty. Sex differences in formant frequencies appeared at the earliest age under study and varied with vowel and formant. Generally, the higher formants (F3-F4) were sensitive to sex differences. Inter- and intra-speaker variability declined with age but had somewhat different patterns, likely reflective of maturing motor control that interacts with the changing anatomy. This study reports a source of developmental normative data on fo and the first four formants in both sexes. The different developmental patterns in the first four formants and vowel-formant interactions in sex differences likely point to anatomic factors, although speech-learning phenomena cannot be discounted.

RevDate: 2020-09-10
CmpDate: 2020-09-10

Gianakas SP, MB Winn (2019)

Lexical bias in word recognition by cochlear implant listeners.

The Journal of the Acoustical Society of America, 146(5):3373.

When hearing an ambiguous speech sound, listeners show a tendency to perceive it as a phoneme that would complete a real word, rather than completing a nonsense/fake word. For example, a sound that could be heard as either /b/ or /ɡ/ is perceived as /b/ when followed by _ack but perceived as /ɡ/ when followed by "_ap." Because the target sound is acoustically identical across both environments, this effect demonstrates the influence of top-down lexical processing in speech perception. Degradations in the auditory signal were hypothesized to render speech stimuli more ambiguous, and therefore promote increased lexical bias. Stimuli included three speech continua that varied by spectral cues of varying speeds, including stop formant transitions (fast), fricative spectra (medium), and vowel formants (slow). Stimuli were presented to listeners with cochlear implants (CIs), and also to listeners with normal hearing with clear spectral quality, or with varying amounts of spectral degradation using a noise vocoder. Results indicated an increased lexical bias effect with degraded speech and for CI listeners, for whom the effect size was related to segment duration. This method can probe an individual's reliance on top-down processing even at the level of simple lexical/phonetic perception.

RevDate: 2020-09-10
CmpDate: 2020-09-10

Perrachione TK, Furbeck KT, EJ Thurston (2019)

Acoustic and linguistic factors affecting perceptual dissimilarity judgments of voices.

The Journal of the Acoustical Society of America, 146(5):3384.

The human voice is a complex acoustic signal that conveys talker identity via individual differences in numerous features, including vocal source acoustics, vocal tract resonances, and dynamic articulations during speech. It remains poorly understood how differences in these features contribute to perceptual dissimilarity of voices and, moreover, whether linguistic differences between listeners and talkers interact during perceptual judgments of voices. Here, native English- and Mandarin-speaking listeners rated the perceptual dissimilarity of voices speaking English or Mandarin from either forward or time-reversed speech. The language spoken by talkers, but not listeners, principally influenced perceptual judgments of voices. Perceptual dissimilarity judgments of voices were always highly correlated between listener groups and forward/time-reversed speech. Representational similarity analyses that explored how acoustic features (fundamental frequency mean and variation, jitter, harmonics-to-noise ratio, speech rate, and formant dispersion) contributed to listeners' perceptual dissimilarity judgments, including how talker- and listener-language affected these relationships, found the largest effects relating to voice pitch. Overall, these data suggest that, while linguistic factors may influence perceptual judgments of voices, the magnitude of such effects tends to be very small. Perceptual judgments of voices by listeners of different native language backgrounds tend to be more alike than different.


RJR Experience and Expertise


Robbins holds BS, MS, and PhD degrees in the life sciences. He served as a tenured faculty member in the Zoology and Biological Science departments at Michigan State University. He is currently exploring the intersection between genomics, microbial ecology, and biodiversity — an area that promises to transform our understanding of the biosphere.


Robbins has extensive experience in college-level education: At MSU he taught introductory biology, genetics, and population genetics. At JHU, he was an instructor for a special course on biological database design. At FHCRC, he team-taught a graduate-level course on the history of genetics. At Bellevue College he taught medical informatics.


Robbins has been involved in science administration at both the federal and the institutional levels. At NSF he was a program officer for database activities in the life sciences, at DOE he was a program officer for information infrastructure in the human genome project. At the Fred Hutchinson Cancer Research Center, he served as a vice president for fifteen years.


Robbins has been involved with information technology since writing his first Fortran program as a college student. At NSF he was the first program officer for database activities in the life sciences. At JHU he held an appointment in the CS department and served as director of the informatics core for the Genome Data Base. At the FHCRC he was VP for Information Technology.


While still at Michigan State, Robbins started his first publishing venture, founding a small company that addressed the short-run publishing needs of instructors in very large undergraduate classes. For more than 20 years, Robbins has been operating The Electronic Scholarly Publishing Project, a web site dedicated to the digital publishing of critical works in science, especially classical genetics.


Robbins is well-known for his speaking abilities and is often called upon to provide keynote or plenary addresses at international meetings. For example, in July, 2012, he gave a well-received keynote address at the Global Biodiversity Informatics Congress, sponsored by GBIF and held in Copenhagen. The slides from that talk can be seen HERE.


Robbins is a skilled meeting facilitator. He prefers a participatory approach, with part of the meeting involving dynamic breakout groups, created by the participants in real time: (1) individuals propose breakout groups; (2) everyone signs up for one (or more) groups; (3) the groups with the most interested parties then meet, with reports from each group presented and discussed in a subsequent plenary session.


Robbins has been engaged with photography and design since the 1960s, when he worked for a professional photography laboratory. He now prefers digital photography and tools for their precision and reproducibility. He designed his first web site more than 20 years ago and he personally designed and implemented this web site. He engages in graphic design as a hobby.

Support this website:
Order from Amazon
We will earn a commission.

This is a must read book for anyone with an interest in invasion biology. The full title of the book lays out the author's premise — The New Wild: Why Invasive Species Will Be Nature's Salvation. Not only is species movement not bad for ecosystems, it is the way that ecosystems respond to perturbation — it is the way ecosystems heal. Even if you are one of those who is absolutely convinced that invasive species are actually "a blight, pollution, an epidemic, or a cancer on nature", you should read this book to clarify your own thinking. True scientific understanding never comes from just interacting with those with whom you already agree. R. Robbins

963 Red Tail Lane
Bellingham, WA 98226


E-mail: RJR8222@gmail.com

Collection of publications by R J Robbins

Reprints and preprints of publications, slide presentations, instructional materials, and data compilations written or prepared by Robert Robbins. Most papers deal with computational biology, genome informatics, using information technology to support biomedical research, and related matters.

Research Gate page for R J Robbins

ResearchGate is a social networking site for scientists and researchers to share papers, ask and answer questions, and find collaborators. According to a study by Nature and an article in Times Higher Education , it is the largest academic social network in terms of active users.

Curriculum Vitae for R J Robbins

short personal version

Curriculum Vitae for R J Robbins

long standard version

RJR Picks from Around the Web (updated 11 MAY 2018 )