
Bibliography on: Formants: Modulators of Communication


Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology.

RJR: Recommended Bibliography. Created: 29 Sep 2023 at 01:48

Formants: Modulators of Communication

Wikipedia: A formant, as defined by James Jeans, is a harmonic of a note that is augmented by a resonance. In speech science and phonetics, however, the term is also used to mean an acoustic resonance of the human vocal tract. Thus, in phonetics, formant can mean either a resonance or the spectral maximum that the resonance produces. Formants are often measured as amplitude peaks in the frequency spectrum of the sound, using a spectrogram or a spectrum analyzer; in the case of the voice, this gives an estimate of the vocal tract resonances. In vowels spoken with a high fundamental frequency, as in a female or child voice, however, the frequency of the resonance may lie between the widely spaced harmonics and hence no corresponding peak is visible. Because formants are a product of resonance, resonance is affected by the shape and material of the resonating structure, and all animals (humans included) have unique morphologies, formants can add both generic (sounds big) and specific (that's Towser barking) information to animal vocalizations.
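As a rough illustration of the spectral-peak estimation described above, here is a standalone Python sketch (not drawn from any cited paper; all frequencies and parameter choices are illustrative) that synthesizes a crude two-formant vowel and recovers the formants by linear-predictive-coding (LPC) root-solving, the family of techniques behind formant trackers such as Praat's.

```python
import numpy as np

def resonator(x, fs, freq, bw):
    # Second-order IIR resonator: a damped pole pair at the given frequency/bandwidth
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * freq / fs
    a1, a2 = 2 * r * np.cos(theta), -r * r
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] + a1 * (y[n-1] if n >= 1 else 0.0) + a2 * (y[n-2] if n >= 2 else 0.0)
    return y

def lpc(x, order):
    # Autocorrelation (Yule-Walker) LPC: solve R a = r for predictor coefficients
    x = x * np.hamming(len(x))
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])
    return np.concatenate(([1.0], -a))  # A(z) = 1 - sum_k a_k z^-k

def formants(x, fs, order=10):
    # Formants = angles of sharp, low-bandwidth roots of the LPC polynomial
    roots = np.roots(lpc(x, order))
    roots = roots[np.imag(roots) > 0.01]          # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -fs / np.pi * np.log(np.abs(roots))
    keep = (freqs > 90) & (bws < 400)             # discard DC-ish and broad poles
    return np.sort(freqs[keep])

fs = 10000
t = np.arange(int(0.5 * fs))
source = (t % (fs // 100) == 0).astype(float)     # 100 Hz glottal pulse train
x = resonator(resonator(source, fs, 700, 80), fs, 1200, 90)
est = formants(x, fs)
print(est)  # expect peaks near 700 and 1200 Hz
```

Note that, exactly as the passage warns, raising the source frequency from 100 Hz toward a child-like 400 Hz spaces the harmonics so widely that the envelope peaks become much harder to localize.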

Created with PubMed® Query: formant NOT pmcbook NOT ispreviousversion

The Papers (from PubMed®)


RevDate: 2023-09-28

van Brenk F, Lowit A, K Tjaden (2023)

Effects of Speaking Rate on Variability of Second Formant Frequency Transitions in Dysarthria.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000534337 [Epub ahead of print].

INTRODUCTION: This study examined the utility of multiple second formant (F2) slope metrics to capture differences in speech production for individuals with dysarthria and healthy controls as a function of speaking rate. In addition, the utility of F2 slope metrics for predicting severity of intelligibility impairment in dysarthria was examined.

METHODS: 23 speakers with Parkinson's disease and mild to moderate hypokinetic dysarthria (HD), 9 speakers with various neurological diseases and mild to severe ataxic dysarthria (AD), and 26 age-matched healthy control speakers (CON) participated in a sentence repetition task. Sentences were produced at habitual, fast, and slow speaking rate. A variety of metrics were derived from the rising second formant (F2) transition portion of the diphthong /ai/. To obtain measures of intelligibility for the two clinical speaker groups, 15 undergraduate SLP students participated in a transcription experiment.

RESULTS: Significantly shallower slopes were found for the speakers with hypokinetic dysarthria compared to control speakers. Steeper F2 slopes were associated with increased speaking rate for all groups. Higher variability in F2 slope metrics was found for the speakers with ataxic dysarthria compared to the two other speaker groups. For both clinical speaker groups, there was a negative association between intelligibility and F2 slope variability metrics, indicating lower variability in speech production was associated with higher intelligibility.

DISCUSSION: F2 slope metrics were sensitive to dysarthria presence, dysarthria type, and speaking rate. The current study provided evidence that F2 slope variability measures add value beyond averaged F2 slope measures for predicting the severity of intelligibility impairment in dysarthria.

RevDate: 2023-09-27

Liu W, Wang T, X Huang (2023)

The influences of forward context on stop-consonant perception: The combined effects of contrast and acoustic cue activation?.

The Journal of the Acoustical Society of America, 154(3):1903-1920.

The perception of the /da/-/ga/ series, distinguished primarily by the third formant (F3) transition, is affected by many nonspeech and speech sounds. Previous studies mainly investigated the influences of context stimuli with frequency bands located in the F3 region and proposed the account of spectral contrast effects. This study examined the effects of context stimuli with bands not in the F3 region. The results revealed that these non-F3-region stimuli (whether with bands higher or lower than the F3 region) mainly facilitated the identification of /ga/; for example, the stimuli (including frequency-modulated glides, sine-wave tones, filtered sentences, and natural vowels) in the low-frequency band (500-1500 Hz) led to more /ga/ responses than those in the low-F3 region (1500-2500 Hz). It is suggested that in the F3 region, context stimuli may act through spectral contrast effects, while in non-F3 regions, context stimuli might activate the acoustic cues of /g/ and further facilitate the identification of /ga/. The combination of contrast and acoustic cue effects can explain more results concerning the forward context influences on the perception of the /da/-/ga/ series, including the effects of non-F3-region stimuli and the imbalanced influences of context stimuli on /da/ and /ga/ perception.

RevDate: 2023-09-25

Toppo R, S Sinha (2023)

The Acoustics of Gender in Indian English: Toward Forensic Profiling in a Multilingual Context.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00239-4 [Epub ahead of print].

The present study is an acoustic analysis of Indian English, specifically examining the speech patterns and characteristics of three groups with different native languages. This study investigates fundamental frequency (fo), fo range, fo variance, formant frequencies, and vowel space size in 42 native male and female speakers of Odia, Bangla, and Hindi. Furthermore, it investigated the potential correlation between fundamental frequency and vowel space, examining whether variations in vowel space size could be influenced by gender-specific perceptual factors. The paper emphasizes that in a multilingual context, gender identification can be efficiently correlated with both fo and formant frequencies. To measure a range of acoustic characteristics, speech samples were collected from a recording task. Analysis was done in Praat. The study revealed significant differences between genders for the examined acoustic characteristics. Results indicate differences in the size of gender-specific variations among the language groups, with females exhibiting larger differences in fo, formant frequencies, and vowel space than males. The findings show no significant correlation between fo and vowel space area, indicating that other features are responsible for the larger vowel space in females. These findings show significant potential for creating a robust empirical framework for gender profiling that can be utilized in a wide range of forensic linguistics investigations.

RevDate: 2023-09-22

Osiecka AN, Briefer EF, Kidawa D, et al (2023)

Social calls of the little auk (Alle alle) reflect body size and possibly partnership, but not sex.

Royal Society open science, 10(9):230845.

Source-filter theory posits that an individual's size and vocal tract length are reflected in the parameters of their calls. In species that mate assortatively, this could result in vocal similarity. In the context of mate selection, this would mean that animals could listen in to find a partner that sounds-and therefore is-similar to them. We investigated the social calls of the little auk (Alle alle), a highly vocal seabird mating assortatively, using vocalizations produced inside 15 nests by known individuals. Source- and filter-related acoustic parameters were used in linear mixed models testing the possible impact of body size. A principal component analysis followed by a permuted discriminant function analysis tested the effect of sex. Additionally, randomization procedures tested whether partners are more vocally similar than random birds. There was a significant effect of size on the mean fundamental frequency of a simple call, but not on parameters of a multisyllable call with apparent formants. Neither sex nor partnership influenced the calls-there was, however, a tendency to match certain parameters between partners. This indicates that vocal cues are at best weak indicators of size, and other factors likely play a role in mate selection.

RevDate: 2023-09-20

Georgiou GP (2023)

Comparison of the prediction accuracy of machine learning algorithms in crosslinguistic vowel classification.

Scientific reports, 13(1):15594.

Machine learning algorithms can be used for the prediction of nonnative sound classification based on crosslinguistic acoustic similarity. To date, very few linguistic studies have compared the classification accuracy of different algorithms. This study aims to assess how well machines align with human speech perception by assessing the ability of three machine learning algorithms, namely, linear discriminant analysis (LDA), decision tree (C5.0), and neural network (NNET), to predict the classification of second language (L2) sounds in terms of first language (L1) categories. The models were trained using the first three formants and duration of L1 vowels and fed with the same acoustic features of L2 vowels. To validate their accuracy, adult L2 speakers completed a perceptual classification task. The results indicated that NNET predicted with success the classification of all L2 vowels with the highest proportion in terms of L1 categories, while LDA and C5.0 missed only one vowel each. Furthermore, NNET exhibited superior accuracy in predicting the full range of above chance responses, followed closely by LDA. C5.0 did not meet the anticipated performance levels. The findings can hold significant implications for advancing both the theoretical and practical frameworks of speech acquisition.
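The LDA approach described in this entry can be illustrated with a minimal sketch. This is not the study's code or data: the vowel categories, formant means, and token counts below are invented for illustration, and the classifier is the classic pooled-covariance Gaussian decision rule implemented directly in NumPy.

```python
import numpy as np

def lda_fit(X, y):
    # Classic LDA: per-class means, one pooled within-class covariance, class priors
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    cov = sum(np.cov(X[y == c].T) * (np.sum(y == c) - 1) for c in classes)
    cov /= len(y) - len(classes)
    return classes, means, np.linalg.inv(cov), {c: np.mean(y == c) for c in classes}

def lda_predict(model, X):
    classes, means, icov, priors = model
    # Linear discriminant score per class; pick the maximum
    scores = np.stack([
        X @ icov @ means[c] - 0.5 * means[c] @ icov @ means[c] + np.log(priors[c])
        for c in classes
    ], axis=1)
    return classes[np.argmax(scores, axis=1)]

rng = np.random.default_rng(0)
# Hypothetical "L1" vowel tokens: each row is [F1, F2, F3, duration(ms)],
# mirroring the feature set (first three formants plus duration) named above
i_tokens = rng.normal([300, 2300, 3000, 120], [30, 100, 120, 15], size=(50, 4))
a_tokens = rng.normal([750, 1300, 2600, 150], [40, 90, 120, 15], size=(50, 4))
X = np.vstack([i_tokens, a_tokens])
y = np.array(["i"] * 50 + ["a"] * 50)

model = lda_fit(X, y)
# An "L2" token with /i/-like acoustics is classified in terms of the L1 categories
print(lda_predict(model, np.array([[320.0, 2250.0, 2950.0, 125.0]])))
```

In the study's setting, the training rows would be L1 vowel measurements and the prediction rows L2 vowel measurements, with the predicted labels compared against listeners' perceptual classifications.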

RevDate: 2023-09-17

Zhang T, Liu X, Liu G, et al (2023)

PVR-AFM: A Pathological Voice Repair System based on Non-linear Structure.

Journal of voice : official journal of the Voice Foundation, 37(5):648-662.

OBJECTIVE: Speech signal processing has become an important technique for ensuring that voice interaction systems communicate accurately with the user by improving the clarity or intelligibility of speech signals. However, most existing works focus only on processing the average human voice and ignore the communication needs of individuals suffering from voice disorders, including voice-related professionals, older people, and smokers. To meet this need, it is essential to design a non-invasive repair system that processes pathological voices.

METHODS: In this paper, we propose a repair system for multiple polyp-affected vowels, such as /a/, /i/, and /u/. We utilize a non-linear model based on an amplitude-modulation (AM) and frequency-modulation (FM) structure to extract the pitch and formants of the pathological voice. To address pitch breaks and instability, we provide a pitch extraction algorithm that ensures pitch stability and avoids pitch-doubling errors caused by the instability of the low-frequency signal. Furthermore, we design a formant reconstruction mechanism, which can effectively determine the frequency and bandwidth needed to accomplish formant repair.

RESULTS: Finally, spectrum observation and objective indicators show that the system has better performance in improving the intelligibility of pathological speech.

RevDate: 2023-09-13

Roland V, Huet K, Harmegnies B, et al (2023)

Vowel production: a potential speech biomarker for early detection of dysarthria in Parkinson's disease.

Frontiers in psychology, 14:1129830.

OBJECTIVES: Our aim is to detect early, subclinical speech biomarkers of dysarthria in Parkinson's disease (PD), i.e., systematic atypicalities in speech that remain subtle and are not easily detectable by the clinician, so that the patient is labeled "non-dysarthric." Based on promising exploratory work, we examine here whether vowel articulation, as assessed by three acoustic metrics, can be used as an early indicator of speech difficulties associated with Parkinson's disease.

STUDY DESIGN: This is a prospective case-control study.

METHODS: Sixty-three individuals with PD and 35 without PD (healthy controls-HC) participated in this study. Out of 63 PD patients, 43 had been diagnosed with dysarthria (DPD) and 20 had not (NDPD). Sustained vowels were recorded for each speaker and formant frequencies were measured. The analyses focus on three acoustic metrics: individual vowel triangle areas (tVSA), vowel articulation index (VAI) and the Phi index.

RESULTS: tVSA were found to be significantly smaller for DPD speakers than for HC. The VAI showed significant differences between these two groups, indicating greater centralization and lower vowel contrasts in the DPD speakers. In addition, DPD and NDPD speakers had lower Phi values, indicating a lower organization of their vowel systems compared to the HC. Results also showed that the VAI was the most efficient metric for distinguishing between DPD and NDPD, whereas the Phi index was the best acoustic metric for discriminating between NDPD and HC.

CONCLUSION: This acoustic study identified potential subclinical vowel-related speech biomarkers of dysarthria in speakers with Parkinson's disease who have not been diagnosed with dysarthria.

RevDate: 2023-09-11

Perrine BL, RC Scherer (2023)

Using a vertical three-mass computational model of the vocal folds to match human phonation of three adult males.

The Journal of the Acoustical Society of America, 154(3):1505-1525.

Computer models of phonation are used to study various parameters that are difficult to control, measure, and observe in human subjects. Imitating human phonation by varying the prephonatory conditions of computer models offers insight into the variations that occur across human phonatory production. In the present study, a vertical three-mass computer model of phonation [Perrine, Scherer, Fulcher, and Zhai (2020). J. Acoust. Soc. Am. 147, 1727-1737], driven by empirical pressures from a physical model of the vocal folds (model M5), with a vocal tract following the design of Ishizaka and Flanagan [(1972). Bell Sys. Tech. J. 51, 1233-1268] was used to match prolonged vowels produced by three male subjects using various pitch and loudness levels. The prephonatory conditions of tissue mass and tension, subglottal pressure, glottal diameter and angle, posterior glottal gap, false vocal fold gap, and vocal tract cross-sectional areas were varied in the model to match the model output with the fundamental frequency, alternating current airflow, direct current airflow, skewing quotient, open quotient, maximum flow negative derivative, and the first three formant frequencies from the human production. Parameters were matched between the model and human subjects with an average overall percent mismatch of 4.40% (standard deviation = 6.75%), suggesting a reasonable ability of the simple low dimensional model to mimic these variables.

RevDate: 2023-08-31

Steffman J (2023)

Vowel-internal cues to vowel quality and prominence in speech perception.

Phonetica [Epub ahead of print].

This study examines how variation in F0 and intensity impacts the perception of American English vowels. Both properties vary intrinsically as a function of vowel features in the speech production literature, raising the question of the perceptual impact of each. In addition to considering listeners' interpretation of either cue as an intrinsic property of the vowel, the possible prominence-marking function of each is considered. Two patterns of prominence strengthening in vowels, sonority expansion and hyperarticulation, are tested in light of recent findings that contextual prominence impacts vowel perception in line with these effects (i.e. a prominent vowel is expected by listeners to be realized as if it had undergone prominence strengthening). Across four vowel contrasts with different height and frontness features, listeners categorized phonetic continua with variation in formants, F0 and intensity. Results show that variation in level F0 height is interpreted as an intrinsic cue by listeners. Higher F0 cues a higher vowel, following intrinsic F0 effects in the production literature. In comparison, intensity is interpreted as a prominence-lending cue, for which effect directionality is dependent on vowel height. Higher intensity high vowels undergo perceptual re-calibration in line with (acoustic) hyperarticulation, whereas higher intensity non-high vowels undergo perceptual re-calibration in line with sonority expansion.

RevDate: 2023-08-26

Yang J, Yue Y, Lv H, et al (2023)

Effect of Adding Intermediate Layers on the Interface Bonding Performance of WC-Co Diamond-Coated Cemented Carbide Tool Materials.

Molecules (Basel, Switzerland), 28(16): pii:molecules28165958.

The interface models of diamond-coated WC-Co cemented carbide (DCCC) were constructed without intermediate layers and with different interface terminals, such as intermediate layers of TiC, TiN, CrN, and SiC. The adhesion work of the interface model was calculated based on first principles. The results show that the adhesion work of the interface increased after adding any of the four intermediate layers. Their effect on improving the interface adhesion performance of cemented carbide coated with diamond was ranked in descending order as follows: SiC > CrN > TiC > TiN. The charge density difference and the density of states were further analyzed. After adding an intermediate layer, the charge distribution at the interface junction changed, and the electron clouds at the interface junction overlapped to form more stable chemical bonds. Additionally, after adding an intermediate layer, the density of states of the atoms at the interface increased in the energy overlapping area. The resonance peak formed between the electronic orbitals enhances the bond strength. Thus, the interface bonding performance of DCCC was enhanced. The most obvious case was the interatomic electron cloud overlapping at the diamond/SiC(C-Si)/WC-Co interface: its bond length was the shortest (1.62 Å), the energy region forming the resonance peak was the largest (−5 to 20 eV), and the bonding was the strongest. The interatomic bond length at the diamond/TiN(Ti)/WC-Co interface was the longest (4.11 Å), the energy region forming the resonance peak was the smallest (−5 to 16 eV), and the bonding was the weakest. Considering all four kinds of intermediate layers, the best intermediate layer for improving the interface bonding performance of DCCC was SiC, and the worst was TiN.

RevDate: 2023-08-24

Bradshaw AR, Lametti DR, Shiller DM, et al (2023)

Speech motor adaptation during synchronous and metronome-timed speech.

Journal of experimental psychology. General pii:2024-01928-001 [Epub ahead of print].

Sensorimotor integration during speech has been investigated by altering the sound of a speaker's voice in real time; in response, the speaker learns to change their production of speech sounds in order to compensate (adaptation). This line of research has however been predominantly limited to very simple speaking contexts, typically involving (a) repetitive production of single words and (b) production of speech while alone, without the usual exposure to other voices. This study investigated adaptation to a real-time perturbation of the first and second formants during production of sentences either in synchrony with a prerecorded voice (synchronous speech group) or alone (solo speech group). Experiment 1 (n = 30) found no significant difference in the average magnitude of compensatory formant changes between the groups; however, synchronous speech resulted in increased between-individual variability in such formant changes. Participants also showed acoustic-phonetic convergence to the voice they were synchronizing with prior to introduction of the feedback alteration. Furthermore, the extent to which the changes required for convergence agreed with those required for adaptation was positively correlated with the magnitude of subsequent adaptation. Experiment 2 tested an additional group with a metronome-timed speech task (n = 15) and found a similar pattern of increased between-participant variability in formant changes. These findings demonstrate that speech motor adaptation can be measured robustly at the group level during performance of more complex speaking tasks; however, further work is needed to resolve whether self-voice adaptation and other-voice convergence reflect additive or interactive effects during sensorimotor control of speech. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

RevDate: 2023-08-17

Ancel EE, Smith ML, Rao VNV, et al (2023)

Relating Acoustic Measures to Listener Ratings of Children's Productions of Word-Initial /ɹ/ and /w/.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: The /ɹ/ productions of young children acquiring American English are highly variable and often inaccurate, with [w] as the most common substitution error. One acoustic indicator of the goodness of children's /ɹ/ productions is the difference between the frequency of the second formant (F2) and the third formant (F3), with a smaller F3-F2 difference being associated with a perceptually more adultlike /ɹ/. This study analyzed the effectiveness of automatically extracted F3-F2 differences in characterizing young children's productions of /ɹ/-/w/ in comparison with manually coded measurements.

METHOD: Automated F3-F2 differences were extracted from productions of a variety of different /ɹ/- and /w/-initial words spoken by 3- to 4-year-old monolingual preschoolers (N = 117; 2,278 tokens in total). These automated measures were compared to ratings of the phoneme goodness of children's productions as rated by untrained adult listeners (n = 132) on a visual analog scale, as well as to narrow transcriptions of the production into four categories: [ɹ], [w], and two intermediate categories.

RESULTS: Data visualizations show a weak relationship between automated F3-F2 differences and both listener ratings and narrow transcriptions. Mixed-effects models suggest the automated F3-F2 difference only modestly predicts listener ratings (R² = .37) and narrow transcriptions (R² = .32).

CONCLUSION: The weak relationship between automated F3-F2 difference and both listener ratings and narrow transcriptions suggests that these automated acoustic measures are of questionable reliability and utility in assessing preschool children's mastery of the /ɹ/-/w/ contrast.

RevDate: 2023-08-09

Stilp C, E Chodroff (2023)

"Please say what this word is": Linguistic experience and acoustic context interact in vowel categorization.

JASA express letters, 3(8):.

Ladefoged and Broadbent [(1957). J. Acoust. Soc. Am. 29(1), 98-104] is a foundational study in speech perception research, demonstrating that acoustic properties of earlier sounds alter perception of subsequent sounds: a context sentence with a lowered first formant (F1) frequency promotes perception of a raised F1 in a target word, and vice versa. The present study replicated the original with U.K. and U.S. listeners. While the direction of the perceptual shift was consistent with the original study, neither sample replicated the large effect sizes. This invites consideration of how linguistic experience relates to the magnitudes of these context effects.

RevDate: 2023-08-09

Tanner J (2023)

Prosodic and durational influences on the formant dynamics of Japanese vowels.

JASA express letters, 3(8):.

The relationship between prosodic structure and segmental realisation is a central question within phonetics. For vowels, this has been typically examined in terms of duration, leaving largely unanswered how prosodic boundaries influence spectral realisation. This study examines the influence of prosodic boundary strength-as well as duration and pauses-on vowel dynamics in spontaneous Japanese. While boundary strength has a marginal effect on dynamics, increased duration and pauses result in greater vowel peripherality and spectral change. These findings highlight the complex relationship between prosodic and segmental structure, and illustrate the importance of multifactorial analysis in corpus research.

RevDate: 2023-08-07

Hilger A, Cole J, C Larson (2023)

Task-dependent pitch auditory feedback control in cerebellar ataxia.

Research square pii:rs.3.rs-3186155.

PURPOSE: The purpose of this study was to investigate how ataxia affects the task-dependent role of pitch auditory feedback control in speech. In previous research, individuals with ataxia produced over-corrected, hypermetric compensatory responses to unexpected pitch and formant frequency perturbations in auditory feedback in sustained vowels and single words (Houde et al., 2019; Li et al., 2019; Parrell et al., 2017). In this study, we investigated whether ataxia would also affect the task-dependent role of the auditory feedback control system, measuring whether pitch-shift responses would be mediated by speech task or semantic focus pattern as they are in neurologically healthy speakers.

METHODS: Twenty-two adults with ataxia and 29 age- and sex-matched control participants produced sustained vowels and sentences with and without corrective focus while their auditory feedback was briefly and unexpectedly perturbed in pitch by +/-200 cents. The magnitude and latency of the reflexive pitch-shift responses were measured as a reflection of auditory feedback control.

RESULTS: Individuals with ataxia produced larger reflexive pitch-shift responses in both the sustained-vowel and sentence-production tasks than the control participants. Additionally, a differential response magnitude was observed by task and sentence focus pattern for both groups.

CONCLUSION: These findings demonstrate that even though the accuracy of auditory feedback control correction is affected by cerebellar damage, as evidenced by the hypermetric responses, the system still retains efficiency in utilizing the task-dependent role of auditory feedback.

RevDate: 2023-08-04

Gao Y, Feng Y, Wu D, et al (2023)

Effect of Wearing Different Masks on Acoustic, Aerodynamic, and Formant Parameters.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00191-1 [Epub ahead of print].

OBJECTIVE: This study aimed to investigate the effects of different types of masks on acoustic, aerodynamic, and formant parameters in healthy people.

METHODS: Our study involved 30 healthy participants, 15 of each gender, aged 20-40 years. The tests were conducted under four conditions: without a mask, after wearing a surgical mask, after wearing a head-mounted N95 mask, and after wearing an ear-mounted N95 mask. Voice recording was done with the mask on. The acoustic parameters include mean fundamental frequency (F0), mean intensity, percentage of jitter (local), percentage of shimmer (local), and mean noise-to-harmonic ratio (NHR); the aerodynamic parameter is maximum phonation time (MPT); and the formant parameters are F1 and F2 of the three vowels /a/, /i/, and /u/.

RESULTS: The main effect of mask type was significant for MPT, mean F0, mean HNR, /a/F1, /a/F2, and /i/F2. However, the effect sizes and power for /a/F2 and /i/F2 were low. MPT, mean F0, and mean HNR significantly increased, and /a/F1 significantly decreased, after wearing the head-mounted N95 mask. The mean F0 and mean HNR increased significantly after wearing the ear-mounted N95 mask. No significant changes were observed in parameters after wearing the surgical mask in this study. When the statistics were performed separately for males and females, the results were similar to those obtained for the combined group.

CONCLUSION: After wearing the surgical mask, this study found insignificant changes in mean F0, jitter (local), shimmer (local), mean NHR, mean intensity, MPT, and the vowel formants F1 and F2. This may be due to the looser design of the surgical mask and its relatively small attenuation of sound. N95 masks have a greater effect on vocalization than surgical masks and may cause changes in F0 and HNR. In the present study, no significant changes in jitter and shimmer were observed after wearing any mask. In addition, the significant reduction in /a/F1 after wearing the head-mounted N95 mask may be owing to its strong restriction of jaw mobility. Future studies could additionally measure changes in jaw movement amplitude while wearing a mask.

RevDate: 2023-07-31

Rizzi R, GM Bidelman (2023)

Duplex perception reveals brainstem auditory representations are modulated by listeners' ongoing percept for speech.

Cerebral cortex (New York, N.Y. : 1991) pii:7233661 [Epub ahead of print].

So-called duplex speech stimuli with perceptually ambiguous spectral cues to one ear and isolated low- versus high-frequency third formant "chirp" to the opposite ear yield a coherent percept supporting their phonetic categorization. Critically, such dichotic sounds are only perceived categorically upon binaural integration. Here, we used frequency-following responses (FFRs), scalp-recorded potentials reflecting phase-locked subcortical activity, to investigate brainstem responses to fused speech percepts and to determine whether FFRs reflect binaurally integrated category-level representations. We recorded FFRs to diotic and dichotic stop-consonants (/da/, /ga/) that either did or did not require binaural fusion to properly label along with perceptually ambiguous sounds without clear phonetic identity. Behaviorally, listeners showed clear categorization of dichotic speech tokens confirming they were heard with a fused, phonetic percept. Neurally, we found FFRs were stronger for categorically perceived speech relative to category-ambiguous tokens but also differentiated phonetic categories for both diotically and dichotically presented speech sounds. Correlations between neural and behavioral data further showed FFR latency predicted the degree to which listeners labeled tokens as "da" versus "ga." The presence of binaurally integrated, category-level information in FFRs suggests human brainstem processing reflects a surprisingly abstract level of the speech code typically circumscribed to much later cortical processing.

RevDate: 2023-07-28

Kim KS, Gaines JL, Parrell B, et al (2023)

Mechanisms of sensorimotor adaptation in a hierarchical state feedback control model of speech.

PLoS computational biology, 19(7):e1011244 pii:PCOMPBIOL-D-22-01400 [Epub ahead of print].

Upon perceiving sensory errors during movements, the human sensorimotor system updates future movements to compensate for the errors, a phenomenon called sensorimotor adaptation. One component of this adaptation is thought to be driven by sensory prediction errors-discrepancies between predicted and actual sensory feedback. However, the mechanisms by which prediction errors drive adaptation remain unclear. Here, auditory prediction error-based mechanisms involved in speech auditory-motor adaptation were examined via the feedback aware control of tasks in speech (FACTS) model. Consistent with theoretical perspectives in both non-speech and speech motor control, the hierarchical architecture of FACTS relies on both the higher-level task (vocal tract constrictions) as well as lower-level articulatory state representations. Importantly, FACTS also computes sensory prediction errors as a part of its state feedback control mechanism, a well-established framework in the field of motor control. We explored potential adaptation mechanisms and found that adaptive behavior was present only when prediction errors updated the articulatory-to-task state transformation. In contrast, designs in which prediction errors updated forward sensory prediction models alone did not generate adaptation. Thus, FACTS demonstrated that 1) prediction errors can drive adaptation through task-level updates, and 2) adaptation is likely driven by updates to task-level control rather than (only) to forward predictive models. Additionally, simulating adaptation with FACTS generated a number of important hypotheses regarding previously reported phenomena such as identifying the source(s) of incomplete adaptation and driving factor(s) for changes in the second formant frequency during adaptation to the first formant perturbation. The proposed model design paves the way for a hierarchical state feedback control framework to be examined in the context of sensorimotor adaptation in both speech and non-speech effector systems.

RevDate: 2023-08-04
CmpDate: 2023-08-04

Illner V, Tykalova T, Skrabal D, et al (2023)

Automated Vowel Articulation Analysis in Connected Speech Among Progressive Neurological Diseases, Dysarthria Types, and Dysarthria Severities.

Journal of speech, language, and hearing research : JSLHR, 66(8):2600-2621.

PURPOSE: Although articulatory impairment represents distinct speech characteristics in most neurological diseases affecting movement, methods allowing automated assessments of articulation deficits from the connected speech are scarce. This study aimed to design a fully automated method for analyzing dysarthria-related vowel articulation impairment and estimate its sensitivity in a broad range of neurological diseases and various types and severities of dysarthria.

METHOD: Unconstrained monologue and reading passages were acquired from 459 speakers, including 306 healthy controls and 153 neurological patients. The algorithm utilized a formant tracker in combination with a phoneme recognizer and subsequent signal processing analysis.

RESULTS: Articulatory undershoot of vowels was present across a broad spectrum of progressive neurodegenerative diseases, including Parkinson's disease, progressive supranuclear palsy, multiple-system atrophy, Huntington's disease, essential tremor, cerebellar ataxia, multiple sclerosis, and amyotrophic lateral sclerosis, as well as in related dysarthria subtypes including hypokinetic, hyperkinetic, ataxic, spastic, flaccid, and their mixed variants. Formant ratios showed a higher sensitivity to vowel deficits than vowel space area. First formants of corner vowels were significantly lower for multiple-system atrophy than cerebellar ataxia. Second formants of vowels /a/ and /i/ were lower in ataxic compared to spastic dysarthria. Discriminant analysis showed a classification score of up to 41.0% for disease type, 39.3% for dysarthria type, and 49.2% for dysarthria severity. Algorithm accuracy reached an F-score of 0.77.

CONCLUSIONS: Distinctive vowel articulation alterations reflect underlying pathophysiology in neurological diseases. Objective acoustic analysis of vowel articulation has the potential to provide a universal method to screen motor speech disorders.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.23681529.

RevDate: 2023-07-28

Mailhos A, Egea-Caparrós DA, Cabana Á, et al (2023)

Voice pitch is negatively associated with sociosexual behavior in males but not in females.

Frontiers in psychology, 14:1200065.

Acoustic cues play a major role in social interactions in many animal species. In addition to the semantic content of human speech, voice attributes (e.g., voice pitch, formant position, formant dispersion) have been proposed to provide critical information for the assessment of potential rivals and mates. However, prior studies exploring the association of acoustic attributes with reproductive success, or some of its proxies, have produced mixed results. Here, we investigate whether the mean fundamental frequency (F0), formant position (Pf), and formant dispersion (Df), dimorphic attributes of the human voice, are related to sociosexuality, as measured by the Revised Sociosexual Orientation Inventory (SOI-R), a trait also known to exhibit sex differences, in a sample of native Spanish-speaking students (101 males, 147 females). Analyses showed a significant negative correlation between F0 and sociosexual behavior, and between Pf and sociosexual desire, in males but not in females. These correlations remained significant after correcting for false discovery rate (FDR) and controlling for age, a potential confounding variable. Our results are consistent with F0 and Pf serving as cues in the mating domain in males but not in females. Alternatively, the association between voice attributes and sociosexual orientation might stem from the parallel effect of male sex hormones on both the male brain and the anatomical structures involved in voice production.
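
Of the measures named in this abstract, formant dispersion has a particularly simple standard definition: the average spacing between adjacent formants (Fitch, 1997). A minimal sketch, with illustrative formant values that are not data from the study:

```python
def formant_dispersion(formants):
    """Average spacing between adjacent formants (Fitch, 1997):
    Df = (Fn - F1) / (n - 1), with formants in Hz in ascending order."""
    return (formants[-1] - formants[0]) / (len(formants) - 1)

# Illustrative adult male vowel formants in Hz (hypothetical values):
df = formant_dispersion([500, 1500, 2500, 3500])
```

Because formant spacing scales inversely with vocal tract length, lower Df is generally read as a cue to a larger speaker.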

RevDate: 2023-07-21

González-Alvarez J, R Sos-Peña (2023)

Body Perception From Connected Speech: Speaker Height Discrimination from Natural Sentences and Sine-Wave Replicas with and without Pitch.

Perceptual and motor skills, 130(4):1353-1365.

In addition to language, the human voice carries information about the physical characteristics of speakers, including their body size (height and weight). The fundamental speaking frequency, perceived as voice pitch, and the formant frequencies, or resonances of the vocal tract, are the acoustic speech parameters that have been most intensely studied for perceiving a speaker's body size. In this study, we created sine-wave (SW) replicas of connected speech (sentences) uttered by 20 male and 20 female speakers, consisting of three time-varying sinusoidal waves matching the frequency pattern of the first three formants of each sentence. These stimuli provide information only about the formant frequencies of a speech signal. We also created a new experimental condition by adding a sinusoidal replica of the voice pitch of each sentence. Results obtained from a binary discrimination task revealed that (a) our SW replicas provided sufficient information to judge the speakers' body height above chance level; (b) adding the sinusoidal replica of the voice pitch did not significantly increase accuracy; and (c) stimuli from female speakers were more informative for body height detection and allowed higher perceptual accuracy, owing to a stronger correlation between formant frequencies and actual body height than for male speakers.

RevDate: 2023-07-19

Vilanova ID, Almeida SB, de Araújo VS, et al (2023)

Impact of orthognathic surgery on voice and speech: a systematic review and meta-analysis.

European journal of orthodontics pii:7226525 [Epub ahead of print].

BACKGROUND: Orthognathic surgical procedures, whether in one or both jaws, can affect structures regarding the articulation and resonance of voice and speech.

OBJECTIVE: Evaluating the impact of orthognathic surgery on voice and speech performance in individuals with skeletal dentofacial disharmony.

SEARCH METHODS: Word combinations and truncations were adapted for the following electronic databases: EMBASE, PubMed/Medline, Scopus, Web of Science, Cochrane Library, and Latin American and Caribbean Literature in Health Sciences (LILACS), and grey literature.

SELECTION CRITERIA: The research included studies on nonsyndromic adults with skeletal dentofacial disharmony undergoing orthognathic surgery. These studies assessed patients before and after surgery or compared them with individuals with good facial harmony using voice and speech parameters through validated protocols.

DATA COLLECTION AND ANALYSIS: Two independent reviewers performed all stages of the review. The Joanna Briggs Institute tool was used to assess risk of bias in the cohort studies, and ROBINS-I was used for nonrandomized clinical trials. The authors also performed a meta-analysis of random effects.

RESULTS: A total of 1163 articles were retrieved after the last search, of which 23 were read in full. Of these, four were excluded, totalling 19 articles for quantitative synthesis. When comparing the pre- and postoperative periods, for fundamental frequency, formants, and the jitter and shimmer perturbation measures alike, orthognathic surgery did not affect vowel production. According to the articles, the main articulatory errors associated with skeletal dentofacial disharmonies prior to surgery were distortions of fricative sounds, mainly /s/ and /z/.

CONCLUSIONS: Orthognathic surgery may have little or no impact on vocal characteristics during vowel production. However, due to the confounding factors involved, estimates are inconclusive. The most prevalent articulatory disorders in the preoperative period were distortions of the fricative phonemes /s/ and /z/. Further studies must be carried out to lend greater robustness to these findings.


RevDate: 2023-07-18
CmpDate: 2023-07-14

Stoehr A, Souganidis C, Thomas TB, et al (2023)

Voice onset time and vowel formant measures in online testing and laboratory-based testing with(out) surgical face masks.

The Journal of the Acoustical Society of America, 154(1):152-166.

Since the COVID-19 pandemic started, conducting experiments online is increasingly common, and face masks are often used in everyday life. It remains unclear whether phonetic detail in speech production is captured adequately when speech is recorded in internet-based experiments or in experiments conducted with face masks. We tested 55 Spanish-Basque-English trilinguals in picture naming tasks in three conditions: online, laboratory-based with surgical face masks, and laboratory-based without face masks (control). We measured plosive voice onset time (VOT) in each language, the formants and duration of English vowels /iː/ and /ɪ/, and the Spanish/Basque vowel space. Across conditions, there were differences between English and Spanish/Basque VOT and in formants and duration between English /iː/-/ɪ/; between conditions, small differences emerged. Relative to the control condition, the Spanish/Basque vowel space was larger in online testing and smaller in the face mask condition. We conclude that testing online or with face masks is suitable for investigating phonetic detail in within-participant designs although the precise measurements may differ from those in traditional laboratory-based research.

RevDate: 2023-07-18
CmpDate: 2023-07-13

Kries J, De Clercq P, Lemmens R, et al (2023)

Acoustic and phonemic processing are impaired in individuals with aphasia.

Scientific reports, 13(1):11208.

Acoustic and phonemic processing are understudied in aphasia, a language disorder that can affect different levels and modalities of language processing. For successful speech comprehension, processing of the speech envelope is necessary, which relates to amplitude changes over time (e.g., the rise times). Moreover, to identify speech sounds (i.e., phonemes), efficient processing of spectro-temporal changes as reflected in formant transitions is essential. Given the underrepresentation of aphasia studies on these aspects, we tested rise time processing and phoneme identification in 29 individuals with post-stroke aphasia and 23 healthy age-matched controls. We found significantly lower performance in the aphasia group than in the control group on both tasks, even when controlling for individual differences in hearing levels and cognitive functioning. Further, by conducting an individual deviance analysis, we found a low-level acoustic or phonemic processing impairment in 76% of individuals with aphasia. Additionally, we investigated whether this impairment would propagate to higher-level language processing and found that rise time processing predicts phonological processing performance in individuals with aphasia. These findings show that it is important to develop diagnostic and treatment tools that target low-level language processing mechanisms.

RevDate: 2023-07-10

Maes P, Weyland M, M Kissine (2023)

Structure and acoustics of the speech of verbal autistic preschoolers.

Journal of child language pii:S0305000923000417 [Epub ahead of print].

In this study, we report an extensive investigation of the structural language and acoustic specificities of the spontaneous speech of ten three- to five-year-old verbal autistic children. The autistic children were compared to a group of ten typically developing (TD) children, matched pairwise on chronological age, nonverbal IQ, and socioeconomic status, and groupwise on verbal IQ and gender, on various measures of structural language (phonetic inventory, lexical diversity, and morpho-syntactic complexity) and a series of acoustic measures of speech (mean and range of fundamental frequency, a formant dispersion index, syllable duration, jitter, and shimmer). Results showed that, overall, the structure and acoustics of the verbal autistic children's speech were highly similar to those of the TD children. The few remaining atypicalities in the speech of autistic children lay in a restricted use of different vocabulary items, a somewhat diminished morpho-syntactic complexity, and a slightly exaggerated syllable duration.

RevDate: 2023-08-02
CmpDate: 2023-07-10

Park EJ, SD Yoo (2023)

Correlation between the parameters of quadrilateral vowel and dysphonia severity in patients with traumatic brain injury.

Medicine, 102(27):e33030.

Dysarthria and dysphonia are common in patients with traumatic brain injury (TBI). Multiple factors may contribute to TBI-induced dysarthria, including poor vocalization, articulation, respiration, and/or resonance. Many patients suffer from dysarthria that persists after the onset of TBI, with negative effects on their quality of life. This study aimed to investigate the relationship between vowel quadrilateral parameters and the Dysphonia Severity Index (DSI), which objectively reflects vocal function. We retrospectively enrolled TBI patients diagnosed using computed tomography. Participants had dysarthria and dysphonia and underwent acoustic analysis. Praat software was used to measure vowel space area (VSA), formant centralization ratio (FCR), and the second formant (F2) ratio. For the four corner vowels (/a/, /u/, /i/, and /ae/), the resonance frequencies of the vocal tract were measured and plotted as 2-dimensional coordinates of the formant parameters. Pearson correlation and multiple linear regression analyses were performed between the variables. VSA showed a significant positive correlation with DSI/a/ (R = 0.221) and DSI/i/ (R = 0.026). FCR showed a significant negative correlation with DSI/u/ and DSI/i/. The F2 ratio showed a significant positive correlation with DSI/u/ and DSI/ae/. In the multiple linear regression analysis, VSA was a significant predictor of DSI/a/ (β = 0.221, P = .030, R² = 0.139). The F2 ratio (β = 0.275, P = .015) and FCR (β = -0.218, P = .029) were significant predictors of DSI/u/ (R² = 0.203). FCR was a significant predictor of DSI/i/ (β = -0.260, P = .010, R² = 0.158). The F2 ratio was a significant predictor of DSI/ae/ (β = 0.254, P = .013, R² = 0.154). Vowel quadrilateral parameters such as VSA, FCR, and the F2 ratio may be associated with dysphonia severity in TBI patients.
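
The vowel-quadrilateral metrics used in this abstract have standard definitions in the acoustic literature and are easy to compute from corner-vowel formants. A minimal sketch using the shoelace formula for VSA and the FCR formula of Sapir et al. (2010), with illustrative formant values rather than patient data:

```python
def vowel_space_area(points):
    """Shoelace area of the vowel polygon in the F1-F2 plane.
    points: (F1, F2) pairs in Hz, ordered around the quadrilateral."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def formant_centralization_ratio(f):
    """FCR (Sapir et al., 2010): (F2u + F2a + F1i + F1u) / (F2i + F1a).
    f maps vowel label -> (F1, F2) in Hz; higher FCR = more centralized vowels."""
    return (f['u'][1] + f['a'][1] + f['i'][0] + f['u'][0]) / (f['i'][1] + f['a'][0])

# Illustrative adult formant values in Hz (not data from the study):
formants = {'i': (300, 2300), 'ae': (650, 1700), 'a': (750, 1200), 'u': (350, 800)}
vsa = vowel_space_area([formants[v] for v in ('i', 'ae', 'a', 'u')])  # Hz^2
fcr = formant_centralization_ratio(formants)
f2_ratio = formants['i'][1] / formants['u'][1]  # F2i / F2u
```

Articulatory undershoot shrinks the quadrilateral, so VSA falls and FCR rises, which is why the two move in opposite directions in such studies.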

RevDate: 2023-07-18

Persson A, TF Jaeger (2023)

Evaluating normalization accounts against the dense vowel space of Central Swedish.

Frontiers in psychology, 14:1165742.

Talkers vary in the phonetic realization of their vowels. One influential hypothesis holds that listeners overcome this inter-talker variability through pre-linguistic auditory mechanisms that normalize the acoustic or phonetic cues that form the input to speech recognition. Dozens of competing normalization accounts exist, including both accounts specific to vowel perception and general-purpose accounts that can be applied to any type of cue. We add to the cross-linguistic literature on this matter by comparing normalization accounts against a new phonetically annotated vowel database of Swedish, a language with a particularly dense vowel inventory of 21 vowels differing in quality and quantity. We evaluate normalization accounts on how they differ in predicted consequences for perception. The results indicate that the best-performing accounts either center or standardize formants by talker. The study also suggests that general-purpose accounts perform as well as vowel-specific accounts, and that vowel normalization operates in both temporal and spectral domains.
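
A classic instance of the "standardize formants by talker" family the abstract refers to is Lobanov (1971) normalization: each formant is z-scored within talker. A minimal sketch (illustrative, not the authors' evaluation pipeline):

```python
import statistics

def lobanov_normalize(tokens):
    """Per-talker z-scoring of formants (Lobanov, 1971): each formant is
    centered on the talker's mean and scaled by the talker's standard
    deviation. tokens: list of (talker, F1, F2) tuples with formants in Hz."""
    by_talker = {}
    for talker, f1, f2 in tokens:
        by_talker.setdefault(talker, []).append((f1, f2))
    stats = {}
    for talker, vals in by_talker.items():
        f1s = [v[0] for v in vals]
        f2s = [v[1] for v in vals]
        stats[talker] = (statistics.mean(f1s), statistics.stdev(f1s),
                         statistics.mean(f2s), statistics.stdev(f2s))
    out = []
    for talker, f1, f2 in tokens:
        m1, s1, m2, s2 = stats[talker]
        out.append((talker, (f1 - m1) / s1, (f2 - m2) / s2))
    return out

# Hypothetical tokens from two talkers:
normalized = lobanov_normalize([("A", 300.0, 2300.0), ("A", 700.0, 1200.0),
                                ("A", 350.0, 800.0), ("B", 280.0, 2500.0),
                                ("B", 650.0, 1100.0), ("B", 320.0, 900.0)])
```

After normalization, every talker's formant distribution has zero mean and unit variance, which removes vocal-tract-size differences while preserving relative vowel positions.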

RevDate: 2023-07-18
CmpDate: 2023-07-10

Steinschneider M (2023)

Toward an understanding of vowel encoding in the human auditory cortex.

Neuron, 111(13):1995-1997.

In this issue of Neuron, Oganian et al. [1] performed intracranial recordings in the auditory cortex of human subjects to clarify how vowels are encoded by the brain. Formant-based tuning curves demonstrated the organization of vowel encoding. The need for population codes and the demonstration of speaker normalization were emphasized.

RevDate: 2023-07-18

Hong Y, Chen S, Zhou F, et al (2023)

Phonetic entrainment in L2 human-robot interaction: an investigation of children with and without autism spectrum disorder.

Frontiers in psychology, 14:1128976.

Phonetic entrainment is a phenomenon in which people adjust their phonetic features to approach those of their conversation partner. Individuals with Autism Spectrum Disorder (ASD) have been reported to show some deficits in entrainment during their interactions with human interlocutors, though deficits in terms of significant differences from typically developing (TD) controls were not always registered. One reason for these inconsistencies is that the conversation partner's speech could hardly be controlled, and both the participants and the partners might be adjusting their phonetic features. The variability in conversation partners' speech and the various social traits they exhibit might make the phonetic entrainment (if any) of the participants less detectable. In this study, we attempted to reduce the variability of the interlocutors by employing a social robot and having it conduct a goal-directed conversation task with children with and without ASD. Fourteen autistic children and 12 TD children participated in the current study in their second language, English. Results showed that autistic children exhibited comparable vowel formant and mean fundamental frequency (f0) entrainment to their TD peers, but they did not entrain their f0 range as the TD group did. These findings suggest that autistic children were capable of exhibiting phonetic entrainment behaviors similar to TD children in vowel formants and f0, particularly in a less complex situation where the speech features and social traits of the interlocutor were controlled. Furthermore, the utilization of a social robot may have increased the interest of these children in phonetic entrainment. On the other hand, entrainment of f0 range was more challenging for these autistic children even in a more controlled situation. This study demonstrates the viability and potential of using human-robot interactions as a novel method to evaluate abilities and deficits in phonetic entrainment in autistic children.

RevDate: 2023-07-04

Terranova F, Baciadonna L, Maccarone C, et al (2023)

Penguins perceive variations of source- and filter-related vocal parameters of species-specific vocalisations.

Animal cognition [Epub ahead of print].

Animal vocalisations encode a wide range of biological information about the age, sex, body size, and social status of the emitter. Moreover, vocalisations play a significant role in signalling the identity of the emitter to conspecifics. Recent studies have shown that, in the African penguin (Spheniscus demersus), acoustic cues to individual identity are encoded in the fundamental frequency (F0) and resonance frequencies (formants) of the vocal tract. However, although penguins are known to produce vocalisations where F0 and formants vary among individuals, it remains to be tested whether the receivers can perceive and use such information in the individual recognition process. In this study, using the Habituation-Dishabituation (HD) paradigm, we tested the hypothesis that penguins perceive and respond to a shift of ± 20% (corresponding to the natural inter-individual variation observed in ex-situ colonies) of F0 and formant dispersion (ΔF) of species-specific calls. We found that penguins were more likely to look rapidly and for longer at the source of the sound when F0 and formants of the calls were manipulated, indicating that they could perceive variations of these parameters in the vocal signals. Our findings provide the first experimental evidence that, in the African penguin, listeners can perceive changes in F0 and formants, which can be used by the receiver as potential cues for the individual discrimination of the emitter.

RevDate: 2023-06-30

Panneton R, Cristia A, Taylor C, et al (2023)

Positive Valence Contributes to Hyperarticulation in Maternal Speech to Infants and Puppies.

Journal of child language pii:S0305000923000296 [Epub ahead of print].

Infant-directed speech often has hyperarticulated features, such as point vowels whose formants are further apart than in adult-directed speech. This increased "vowel space" may reflect the caretaker's effort to speak more clearly to infants, thus benefiting language processing. However, hyperarticulation may also result from more positive valence (e.g., speaking with positive vocal emotion) often found in mothers' speech to infants. This study was designed to replicate others who have found hyperarticulation in maternal speech to their 6-month-olds, but also to examine their speech to a non-human infant (i.e., a puppy). We rated both kinds of maternal speech for their emotional valence and recorded mothers' speech to a human adult. We found that mothers produced more positively valenced utterances and some hyperarticulation in both their infant- and puppy-directed speech, compared to their adult-directed speech. This finding promotes looking at maternal speech from a multi-faceted perspective that includes emotional state.

RevDate: 2023-07-18
CmpDate: 2023-07-17

Vogt C, Floegel M, Kasper J, et al (2023)

Oxytocinergic modulation of speech production-a double-blind placebo-controlled fMRI study.

Social cognitive and affective neuroscience, 18(1):.

Many socio-affective behaviors, such as speech, are modulated by oxytocin. While oxytocin modulates speech perception, it is not known whether it also affects speech production. Here, we investigated effects of oxytocin administration and interactions with the functional rs53576 oxytocin receptor (OXTR) polymorphism on produced speech and its underlying brain activity. During functional magnetic resonance imaging, 52 healthy male participants read sentences out loud with either neutral or happy intonation; a covert reading condition served as a common baseline. Participants were studied once under the influence of intranasal oxytocin and in another session under placebo. Oxytocin administration increased the second formant of produced vowels. This acoustic feature has previously been associated with speech valence; however, the acoustic differences were not perceptually distinguishable in our experimental setting. When preparing to speak, oxytocin enhanced brain activity in sensorimotor cortices and regions of both dorsal and right ventral speech processing streams, as well as subcortical and cortical limbic and executive control regions. In some of these regions, the rs53576 OXTR polymorphism modulated oxytocin administration-related brain activity. Oxytocin also gated cortical-basal ganglia circuits involved in the generation of happy prosody. Our findings suggest that several neural processes underlying speech production are modulated by oxytocin, including control of not only affective intonation but also sensorimotor aspects during emotionally neutral speech.

RevDate: 2023-06-21

Vasquez-Serrano P, Reyes-Moreno J, Guido RC, et al (2023)

MFCC Parameters of the Speech Signal: An Alternative to Formant-Based Instantaneous Vocal Tract Length Estimation.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00163-7 [Epub ahead of print].

On the one hand, the relationship between formant frequencies and vocal tract length (VTL) has been intensively studied over the years. On the other hand, the connection between mel-frequency cepstral coefficients (MFCCs), which concisely codify the overall shape of a speaker's spectral envelope with just a few cepstral coefficients, and VTL has only been modestly analyzed and is worth further investigation. Thus, based on different statistical models, this article explores the advantages and disadvantages of the latter, relatively novel, approach in contrast to the former, which arises from more traditional studies. Additionally, VTL is usually assumed to be a static, inherent characteristic of a speaker; that is, a single length parameter is frequently estimated per speaker. By contrast, in this paper we consider VTL estimation from a dynamic perspective, using modern real-time Magnetic Resonance Imaging (rtMRI) to measure VTL in parallel with audio signals. To support the experiments, data obtained from USC-TIMIT magnetic resonance videos were used, allowing for 2D real-time analysis of articulators in motion. As a result, we observed that the performance of MFCCs in speaker-dependent modeling is higher; however, in cross-speaker modeling, which uses different speakers' data for training and evaluation, their performance is not significantly different from that obtained with formants. In addition, we note that the MFCC-based estimation is robust, with acceptable computational time complexity, coherent with the traditional approach.
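
The "traditional" formant-based VTL estimate the abstract contrasts with MFCCs is usually the quarter-wavelength model of a uniform tube closed at one end. A minimal sketch of that textbook approach (illustrative schwa-like formants, not USC-TIMIT data):

```python
SPEED_OF_SOUND = 35000.0  # cm/s, approximate value in warm, moist air

def vtl_from_formants(formants_hz):
    """Uniform-tube (quarter-wavelength) estimate: for a tube closed at one
    end, the k-th formant is Fk = (2k - 1) * c / (4 * L), so each measured
    formant yields L = (2k - 1) * c / (4 * Fk); the estimates are averaged."""
    estimates = [(2 * k - 1) * SPEED_OF_SOUND / (4.0 * f)
                 for k, f in enumerate(formants_hz, start=1)]
    return sum(estimates) / len(estimates)

# Illustrative neutral-vowel formants of an adult male, in Hz:
vtl_cm = vtl_from_formants([500.0, 1500.0, 2500.0])
```

For an ideal 500/1500/2500 Hz pattern all three formants agree on the same length, which is exactly the uniform-tube assumption; real vowels deviate, which is one motivation for the MFCC-based alternative.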

RevDate: 2023-06-12

Cox C, Dideriksen C, Keren-Portnoy T, et al (2023)

Infant-directed speech does not always involve exaggerated vowel distinctions: Evidence from Danish.

Child development [Epub ahead of print].

This study compared the acoustic properties of 26 (100% female, 100% monolingual) Danish caregivers' spontaneous speech addressed to their 11- to 24-month-old infants (infant-directed speech, IDS) and an adult experimenter (adult-directed speech, ADS). The data were collected between 2016 and 2018 in Aarhus, Denmark. Prosodic properties of Danish IDS conformed to cross-linguistic patterns, with a higher pitch, greater pitch variability, and slower articulation rate than ADS. However, an acoustic analysis of vocalic properties revealed that Danish IDS had a reduced or similar vowel space, higher within-vowel variability, raised formants, and lower degree of vowel discriminability compared to ADS. None of the measures, except articulation rate, showed age-related differences. These results push for future research to conduct theory-driven comparisons across languages with distinct phonological systems.

RevDate: 2023-08-04

Philosophical Transactions B Editorial team (2023)

Editor's note: Chimpanzee vowel-like sounds and voice quality suggest formant space expansion through the hominoid lineage.

Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 378(1882):20230201.

RevDate: 2023-06-13

Baron A, Harwood V, Kleinman D, et al (2023)

Where on the face do we look during phonemic restoration: An eye-tracking study.

Frontiers in psychology, 14:1005186.

Face-to-face communication typically involves audio and visual components of the speech signal. To examine the effect of task demands on gaze patterns in response to a speaking face, adults participated in two eye-tracking experiments with an audiovisual condition (articulatory information from the mouth was visible) and a pixelated condition (articulatory information was not visible). Further, task demands were manipulated by having listeners respond in a passive (no response) or an active (button-press response) context. The active experiment required participants to discriminate between speech stimuli and was designed to mimic environmental situations that require one to use visual information to disambiguate the speaker's message, simulating different listening conditions in real-world settings. Stimuli included a clear exemplar of the syllable /ba/ and a second exemplar in which the formant structure of the initial consonant was reduced, creating an /a/-like token. Consistent with our hypothesis, results revealed that fixations to the mouth were greatest in the audiovisual active experiment and that visual articulatory information led to a phonemic restoration effect for the /a/ token. In the pixelated condition, participants fixated on the eyes, and discrimination of the deviant token within the active experiment was significantly greater than in the audiovisual condition. These results suggest that, when required to disambiguate changes in speech, adults may look to the mouth for additional cues to support processing when it is available.

RevDate: 2023-06-11

Ikuma T, McWhorter AJ, Oral E, et al (2023)

Formant-Aware Spectral Analysis of Sustained Vowels of Pathological Breathy Voice.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00154-6 [Epub ahead of print].

OBJECTIVES: This paper reports the effectiveness of formant-aware spectral parameters to predict the perceptual breathiness rating. A breathy voice has a steeper spectral slope and higher turbulent noise than a normal voice. Measuring spectral parameters of acoustic signals over lower formant regions is a known approach to capture the properties related to breathiness. This study examines this approach by testing the contemporary spectral parameters and algorithms within the framework, alternate frequency band designs, and vowel effects.

METHODS: Sustained vowel recordings (/a/, /i/, and /u/) of speakers with voice disorders in the German Saarbrueken Voice Database were considered (n: 367). Recordings with signal irregularities, such as subharmonics or with roughness perception, were excluded from the study. Four speech language pathologists perceptually rated the recordings for breathiness on a 100-point scale, and their averages were used in the analysis. The acoustic spectra were segmented into four frequency bands according to the vowel formant structures. Five spectral parameters (intraband harmonics-to-noise ratio, HNR; interband harmonics ratio, HHR; interband noise ratio, NNR; and interband glottal-to-noise energy, GNE, ratio) were evaluated in each band to predict the perceptual breathiness rating. Four HNR algorithms were tested.

RESULTS: Multiple linear regression models of spectral parameters, led by the HNRs, were shown to explain up to 85% of the variance in perceptual breathiness ratings. This performance exceeded that of the acoustic breathiness index (82%). Individually, the HNR over the first two formants best explained the variances in the breathiness (78%), exceeding the smoothed cepstrum peak prominence (74%). The performance of HNR was highly algorithm dependent (10% spread). Some vowel effects were observed in the perceptual rating (higher for /u/), predictability (5% lower for /u/), and model parameter selections.

CONCLUSIONS: Strong per-vowel breathiness acoustic models were found by segmenting the spectrum to isolate the portion most affected by breathiness.

RevDate: 2023-06-05

Ashokumar M, Guichet C, Schwartz JL, et al (2023)

Correlation between the effect of orofacial somatosensory inputs in speech perception and speech production performance.

Auditory perception & cognition, 6(1-2):97-107.

INTRODUCTION: Orofacial somatosensory inputs modify the perception of speech sounds. Such auditory-somatosensory integration likely develops alongside speech production acquisition. We examined whether the somatosensory effect in speech perception varies depending on individual characteristics of speech production.

METHODS: The somatosensory effect in speech perception was assessed by changes in the category boundary between /e/ and /ø/ in a vowel identification test resulting from somatosensory stimulation providing facial skin deformation in the rearward direction, corresponding to the articulatory movement for /e/, applied together with the auditory input. Speech production performance was quantified by the acoustic distances between the average first, second, and third formants of /e/ and /ø/ utterances recorded in a separate test.

RESULTS: The category boundary between /e/ and /ø/ was significantly shifted towards /ø/ due to the somatosensory stimulation which is consistent with previous research. The amplitude of the category boundary shift was significantly correlated with the acoustic distance between the mean second - and marginally third - formants of /e/ and /ø/ productions, with no correlation with the first formant distance.

DISCUSSION: Greater acoustic distances can be related to larger contrasts between the articulatory targets of vowels in speech production. These results suggest that the somatosensory effect in speech perception can be linked to speech production performance.

RevDate: 2023-06-01
CmpDate: 2023-05-29

Saba JN, Ali H, JHL Hansen (2023)

The effects of estimation accuracy, estimation approach, and number of selected channels using formant-priority channel selection for an "n-of-m" sound processing strategy for cochlear implants.

The Journal of the Acoustical Society of America, 153(5):3100.

Previously, selection of l channels was prioritized according to formant frequency locations in an l-of-n-of-m-based signal processing strategy to provide important voicing information independent of listening environments for cochlear implant (CI) users. In this study, ideal, or ground truth, formants were incorporated into the selection stage to determine the effect of accuracy on (1) subjective speech intelligibility, (2) objective channel selection patterns, and (3) objective stimulation patterns (current). An average +11% improvement (p < 0.05) was observed across six CI users in quiet, but not in noise or reverberation conditions. Analogous increases in channel selection and current for the upper range of F1, and a decrease across mid-frequencies with higher corresponding current, were both observed at the expense of noise-dominant channels. Objective channel selection patterns were analyzed a second time to determine the effects of estimation approach and number of selected channels (n). A significant effect of estimation approach was observed only in the noise and reverberation condition, with minor differences in channel selection and significantly decreased stimulated current. Results suggest that the estimation method, accuracy, and number of channels in the proposed strategy using ideal formants may improve intelligibility when the corresponding stimulated current of formant channels is not masked by noise-dominant channels.
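
The prioritized selection described in this abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name, the example channel frequencies, and the energies are all hypothetical, and real strategies operate on per-frame filterbank envelopes.

```python
# Hypothetical sketch of an "l-of-n-of-m" selection step: of m analysis
# channels, first pick up to l channels whose center frequencies lie
# nearest the formant estimates, then fill the remaining slots (up to n
# total) with the highest-energy channels.
def select_channels(energies, centers, formants, l, n):
    m = len(energies)
    # channels closest to each formant frequency (at most l of them)
    formant_picks = []
    for f in formants[:l]:
        idx = min(range(m), key=lambda i: abs(centers[i] - f))
        if idx not in formant_picks:
            formant_picks.append(idx)
    # fill remaining slots by descending channel energy
    rest = sorted((i for i in range(m) if i not in formant_picks),
                  key=lambda i: energies[i], reverse=True)
    return sorted(formant_picks + rest[:n - len(formant_picks)])
```

With formants at 600 and 2000 Hz, the two nearest channels are kept regardless of their energy, which is the sense in which formant channels are "prioritized" over a purely energy-based n-of-m rule.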

RevDate: 2023-07-11
CmpDate: 2023-06-19

Carney LH, Cameron DA, Kinast KB, et al (2023)

Effects of sensorineural hearing loss on formant-frequency discrimination: Measurements and models.

Hearing research, 435:108788.

This study concerns the effect of hearing loss on discrimination of formant frequencies in vowels. In the response of the healthy ear to a harmonic sound, auditory-nerve (AN) rate functions fluctuate at the fundamental frequency, F0. Responses of inner hair cells (IHCs) tuned near spectral peaks are captured (or dominated) by a single harmonic, resulting in lower fluctuation depths than responses of IHCs tuned between spectral peaks. Therefore, the depth of neural fluctuations (NFs) varies along the tonotopic axis and encodes spectral peaks, including formant frequencies of vowels. This NF code is robust across a wide range of sound levels and in background noise. The NF profile is converted into a rate-place representation in the auditory midbrain, wherein neurons are sensitive to low-frequency fluctuations. The NF code is vulnerable to sensorineural hearing loss (SNHL) because capture depends upon saturation of IHCs, and thus the interaction of cochlear gain with IHC transduction. In this study, formant-frequency discrimination limens (DLFFs) were estimated for listeners with normal hearing or mild to moderate SNHL. The F0 was fixed at 100 Hz, and formant peaks were either aligned with harmonic frequencies or placed between harmonics. Formant peak frequencies were 600 and 2000 Hz, in the range of first and second formants of several vowels. The difficulty of the task was varied by changing formant bandwidth to modulate the contrast in the NF profile. Results were compared to predictions from model auditory-nerve and inferior colliculus (IC) neurons, with listeners' audiograms used to individualize the AN model. Correlations between DLFFs, audiometric thresholds near the formant frequencies, age, and scores on the Quick speech-in-noise test are reported. SNHL had a strong effect on DLFF for the second formant frequency (F2), but a relatively small effect on DLFF for the first formant (F1). The IC model appropriately predicted substantial threshold elevations for changes in F2 as a function of SNHL and little effect of SNHL on thresholds for changes in F1.

RevDate: 2023-06-02

Rizzi R, GM Bidelman (2023)

Duplex perception reveals brainstem auditory representations are modulated by listeners' ongoing percept for speech.

bioRxiv : the preprint server for biology.

So-called duplex speech stimuli with perceptually ambiguous spectral cues to one ear and isolated low- vs. high-frequency third formant "chirp" to the opposite ear yield a coherent percept supporting their phonetic categorization. Critically, such dichotic sounds are only perceived categorically upon binaural integration. Here, we used frequency-following responses (FFRs), scalp-recorded potentials reflecting phase-locked subcortical activity, to investigate brainstem responses to fused speech percepts and to determine whether FFRs reflect binaurally integrated category-level representations. We recorded FFRs to diotic and dichotic stop-consonants (/da/, /ga/) that either did or did not require binaural fusion to properly label along with perceptually ambiguous sounds without clear phonetic identity. Behaviorally, listeners showed clear categorization of dichotic speech tokens confirming they were heard with a fused, phonetic percept. Neurally, we found FFRs were stronger for categorically perceived speech relative to category-ambiguous tokens but also differentiated phonetic categories for both diotically and dichotically presented speech sounds. Correlations between neural and behavioral data further showed FFR latency predicted the degree to which listeners labeled tokens as "da" vs. "ga". The presence of binaurally integrated, category-level information in FFRs suggests human brainstem processing reflects a surprisingly abstract level of the speech code typically circumscribed to much later cortical processing.

RevDate: 2023-06-07
CmpDate: 2023-05-23

Cox SR, Huang T, Chen WR, et al (2023)

An acoustic study of Cantonese alaryngeal speech in different speaking conditions.

The Journal of the Acoustical Society of America, 153(5):2973.

Esophageal (ES) speech, tracheoesophageal (TE) speech, and the electrolarynx (EL) are common methods of communication following the removal of the larynx. Our recent study demonstrated that intelligibility may increase for Cantonese alaryngeal speakers using clear speech (CS) compared to their everyday "habitual speech" (HS), but the reasoning is still unclear [Hui, Cox, Huang, Chen, and Ng (2022). Folia Phoniatr. Logop. 74, 103-111]. The purpose of this study was to assess the acoustic characteristics of vowels and tones produced by Cantonese alaryngeal speakers using HS and CS. Thirty-one alaryngeal speakers (9 EL, 10 ES, and 12 TE speakers) read The North Wind and the Sun passage in HS and CS. Vowel formants, vowel space area (VSA), speaking rate, pitch, and intensity were examined, and their relationship to intelligibility was evaluated. Statistical models suggest that larger VSAs significantly improved intelligibility, but slower speaking rate did not. Vowel and tonal contrasts did not differ between HS and CS for all three groups, but the amount of information encoded in fundamental frequency and intensity differences between high and low tones positively correlated with intelligibility for the TE and ES groups, respectively. Continued research is needed to understand the effects of different speaking conditions toward improving acoustic and perceptual characteristics of Cantonese alaryngeal speech.

RevDate: 2023-06-19
CmpDate: 2023-06-19

Valls-Ontañón A, Ferreiro M, Moragues-Aguiló B, et al (2023)

Impact of 3-dimensional anatomical changes secondary to orthognathic surgery on voice resonance and articulatory function: a prospective study.

The British journal of oral & maxillofacial surgery, 61(5):373-379.

An evaluation was made of the impact of orthognathic surgery (OS) on speech, addressing in particular the effects of skeletal and airway changes on voice resonance characteristics and articulatory function. A prospective study was carried out involving 29 consecutive patients subjected to OS. Preoperative, and short- and long-term postoperative, evaluations were made of anatomical changes (skeletal and airway measurements), speech evolution (assessed objectively by acoustic analysis: fundamental frequency, local jitter, local shimmer of each vowel, and formants F1 and F2 of vowel /a/), and articulatory function (use of compensatory musculature, point of articulation, and speech intelligibility). These were also assessed subjectively by means of a visual analogue scale. Articulatory function after OS showed immediate improvement and had further progressed at one year of follow-up. This improvement significantly correlated with the anatomical changes, and was also notably perceived by the patient. On the other hand, although a slight modification in vocal resonance was reported and seen to correlate with anatomical changes of the tongue, hyoid bone, and airway, it was not subjectively perceived by the patients. In conclusion, the results demonstrated that OS had beneficial effects on articulatory function and imperceptible subjective changes in a patient's voice. Patients subjected to OS, apart from benefitting from improved articulatory function, should not be afraid that they will not recognise their voice after treatment.

RevDate: 2023-05-21

Shellikeri S, Cho S, Ash S, et al (2023)

Digital markers of motor speech impairments in natural speech of patients with ALS-FTD spectrum disorders.

medRxiv : the preprint server for health sciences pii:2023.04.29.23289308.

BACKGROUND AND OBJECTIVES: Patients with ALS-FTD spectrum disorders (ALS-FTSD) have mixed motor and cognitive impairments and require valid and quantitative assessment tools to support diagnosis and tracking of bulbar motor disease. This study aimed to validate a novel automated digital speech tool that analyzes vowel acoustics from natural, connected speech as a marker for impaired articulation due to bulbar motor disease in ALS-FTSD.

METHODS: We used an automatic algorithm called Forced Alignment Vowel Extraction (FAVE) to detect spoken vowels and extract vowel acoustics from 1 minute audio-recorded picture descriptions. Using automated acoustic analysis scripts, we derived two articulatory-acoustic measures: vowel space area (VSA, in Bark [2]) which represents tongue range-of-motion (size), and average second formant slope of vowel trajectories (F2 slope) which represents tongue movement speed. We compared vowel measures between ALS with and without clinically-evident bulbar motor disease (ALS+bulbar vs. ALS-bulbar), behavioral variant frontotemporal dementia (bvFTD) without a motor syndrome, and healthy controls (HC). We correlated impaired vowel measures with bulbar disease severity, estimated by clinical bulbar scores and perceived listener effort, and with MRI cortical thickness of the orobuccal part of the primary motor cortex innervating the tongue (oralPMC). We also tested correlations with respiratory capacity and cognitive impairment.

RESULTS: Participants were 45 ALS+bulbar (30 males, mean age = 61 ± 11 years), 22 ALS-bulbar (11 males, age = 62 ± 10), 22 bvFTD (13 males, age = 63 ± 7), and 34 HC (14 males, age = 69 ± 8). ALS+bulbar had smaller VSA and shallower average F2 slopes than ALS-bulbar (VSA: |d| = 0.86, p = 0.0088; F2 slope: |d| = 0.98, p = 0.0054), bvFTD (VSA: |d| = 0.67, p = 0.043; F2 slope: |d| = 1.4, p < 0.001), and HC (VSA: |d| = 0.73, p = 0.024; F2 slope: |d| = 1.0, p < 0.001). Vowel measures declined with worsening bulbar clinical scores (VSA: R = 0.33, p = 0.033; F2 slope: R = 0.25, p = 0.048), and smaller VSA was associated with greater listener effort (R = -0.43, p = 0.041). Shallower F2 slopes were related to cortical thinning in oralPMC (R = 0.50, p = 0.03). Neither vowel measure was associated with respiratory or cognitive test scores.

CONCLUSIONS: Vowel measures extracted with automatic processing from natural speech are sensitive to bulbar motor disease in ALS-FTD and are robust to cognitive impairment.
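
The abstract's VSA measure (vowel space area in Bark) can be illustrated with a short sketch. This is not the authors' FAVE pipeline: the Traunmüller Hz-to-Bark conversion is a standard published formula, but the corner-vowel formant values used in the usage note below are illustrative placeholders, and real analyses average formants over many vowel tokens.

```python
def hz_to_bark(f):
    # Traunmüller (1990) critical-band rate conversion
    return 26.81 * f / (1960.0 + f) - 0.53

def vowel_space_area(corners):
    # corners: list of (F1_hz, F2_hz) mean formants for the corner vowels,
    # in order around the polygon; area via the shoelace formula, in Bark^2
    pts = [(hz_to_bark(f1), hz_to_bark(f2)) for f1, f2 in corners]
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0
```

For example, with hypothetical corner vowels /i/ ≈ (300, 2300), /a/ ≈ (750, 1200), and /u/ ≈ (350, 800) Hz, the triangle area comes out on the order of 12 Bark²; smaller areas reflect reduced tongue range of motion, which is the interpretation the study relies on.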

RevDate: 2023-07-21
CmpDate: 2023-07-21

Easwar V, Peng ZE, Mak V, et al (2023)

Differences between children and adults in the neural encoding of voice fundamental frequency in the presence of noise and reverberation.

The European journal of neuroscience, 58(2):2547-2562.

Environmental noise and reverberation challenge speech understanding more significantly in children than in adults. However, the neural/sensory basis for the difference is poorly understood. We evaluated the impact of noise and reverberation on the neural processing of the fundamental frequency of voice (f0), an important cue to tag or recognize a speaker. In a group of 39 6- to 15-year-old children and 26 adults with normal hearing, envelope following responses (EFRs) were elicited by a male-spoken /i/ in quiet, noise, reverberation, and both noise and reverberation. Because harmonics are better resolved at lower than at higher vowel formants, which may affect susceptibility to noise and/or reverberation, the /i/ was modified to elicit two EFRs: one initiated by the low-frequency first formant (F1) and the other initiated by the mid- to high-frequency second and higher formants (F2+), with predominantly resolved and unresolved harmonics, respectively. F1 EFRs were more susceptible to noise, whereas F2+ EFRs were more susceptible to reverberation. Reverberation resulted in greater attenuation of F1 EFRs in adults than in children, and greater attenuation of F2+ EFRs in older than in younger children. Reduced modulation depth caused by reverberation and noise explained changes in F2+ EFRs but was not the primary determinant for F1 EFRs. Experimental data paralleled modelled EFRs, especially for F1. Together, the data suggest that noise or reverberation influences the robustness of f0 encoding depending on the resolvability of vowel harmonics, and that maturation of the processing of temporal/envelope information of voice is delayed in reverberation, particularly for low-frequency stimuli.

RevDate: 2023-05-12

Wang Y, Hattori M, Masaki K, et al (2023)

Detailed speech evaluation including formant 3 analysis and voice visualization in maxillofacial rehabilitation: A clinical report.

The Journal of prosthetic dentistry pii:S0022-3913(23)00221-4 [Epub ahead of print].

Objective speech evaluation such as analysis of formants 1 and 2 and nasality measurement have been used in maxillofacial rehabilitation for outcome assessment. However, in some patients, those evaluations are insufficient to assess a specific or unique problem. This report describes the use of a new speech evaluation including formant 3 analysis and voice visualization in a patient with a maxillofacial defect. The patient was a 67-year-old man who had a maxillary defect that opened to the maxillary sinus and who had an unnatural voice even when wearing an obturator. Nasality was low and the frequency of formants 1 and 2 were normal even without the obturator. However, a low frequency of formant 3 and a shifted center of voice were observed. These results indicated that the unnatural voice was related to increased resonant volume in the pharynx rather than hypernasality. This patient demonstrates that advanced speech analysis can be useful for detecting the cause of speech disorder and planning maxillofacial rehabilitation.

RevDate: 2023-05-05

Cavalcanti JC, Eriksson A, PA Barbosa (2023)

On the speaker discriminatory power asymmetry regarding acoustic-phonetic parameters and the impact of speaking style.

Frontiers in psychology, 14:1101187.

This study aimed to assess what we refer to as the speaker discriminatory power asymmetry and its forensic implications in comparisons performed in different speaking styles: spontaneous dialogues vs. interviews. We also addressed the impact of data sampling on the speaker's discriminatory performance concerning different acoustic-phonetic estimates. The participants were 20 male speakers of Brazilian Portuguese from the same dialectal area. The speech material consisted of spontaneous telephone conversations between familiar individuals, and interviews conducted between each individual participant and the researcher. Nine acoustic-phonetic parameters were chosen for the comparisons, spanning from temporal and melodic to spectral acoustic-phonetic estimates. Ultimately, an analysis based on the combination of different parameters was also conducted. Two speaker discriminatory metrics were examined: log-likelihood-ratio cost (Cllr) and equal error rate (EER) values. A general speaker discriminatory trend was suggested when assessing the parameters individually. Parameters pertaining to the temporal acoustic-phonetic class showed the weakest performance in terms of speaker-contrasting power, as evidenced by relatively higher Cllr and EER values. Moreover, of the set of acoustic parameters assessed, spectral parameters, mainly high formant frequencies, i.e., F3 and F4, performed best in terms of speaker discrimination, with the lowest EER and Cllr scores. The results suggest a speaker discriminatory power asymmetry across parameters from different acoustic-phonetic classes, in which temporal parameters tended to present lower discriminatory power. The speaking-style mismatch also seemed to considerably impact the speaker comparison task by undermining overall discriminatory performance. A statistical model based on the combination of different acoustic-phonetic estimates was found to perform best in this case. Finally, data sampling proved to be of crucial relevance for the reliability of discriminatory power assessment.

RevDate: 2023-05-17
CmpDate: 2023-05-04

Zaltz Y (2023)

The effect of stimulus type and testing method on talker discrimination of school-age children.

The Journal of the Acoustical Society of America, 153(5):2611.

Efficient talker discrimination (TD) improves speech understanding under multi-talker conditions. So far, TD of children has been assessed using various testing parameters, making it difficult to draw comparative conclusions. This study explored the effects of the stimulus type and variability on children's TD. Thirty-two children (7-10 years old) underwent eight TD assessments with fundamental frequency + formant changes using an adaptive procedure. Stimuli included consonant-vowel-consonant words or three-word sentences and were either fixed by run or by trial (changing throughout the run). Cognitive skills were also assessed. Thirty-one adults (18-35 years old) served as controls. The results showed (1) poorer TD for the fixed-by-trial than the fixed-by-run method, with both stimulus types for the adults but only with the words for the children; (2) poorer TD for the words than the sentences with the fixed-by-trial method only for the children; and (3) significant correlations between the children's age and TD. These results support a developmental trajectory in the use of perceptual anchoring for TD and in its reliance on comprehensive acoustic and linguistic information. The finding that the testing parameters may influence the top-down and bottom-up processing for TD should be considered when comparing data across studies or when planning new TD experiments.

RevDate: 2023-05-04
CmpDate: 2023-05-03

Ghosh S, Feng Z, Bian J, et al (2022)

DR-VIDAL - Doubly Robust Variational Information-theoretic Deep Adversarial Learning for Counterfactual Prediction and Treatment Effect Estimation on Real World Data.

AMIA ... Annual Symposium proceedings. AMIA Symposium, 2022:485-494.

Determining causal effects of interventions onto outcomes from real-world, observational (non-randomized) data, e.g., treatment repurposing using electronic health records, is challenging due to underlying bias. Causal deep learning has improved over traditional techniques for estimating individualized treatment effects (ITE). We present the Doubly Robust Variational Information-theoretic Deep Adversarial Learning (DR-VIDAL), a novel generative framework that combines two joint models of treatment and outcome, ensuring an unbiased ITE estimation even when one of the two is misspecified. DR-VIDAL integrates: (i) a variational autoencoder (VAE) to factorize confounders into latent variables according to causal assumptions; (ii) an information-theoretic generative adversarial network (Info-GAN) to generate counterfactuals; (iii) a doubly robust block incorporating treatment propensities for outcome predictions. On synthetic and real-world datasets (Infant Health and Development Program, Twin Birth Registry, and National Supported Work Program), DR-VIDAL achieves better performance than other non-generative and generative methods. In conclusion, DR-VIDAL uniquely fuses causal assumptions, VAE, Info-GAN, and doubly robustness into a comprehensive, performant framework. Code is available at: https://github.com/Shantanu48114860/DR-VIDAL-AMIA-22 under MIT license.

RevDate: 2023-04-28

Li M, Erickson IM, Cross EV, et al (2023)

It's Not Only What You Say, But Also How You Say It: Machine Learning Approach to Estimate Trust from Conversation.

Human factors [Epub ahead of print].

OBJECTIVE: The objective of this study was to estimate trust from conversations using both lexical and acoustic data.

BACKGROUND: As NASA moves to long-duration space exploration operations, the increasing need for cooperation between humans and virtual agents requires real-time trust estimation by virtual agents. Measuring trust through conversation is a novel and unintrusive approach.

METHOD: A 2 (reliability) × 2 (cycles) × 3 (events) within-subject study with habitat system maintenance was designed to elicit various levels of trust in a conversational agent. Participants had trust-related conversations with the conversational agent at the end of each decision-making task. To estimate trust, subjective trust ratings were predicted using machine learning models trained on three types of conversational features (i.e., lexical, acoustic, and combined). After training, model explanation was performed using variable importance and partial dependence plots.

RESULTS: Results showed that a random forest algorithm, trained using the combined lexical and acoustic features, predicted trust in the conversational agent most accurately (adjusted R² = 0.71). The most important predictors were a combination of lexical and acoustic cues: average sentiment considering valence shifters, the mean of the formants, and Mel-frequency cepstral coefficients (MFCCs). These conversational features were identified as partial mediators predicting people's trust.

CONCLUSION: Precise trust estimation from conversation requires lexical cues and acoustic cues.

APPLICATION: These results showed the possibility of using conversational data to measure trust, and potentially other dynamic mental states, unobtrusively and dynamically.

RevDate: 2023-04-30

Teixeira FL, Costa MRE, Abreu JP, et al (2023)

A Narrative Review of Speech and EEG Features for Schizophrenia Detection: Progress and Challenges.

Bioengineering (Basel, Switzerland), 10(4):.

Schizophrenia is a mental illness that affects an estimated 21 million people worldwide. The literature establishes that electroencephalography (EEG) is a well-implemented means of studying and diagnosing mental disorders. However, it is known that speech and language provide unique and essential information about human thought. Semantic and emotional content, semantic coherence, syntactic structure, and complexity can thus be combined in a machine learning process to detect schizophrenia. Several studies show that early identification is crucial to prevent the onset of illness or mitigate possible complications. Therefore, it is necessary to identify disease-specific biomarkers for an early diagnosis support system. This work contributes to improving our knowledge about schizophrenia and the features that can identify this mental illness via speech and EEG. The emotional state is a specific characteristic of schizophrenia that can be identified with speech emotion analysis. The most used speech features found in the literature review are fundamental frequency (F0), intensity/loudness (I), formant frequencies (F1, F2, and F3), Mel-frequency cepstral coefficients (MFCCs), the duration of pauses and sentences (SD), and the duration of silence between words. Combining at least two feature categories achieved high accuracy in schizophrenia classification. Prosodic and spectral or temporal features achieved the highest accuracy. The work with the highest accuracy used the prosodic and spectral features QEVA, SDVV, and SSDL, which were derived from the F0 and the spectrogram. The emotional state can be identified with most of the features previously mentioned (F0, I, F1, F2, F3, MFCCs, and SD), linear prediction cepstral coefficients (LPCC), line spectral features (LSF), and the pause rate. Using event-related potentials (ERP), the most promising features found in the literature are mismatch negativity (MMN), P2, P3, P50, N1, and N2. The EEG features with the highest accuracy in schizophrenia classification are nonlinear features, such as Cx, HFD, and Lya.

RevDate: 2023-07-18
CmpDate: 2023-07-10

Oganian Y, Bhaya-Grossman I, Johnson K, et al (2023)

Vowel and formant representation in the human auditory speech cortex.

Neuron, 111(13):2105-2118.e4.

Vowels, a fundamental component of human speech across all languages, are cued acoustically by formants, resonance frequencies of the vocal tract shape during speaking. An outstanding question in neurolinguistics is how formants are processed neurally during speech perception. To address this, we collected high-density intracranial recordings from the human speech cortex on the superior temporal gyrus (STG) while participants listened to continuous speech. We found that two-dimensional receptive fields based on the first two formants provided the best characterization of vowel sound representation. Neural activity at single sites was highly selective for zones in this formant space. Furthermore, formant tuning is adjusted dynamically for speaker-specific spectral context. However, the entire population of formant-encoding sites was required to accurately decode single vowels. Overall, our results reveal that complex acoustic tuning in the two-dimensional formant space underlies local vowel representations in STG. As a population code, this gives rise to phonological vowel perception.

RevDate: 2023-04-20

Herbst CT, Story BH, D Meyer (2023)

Acoustical Theory of Vowel Modification Strategies in Belting.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00004-8 [Epub ahead of print].

Various authors have argued that belting is to be produced by "speech-like" sounds, with the first and second supraglottic vocal tract resonances (fR1 and fR2) at frequencies of the vowels determined by the lyrics to be sung. Acoustically, the hallmark of belting has been identified as a dominant second harmonic, possibly enhanced by first resonance tuning (fR1≈2fo). It is not clear how both these concepts - (a) phonating with "speech-like," unmodified vowels; and (b) producing a belting sound with a dominant second harmonic, typically enhanced by fR1 - can be upheld when singing across a singer's entire musical pitch range. For instance, anecdotal reports from pedagogues suggest that vowels with a low fR1, such as [i] or [u], might have to be modified considerably (by raising fR1) in order to phonate at higher pitches. These issues were systematically addressed in silico with respect to treble singing, using a linear source-filter voice production model. The dominant harmonic of the radiated spectrum was assessed in 12,987 simulations, covering a parameter space of 37 fundamental frequencies (fo) across the musical pitch range from C3 to C6; 27 voice source spectral slope settings from -4 to -30 dB/octave; computed for 13 different IPA vowels. The results suggest that, for most unmodified vowels, the stereotypical belting sound characteristics with a dominant second harmonic can only be produced over a pitch range of about a musical fifth, centered at fo≈0.5fR1. In the [ɔ] and [ɑ] vowels, that range is extended to an octave, supported by a low second resonance. Data aggregation - considering the relative prevalence of vowels in American English - suggests that, historically, belting with fR1≈2fo was derived from speech, and that songs with an extended musical pitch range likely demand considerable vowel modification. We thus argue that - on acoustical grounds - the pedagogical commandment for belting with unmodified, "speech-like" vowels cannot always be fulfilled.
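
The "pitch range of about a musical fifth, centered at fo≈0.5fR1" can be turned into a quick numeric sketch. This is an illustrative calculation only, assuming the interval is centered geometrically on fR1/2; the function name and the example fR1 = 700 Hz are hypothetical, not values from the paper.

```python
def belting_fo_range(fr1_hz, span_ratio=1.5):
    # A musical fifth corresponds to a frequency ratio of 3:2.
    # Center the fo interval geometrically on fo = fR1 / 2, the pitch
    # at which the second harmonic (2fo) meets the first resonance.
    center = fr1_hz / 2.0
    half = span_ratio ** 0.5  # half the interval, in ratio terms
    return center / half, center * half
```

For an assumed fR1 of 700 Hz, this gives an fo window of roughly 286-429 Hz (about D4 to A4), illustrating why vowels with low fR1, such as [i] or [u], would need fR1 raised to sustain the belting timbre at higher pitches.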

RevDate: 2023-04-20

Dillon MT, Helpard L, Brown KD, et al (2023)

Influence of the Frequency-to-Place Function on Recognition with Place-Based Cochlear Implant Maps.

The Laryngoscope [Epub ahead of print].

OBJECTIVE: Comparison of acute speech recognition for cochlear implant (CI) alone and electric-acoustic stimulation (EAS) users listening with default maps or place-based maps using either a spiral ganglion (SG) or a new Synchrotron Radiation-Artificial Intelligence (SR-AI) frequency-to-place function.

METHODS: Thirteen adult CI-alone or EAS users completed a task of speech recognition at initial device activation with maps that differed in the electric filter frequency assignments. The three map conditions were: (1) maps with the default filter settings (default map), (2) place-based maps with filters aligned to cochlear SG tonotopicity using the SG function (SG place-based map), and (3) place-based maps with filters aligned to cochlear Organ of Corti (OC) tonotopicity using the SR-AI function (SR-AI place-based map). Speech recognition was evaluated using a vowel recognition task. Performance was scored as the percent correct for formant 1 recognition, because the maps deviate most in the estimated cochlear place frequency at low frequencies.

RESULTS: On average, participants had better performance with the OC SR-AI place-based map as compared to the SG place-based map and the default map. A larger performance benefit was observed for EAS users than for CI-alone users.

CONCLUSION: These pilot data suggest that EAS and CI-alone users may experience better performance with a patient-centered mapping approach that accounts for the variability in cochlear morphology (OC SR-AI frequency-to-place function) in the individualization of the electric filter frequencies (place-based mapping procedure).

LEVEL OF EVIDENCE: 3 Laryngoscope, 2023.

RevDate: 2023-06-13
CmpDate: 2023-05-11

Terband H, F van Brenk (2023)

Modeling Responses to Auditory Feedback Perturbations in Adults, Children, and Children With Complex Speech Sound Disorders: Evidence for Impaired Auditory Self-Monitoring?.

Journal of speech, language, and hearing research : JSLHR, 66(5):1563-1587.

PURPOSE: Previous studies have found that typically developing (TD) children were able to compensate and adapt to auditory feedback perturbations to a similar or larger degree compared to young adults, while children with speech sound disorder (SSD) were found to produce predominantly following responses. However, large individual differences lie underneath the group-level results. This study investigates possible mechanisms in responses to formant shifts by modeling parameters of feedback and feedforward control of speech production based on behavioral data.

METHOD: SimpleDIVA was used to model an existing dataset of compensation/adaptation behavior to auditory feedback perturbations collected from three groups of Dutch speakers: 50 young adults, twenty-three 4- to 8-year-old children with TD speech, and seven 4- to 8-year-old children with SSD. Between-groups and individual within-group differences in model outcome measures representing auditory and somatosensory feedback control gain and feedforward learning rate were assessed.

RESULTS: Notable between-groups and within-group variation was found for all outcome measures. Data modeled for individual speakers yielded model fits of varying reliability. Auditory feedback control gain was negative in children with SSD and positive in the other two groups. Somatosensory feedback control gain was negative for both groups of children and marginally negative for adults. Feedforward learning rates were highest in the children with TD speech, followed by the children with SSD, and lowest in the adults.

CONCLUSIONS: The SimpleDIVA model was able to account for responses to the perturbation of auditory feedback other than corrective, as negative auditory feedback control gains were associated with following responses to vowel shifts. These preliminary findings are suggestive of impaired auditory self-monitoring in children with complex SSD. Possible mechanisms underlying the nature of following responses are discussed.

RevDate: 2023-06-13
CmpDate: 2023-05-11

Chao SC, A Daliri (2023)

Effects of Gradual and Sudden Introduction of Perturbations on Adaptive Responses to Formant-Shift and Formant-Clamp Perturbations.

Journal of speech, language, and hearing research : JSLHR, 66(5):1588-1599.

PURPOSE: When the speech motor system encounters errors, it generates adaptive responses to compensate for the errors. Unlike errors induced by formant-shift perturbations, errors induced by formant-clamp perturbations do not correspond with the speaker's speech (i.e., degraded motor-to-auditory correspondence). We previously showed that adaptive responses to formant-clamp perturbations are smaller than responses to formant-shift perturbations when perturbations are introduced gradually. This study examined responses to formant-clamp and formant-shift perturbations when perturbations are introduced suddenly.

METHOD: One group of participants (n = 30) experienced gradually introduced formant-clamp and formant-shift perturbations, and another group (n = 30) experienced suddenly introduced formant-clamp and formant-shift perturbations. We designed the perturbations based on participant-specific vowel configurations such that a participant's first and second formants of /ɛ/ were perturbed toward their /æ/. To estimate adaptive responses, we measured formant changes (0-100 ms of the vowel) in response to the formant perturbations.

RESULTS: We found that (a) the difference between responses to formant-clamp and formant-shift perturbations was smaller when the perturbations were introduced suddenly and (b) responses to suddenly introduced (but not gradually introduced) formant-shift perturbations positively correlated with responses to formant-clamp perturbations.

CONCLUSIONS: These results showed that the speech motor system responds to errors induced by formant-shift and formant-clamp perturbations more differently when perturbations are introduced gradually than suddenly. Overall, the quality of errors (formant-shift vs. formant-clamp) and the manner of introducing errors (gradually vs. suddenly) modulate the speech motor system's evaluations of and responses to errors.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.22406422.
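The participant-specific perturbation design described in the METHOD above (shifting a speaker's /ɛ/ formants toward their /æ/) can be sketched in a few lines. The F1/F2 values below are hypothetical, chosen only to illustrate the difference between a sudden and a gradually ramped shift; the study's actual perturbation magnitudes are not reproduced here.

```python
# Sketch of a participant-specific formant-shift perturbation, assuming
# illustrative (F1, F2) values in Hz for one speaker's /ɛ/ and /æ/.

def perturb_toward(src, tgt, fraction):
    """Shift each formant of `src` toward `tgt` by `fraction` of the distance."""
    return tuple(s + fraction * (t - s) for s, t in zip(src, tgt))

eh = (580.0, 1800.0)   # hypothetical (F1, F2) of /ɛ/
ae = (700.0, 1650.0)   # hypothetical (F1, F2) of /æ/

# A "sudden" perturbation applies the full shift at once ...
sudden = perturb_toward(eh, ae, 1.0)
# ... while a "gradual" one ramps the fraction over successive trials.
gradual = [perturb_toward(eh, ae, k / 10) for k in range(11)]

print(sudden)        # (700.0, 1650.0)
print(gradual[5])    # halfway: (640.0, 1725.0)
```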

RevDate: 2023-06-13
CmpDate: 2023-05-11

Luo X, A Daliri (2023)

The Impact of Bimodal Hearing on Speech Acoustics of Vowel Production in Adult Cochlear Implant Users.

Journal of speech, language, and hearing research : JSLHR, 66(5):1511-1524.

PURPOSE: This study aimed to investigate the acoustic changes in vowel production with different forms of auditory feedback via cochlear implant (CI), hearing aid (HA), and bimodal hearing (CI + HA).

METHOD: Ten post-lingually deaf adult bimodal CI users (aged 50-78 years) produced English vowels /i/, /ɛ/, /æ/, /ɑ/, /ʊ/, and /u/ in the context of /hVd/ during short-term use of no device (ND), HA, CI, and CI + HA. Segmental features (first formant frequency [F1], second formant frequency [F2], and vowel space area) and suprasegmental features (duration, intensity, and fundamental frequency [fo]) of vowel production were analyzed. Participants also categorized a vowel continuum synthesized from their own productions of /ɛ/ and /æ/ using HA, CI, and CI + HA.

RESULTS: F1s of all vowels decreased; F2s of front vowels but not back vowels increased; vowel space areas increased; and vowel durations, intensities, and fo values decreased with statistical significance in the HA, CI, and CI + HA conditions relative to the ND condition. Only fo values were lower, and vowel space areas larger, with CI and CI + HA than with HA. Average changes in fo, intensity, and F1 from the ND condition to the HA, CI, and CI + HA conditions were positively correlated. Most participants did not show a typical psychometric function for vowel categorization, and thus the relationship between vowel categorization and production was not tested.

CONCLUSIONS: The results suggest that acoustic, electric, and bimodal hearing have a measurable impact on the vowel acoustics of post-lingually deaf adults when their hearing devices are turned on and off temporarily. Also, changes in fo and F1 with the use of hearing devices may be largely driven by changes in intensity.
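The vowel space area measure analyzed above is commonly computed as the area of the polygon spanned by vowels in the F1-F2 plane. A minimal sketch using the shoelace formula, here reduced to four corner vowels with invented formant values rather than the study's data:

```python
# Vowel space area (VSA) as the shoelace area of the corner-vowel polygon.
# Formant values below are illustrative only.

def polygon_area(points):
    """Shoelace formula; `points` are (F2, F1) pairs in vertex order."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# Hypothetical corner vowels as (F2, F1) in Hz, ordered around the perimeter.
corners = {"i": (2300, 300), "ae": (1700, 700), "a": (1100, 750), "u": (900, 350)}
vsa = polygon_area([corners[v] for v in ("i", "ae", "a", "u")])
print(f"vowel space area: {vsa:.0f} Hz^2")
```

A larger VSA corresponds to a more expanded (less centralized) vowel system, which is why the abstract reports VSA increases when auditory feedback is restored.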

RevDate: 2023-04-12
CmpDate: 2023-04-10

Hsu TC, Wu BX, Lin RT, et al (2023)

Electron-phonon interaction toward engineering carrier mobility of periodic edge structured graphene nanoribbons.

Scientific reports, 13(1):5781.

Graphene nanoribbons (GNRs) have many extraordinary electrical properties and are candidates for the semiconductor industry. In this research, we propose a design of coved GNRs with a periodic structure ranging from 4 to 8 nm or more, a size within the practical feature sizes of advanced lithography tools. The periodic coved edge is designed to break the localized electronic state and reduce electron-phonon scattering; in this way, the mobility of coved GNRs can be enhanced by orders of magnitude compared with zigzag GNRs of the same width. Moreover, in contrast to the occasional zero-bandgap transitions of armchair and zigzag GNRs without atomic-level precision control, coved GNRs with periodic edge structures exclude zero-bandgap conditions, which makes mass production practical. The designed coved GNRs are fabricated on a germanium (110) substrate, where the graphene can be prepared in single-crystalline, single-oriented formats and the edges of the GNRs are subsequently repaired under "balanced condition growth"; we demonstrate that the proposed coved structures are compatible with current fabrication facilities.

RevDate: 2023-06-13
CmpDate: 2023-04-14

Vorperian HK, Kent RD, Lee Y, et al (2023)

Vowel Production in Children and Adults With Down Syndrome: Fundamental and Formant Frequencies of the Corner Vowels.

Journal of speech, language, and hearing research : JSLHR, 66(4):1208-1239.

PURPOSE: Atypical vowel production contributes to reduced speech intelligibility in children and adults with Down syndrome (DS). This study compares the acoustic data of the corner vowels /i/, /u/, /æ/, and /ɑ/ from speakers with DS against typically developing/developed (TD) speakers.

METHOD: Measurements of the fundamental frequency (fo) and first four formant frequencies (F1-F4) were obtained from single-word recordings containing the target vowels from 81 participants with DS (ages 3-54 years) and 293 TD speakers (ages 4-92 years), all native speakers of English. The data were used to construct developmental trajectories and to determine interspeaker and intraspeaker variability.

RESULTS: Trajectories for DS differed from those for TD speakers as a function of age and sex, but the groups were similar in showing a striking change in fo and F1-F4 frequencies around age 10 years. Findings confirm higher fo in DS and vowel-specific differences between DS and TD in F1 and F2 frequencies, but not in F3 and F4. The F2 difference between front and back vowels was a more sensitive measure of vowel space compression than vowel space area/centralization across age and sex. Low vowels showed more pronounced F2 compression in relation to reduced speech intelligibility. Intraspeaker variability was significantly greater for DS than TD for nearly all frequency values across age.

DISCUSSION: Vowel production differences between DS and TD are age- and sex-specific, which helps explain contradictory results in previous studies. Increased intraspeaker variability across age in DS confirms the presence of a persisting motor speech disorder. Atypical vowel production in DS is common and related to dysmorphology, delayed development, and disordered motor control.

RevDate: 2023-04-02

Capobianco S, Nacci A, Calcinoni O, et al (2023)

Assessing Acoustic Parameters in Early Music and Romantic Operatic Singing.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00041-3 [Epub ahead of print].

OBJECTIVE: Since the recent early music (EM) revival, a subset of singers have begun to specialize in a style of singing that is perceptually different from the more "mainstream" romantic operatic (RO) singing style. The aim of this study is to characterize EM with respect to RO singing in terms of its vibrato characteristics and the singer's formant cluster.

STUDY DESIGN: This study presents a within-subject experimental design.

METHODS: Ten professional singers (5 F; 5M) versed in both EM and RO repertoire were enrolled in the study. Each singer recorded the first 10 bars of the famous Aria, "Amarilli Mia Bella" (Giulio Caccini, 1602) a cappella, in RO and EM styles, in random order. Three sustained notes were extracted from the acoustical recordings and were analyzed using the free user-friendly software Biovoice to extract five parameters: vibrato rate, vibrato extent, vibrato jitter (Jvib), vibrato shimmer, and quality ratio (QR), an estimation of the singer's formant power.

RESULTS: Vibrato in EM singing was characterized by a higher rate, a smaller extent, and a less regular cycle-to-cycle period duration (higher Jvib) compared to RO singing. As in previous studies, RO singing presented a more prominent singer's formant, as indicated by a smaller QR.

CONCLUSIONS: Acoustical analysis of some vibrato characteristics and the Singer's Formant significantly differentiated EM from RO singing styles. Given the acoustical distinctions between EM and RO styles, future scientific and musicological studies should consider distinguishing between the two styles rather than using a singular term for and description of Western Classical singing.
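Two of the Biovoice parameters named above, vibrato rate and vibrato extent, can be approximated directly from a fundamental-frequency track. The sketch below uses a synthetic sustained note and a simple mean-crossing count; it is not Biovoice's actual algorithm, and the sample rate and vibrato values are invented for illustration.

```python
# Rough vibrato rate (Hz) and extent (semitones) from an f0 track.
import math

def vibrato_rate_extent(f0_track, fs):
    """Rate from mean-crossings (2 per cycle); extent from peak deviation."""
    mean_f0 = sum(f0_track) / len(f0_track)
    dev = [f - mean_f0 for f in f0_track]
    crossings = sum(1 for a, b in zip(dev, dev[1:]) if a * b < 0)
    rate = crossings / 2 / (len(f0_track) / fs)
    extent = max(12 * math.log2(f / mean_f0) for f in f0_track)
    return rate, extent

# Synthetic 1-s sustained note: 440 Hz carrier, 5.5 Hz vibrato, +-1 semitone,
# f0 sampled at 200 Hz.
fs = 200
f0 = [440 * 2 ** (math.sin(2 * math.pi * 5.5 * n / fs) / 12) for n in range(fs)]
rate, extent = vibrato_rate_extent(f0, fs)
```

On this synthetic note the recovered rate is close to the 5.5 Hz modulation and the extent close to 1 semitone; real f0 tracks would first need voicing detection and smoothing.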

RevDate: 2023-04-03
CmpDate: 2023-04-03

Wood S (2023)

Dating the open /æ/ sound change in Southern British English.

JASA express letters, 3(3):035205.

The new open /æ/ was not noticed in the non-regional received pronunciation (RP) accent of Southern British English until the 1980s. Dating to the 1950s or 1920s had been suggested, but the earliest known regional example was born in Kent in the 1860s. Formant data from archived recordings of 29 Southeastern speakers, born between the 1850s and 1960s, were studied using two methods: inspection of formant diagrams for closer /æ/, and modelling low vowels for open /æ/. The earliest RP speaker found with new open /æ/ was born in 1857, demonstrating that this type of sound change had started by the 1850s.

RevDate: 2023-05-17
CmpDate: 2023-04-04

Serrurier A, C Neuschaefer-Rube (2023)

Morphological and acoustic modeling of the vocal tract.

The Journal of the Acoustical Society of America, 153(3):1867.

In speech production, the anatomical morphology forms the substrate on which the speakers build their articulatory strategy to reach specific articulatory-acoustic goals. The aim of this study is to characterize morphological inter-speaker variability by building a shape model of the full vocal tract including hard and soft structures. Static magnetic resonance imaging data from 41 speakers articulating altogether 1947 phonemes were considered, and the midsagittal articulator contours were manually outlined. A phoneme-independent average-articulation representative of morphology was calculated as the speaker mean articulation. A principal component analysis-driven shape model was derived from average-articulations, leading to five morphological components, which explained 87% of the variance. Almost three-quarters of the variance was related to independent variations of the horizontal oral and vertical pharyngeal lengths, the latter capturing male-female differences. The three additional components captured shape variations related to head tilt and palate shape. Plane wave propagation acoustic simulations were run to characterize morphological components. A lengthening of 1 cm of the vocal tract in the vertical or horizontal directions led to a decrease in formant values of 7%-8%. Further analyses are required to analyze three-dimensional variability and to understand the morphological-acoustic relationships per phoneme. Average-articulations and model code are publicly available (https://github.com/tonioser/VTMorphologicalModel).
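The reported 7%-8% formant decrease per centimeter of lengthening can be compared against the simplest textbook model: a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1) * c / (4 * L). This first-order sketch is illustrative only; it predicts a somewhat smaller relative drop than the study's plane-wave simulations on realistic geometries, and the tract lengths below are assumed round numbers.

```python
# Quarter-wavelength resonances of a uniform closed-open tube.

C = 35000.0  # approximate speed of sound in warm moist air, cm/s

def tube_formant(n, length_cm):
    """n-th resonance (Hz) of a uniform tube, closed-open boundary conditions."""
    return (2 * n - 1) * C / (4 * length_cm)

f1_short = tube_formant(1, 17.5)   # ~500 Hz for an assumed 17.5 cm tract
f1_long = tube_formant(1, 18.5)    # same tract lengthened by 1 cm
drop = 1 - f1_long / f1_short      # relative F1 decrease in this idealized model
```

All resonances scale as 1/L in this model, so every formant drops by the same fraction when the tube is lengthened uniformly.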

RevDate: 2023-03-23

Lou Q, Wang X, Chen Y, et al (2023)

Subjective and Objective Evaluation of Speech in Adult Patients With Repaired Cleft Palate.

The Journal of craniofacial surgery pii:00001665-990000000-00653 [Epub ahead of print].

OBJECTIVE: To explore the speech outcomes of adult patients with repaired cleft palate through subjective perception evaluation and objective acoustic analysis, and to compare the differences in pronunciation characteristics between speakers with complete velopharyngeal closure (VPC) and velopharyngeal insufficiency (VPI) patients.

PARTICIPANTS AND INTERVENTION: Subjective evaluation indicators included speech intelligibility, nasality, and consonant missing rate. For objective acoustic analysis, speech samples were normalized, and the acoustic parameters included normalized vowel formants, voice onset time, and the analysis of 3-dimensional spectrograms and spectra. Analyses were carried out on speech samples produced by 4 groups of speakers: (a) speakers with velopharyngeal competence after palatorrhaphy (n=38); (b) speakers with velopharyngeal incompetence after palatorrhaphy (n=70); (c) adult patients with cleft palate (n=65); and (d) typical speakers (n=30).

RESULTS: There was a highly negative correlation between VPC grade and speech intelligibility (ρ=-0.933), and a highly positive correlation between VPC and nasality (ρ=0.813). In subjective evaluation, the speech level of VPI patients was significantly lower than that of VPC patients and normal adults. Although the nasality and consonant loss rate of VPC patients were significantly higher than that of normal adults, the speech intelligibility of VPC patients was not significantly different from that of normal adults. In acoustic analysis, patients with VPI still performed poorly compared with patients with VPC.

CONCLUSIONS: The speech function of adult cleft palate patients is affected by abnormal palatal structure and bad pronunciation habits. In subjective evaluation, there was no significant difference in speech level between VPC patients and normal adults, whereas there was significant difference between VPI patients and normal adults. The acoustic parameters were different between the 2 groups after cleft palate repair. The condition of palatopharyngeal closure after cleft palate can affect the patient's speech.

RevDate: 2023-05-24
CmpDate: 2023-03-23

Easwar V, Purcell D, T Wright (2023)

Predicting Hearing aid Benefit Using Speech-Evoked Envelope Following Responses in Children With Hearing Loss.

Trends in hearing, 27:23312165231151468.

Electroencephalography could serve as an objective tool to evaluate hearing aid benefit in infants who are developmentally unable to participate in hearing tests. We investigated whether speech-evoked envelope following responses (EFRs), a type of electroencephalography-based measure, could predict improved audibility with the use of a hearing aid in children with mild-to-severe permanent, mainly sensorineural, hearing loss. In 18 children, EFRs were elicited by six male-spoken band-limited phonemic stimuli--the first formants of /u/ and /i/, the second and higher formants of /u/ and /i/, and the fricatives /s/ and /∫/--presented together as /su∫i/. EFRs were recorded between the vertex and nape, when /su∫i/ was presented at 55, 65, and 75 dB SPL using insert earphones in unaided conditions and individually fit hearing aids in aided conditions. EFR amplitude and detectability improved with the use of a hearing aid, and the degree of improvement in EFR amplitude was dependent on the extent of change in behavioral thresholds between unaided and aided conditions. EFR detectability was primarily influenced by audibility; higher sensation level stimuli had an increased probability of detection. Overall EFR sensitivity in predicting audibility was significantly higher in aided (82.1%) than unaided conditions (66.5%) and did not vary as a function of stimulus or frequency. EFR specificity in ascertaining inaudibility was 90.8%. Aided improvement in EFR detectability was a significant predictor of hearing aid-facilitated change in speech discrimination accuracy. Results suggest that speech-evoked EFRs could be a useful objective tool in predicting hearing aid benefit in children with hearing loss.
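The sensitivity and specificity figures quoted above reduce to simple ratios over detection counts. The counts below are hypothetical, chosen only to show the arithmetic; they are not the study's raw data.

```python
# Sensitivity/specificity of EFR detection as plain count ratios.

def sensitivity(true_pos, false_neg):
    """Proportion of audible stimuli whose EFR was detected."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of inaudible stimuli correctly yielding no detection."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts out of 100 audible and 100 inaudible presentations:
print(sensitivity(82, 18))   # 0.82, in the region of the 82.1% aided figure
print(specificity(91, 9))    # 0.91, in the region of the 90.8% figure
```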

RevDate: 2023-05-30
CmpDate: 2023-03-23

Duan H, Xie Q, Z Zhang (2023)

Characteristics of Alveolo-palatal Affricates Produced by Mandarin-speaking Children with Repaired Cleft Palate.

American journal of health behavior, 47(1):13-20.

Objectives: In this study, we examined the acoustic properties of the Mandarin Chinese alveolo-palatal affricates /tɕ/ and /tɕʰ/, and analyzed differences in the acoustic characteristics of these affricates as produced by children with repaired cleft palate and typically developing children. We also explored the relationship between the affricates and the high-front vowel /i/. Methods: We analyzed 16 monosyllabic words with alveolo-palatal affricates as initial consonants, produced by children with repaired cleft palate (N=13, mean age=5.9 years) and typically developing children (N=6, mean age=5.3 years). We used several acoustic parameters to investigate the characteristics of these affricates, such as the spectral center of gravity, voice onset time (VOT), and vowel formants. Results: Compared with typically developing children, children with cleft palate exhibited a lower center of gravity for both affricates /tɕ/ and /tɕʰ/. Data from the control group showed that the affricate /tɕʰ/ had a significantly greater center of gravity than /tɕ/. The accuracy of /tɕ, tɕʰ/ produced by speakers with cleft palate was significantly correlated with that of /i/ (r=0.63). The high-front vowel /i/ is a significant index in diagnosing speech intelligibility and is more valuable than /a/ and /u/. There was a significant difference in F2 of the vowel /i/ between children with cleft palate before speech therapy (CS1) and after speech therapy (CS2). After speech intervention, the accuracy of affricates produced by children with cleft palate improved, and the acoustic "stop + noise segment" structure appeared. Conclusion: Children with cleft palate can be distinguished from typically developing children by two significant acoustic characteristics: center of gravity and VOT. As the alveolo-palatal affricates /tɕ, tɕʰ/ and the high-front vowel /i/ share a similar place of articulation (front tongue blade), their production accuracy can be improved mutually. The analysis showed that the articulation of Chinese /i/ has a higher, more frontal lingual position and less variability, which is more conducive to articulation training and improves the effect of cleft palate therapy. These findings suggest a potential relationship between the affricates /tɕ, tɕʰ/ and the vowel /i/. Children with cleft palate have difficulty pronouncing /tɕ, tɕʰ/ and /i/; it is better to start training with the vowel /i/, resulting in improvement in overall speech intelligibility.

RevDate: 2023-03-22

Alghowinem S, Gedeon T, Goecke R, et al (2023)

Interpretation of Depression Detection Models via Feature Selection Methods.

IEEE transactions on affective computing, 14(1):133-152.

Given the prevalence of depression worldwide and its major impact on society, several studies have employed artificial intelligence modelling to automatically detect and assess depression. However, interpretation of these models and their cues is rarely discussed in detail in the AI community, though it has received increased attention lately. In this study, we aim to analyse the commonly selected features using a proposed framework of several feature selection methods and their effect on the classification results, which will provide an interpretation of the depression detection model. The developed framework aggregates and selects the most promising features for modelling depression detection from 38 feature selection algorithms of different categories. Using three real-world depression datasets, 902 behavioural cues were extracted from speech behaviour, speech prosody, eye movement, and head pose. To verify the generalisability of the proposed framework, we applied the entire process to the depression datasets individually and combined. The results from the proposed framework showed that speech behaviour features (e.g. pauses) are the most distinctive features of the depression detection model. From the speech prosody modality, the strongest feature groups were F0, HNR, formants, and MFCC; for the eye activity modality, they were left-right eye movement and gaze direction; and for the head modality, it was yaw head movement. Modelling depression detection using the selected features (even though there are only 9 features) outperformed using all features on all the individual and combined datasets. Our feature selection framework not only provided an interpretation of the model but also produced higher depression detection accuracy with a small number of features across varied datasets. This could help reduce the processing time needed to extract features and create the model.

RevDate: 2023-03-08

Hauser I (2023)

Differential Cue Weighting in Mandarin Sibilant Production.

Language and speech [Epub ahead of print].

Individual talkers vary in their relative use of different cues to signal phonological contrast. Previous work provides limited and conflicting data on whether such variation is modulated by cue trading or individual differences in speech style. This paper examines differential cue weighting patterns in Mandarin sibilants as a test case for these hypotheses. Standardized Mandarin exhibits a three-way place contrast between retroflex, alveopalatal, and alveolar sibilants with individual differences in relative weighting of spectral center of gravity (COG) and the second formant of the following vowel (F2). In results from a speech production task, cue weights of COG and F2 are inversely correlated across speakers, demonstrating a trade-off relationship in cue use. These findings are consistent with a cue trading account of individual differences in contrast signaling.
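The trade-off test described above amounts to checking that per-speaker weights for the two cues (COG and F2) correlate negatively across speakers. A minimal Pearson correlation sketch; the cue weights below are fabricated for illustration only.

```python
# Inverse correlation of per-speaker cue weights, as in a cue-trading account.

def pearson(xs, ys):
    """Pearson's r for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

cog_weights = [0.9, 0.8, 0.6, 0.4, 0.3]   # hypothetical per-speaker COG weights
f2_weights = [0.2, 0.3, 0.5, 0.6, 0.8]    # hypothetical per-speaker F2 weights
r = pearson(cog_weights, f2_weights)
print(round(r, 3))   # strongly negative, consistent with cue trading
```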

RevDate: 2023-04-17
CmpDate: 2023-04-07

Yang X, Guo C, Zhang M, et al (2023)

Ultrahigh-sensitivity multi-parameter tacrolimus solution detection based on an anchor planar millifluidic microwave biosensor.

Analytical methods : advancing methods and applications, 15(14):1765-1774.

To detect drug concentrations in tacrolimus solution, an anchor planar millifluidic microwave (APMM) biosensor is proposed. The millifluidic system integrated with the sensor enables accurate and efficient detection while eliminating interference caused by the fluidity of the tacrolimus sample. Different concentrations (10-500 ng mL[-1]) of the tacrolimus analyte were introduced into the millifluidic channel, where the analyte interacts fully with the radio-frequency patch electromagnetic field, thereby effectively and sensitively modifying the resonant frequency and amplitude of the transmission coefficient. Experimental results indicate that the sensor has an extremely low limit of detection (LoD) of 0.12 pg mL[-1] and a frequency detection resolution (FDR) of 1.59 MHz/(ng mL[-1]). The greater the FDR and the lower the LoD, the more feasible a label-free biosensing method becomes. Regression analysis revealed a strong linear correlation (R[2] = 0.992) between the concentration of tacrolimus and the frequency difference between the two resonant peaks of the APMM. In addition, the difference in the reflection coefficient between the two formants was measured and calculated, and a strong linear correlation (R[2] = 0.998) was found between this difference and tacrolimus concentration. Five measurements were performed on each individual tacrolimus sample to validate the biosensor's high repeatability. Consequently, the proposed biosensor is a potential candidate for the early detection of tacrolimus drug concentration levels in organ transplant recipients. This study presents a simple method for constructing microwave biosensors with high sensitivity and rapid response.
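The regression step described above is an ordinary least-squares line fit relating concentration to the measured frequency difference. The (concentration, Δf) pairs below are invented to echo the reported ~1.59 MHz per ng/mL resolution; they are not the published data.

```python
# Least-squares calibration line for a concentration-vs-frequency-shift curve.

def fit_line(xs, ys):
    """Slope and intercept of the least-squares fit y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

conc = [10, 50, 100, 250, 500]      # ng/mL (hypothetical)
delta_f = [16, 80, 159, 398, 795]   # MHz (hypothetical, ~1.59 MHz per ng/mL)
slope, intercept = fit_line(conc, delta_f)
```

Inverting the fitted line (concentration = (Δf - intercept) / slope) is then how an unknown sample's concentration would be read off the calibration curve.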

RevDate: 2023-04-01
CmpDate: 2023-03-03

Liu Z, Y Xu (2023)

Deep learning assessment of syllable affiliation of intervocalic consonants.

The Journal of the Acoustical Society of America, 153(2):848.

In English, a sentence like "He made out our intentions." could be misperceived as "He may doubt our intentions." because the coda /d/ sounds like it has become the onset of the next syllable. The nature and occurrence condition of this resyllabification phenomenon are unclear, however. Previous empirical studies mainly relied on listener judgment, limited acoustic evidence, such as voice onset time, or average formant values to determine the occurrence of resyllabification. This study tested the hypothesis that resyllabification is a coarticulatory reorganisation that realigns the coda consonant with the vowel of the next syllable. Deep learning in conjunction with dynamic time warping (DTW) was used to assess syllable affiliation of intervocalic consonants. The results suggest that convolutional neural network- and recurrent neural network-based models can detect cases of resyllabification using Mel-frequency spectrograms. DTW analysis shows that neural network inferred resyllabified sequences are acoustically more similar to their onset counterparts than their canonical productions. A binary classifier further suggests that, similar to the genuine onsets, the inferred resyllabified coda consonants are coarticulated with the following vowel. These results are interpreted with an account of resyllabification as a speech-rate-dependent coarticulatory reorganisation mechanism in speech.
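The DTW comparison used above can be illustrated with a minimal implementation over 1-D sequences; the study applied DTW to spectral feature sequences, which is structurally identical with a vector distance in place of the scalar one used here.

```python
# Classic O(n*m) dynamic time warping with absolute-difference local cost.

def dtw_distance(a, b):
    """Minimum cumulative alignment cost between sequences `a` and `b`."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# A time-stretched copy of a contour stays close under DTW ...
print(dtw_distance([0, 1, 2, 3], [0, 1, 1, 2, 2, 3]))   # 0.0
# ... while a genuinely different contour does not.
print(dtw_distance([0, 1, 2, 3], [3, 2, 1, 0]))
```

This tolerance to local time stretching is what lets DTW compare a possibly resyllabified sequence with onset and coda reference productions of different durations.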

RevDate: 2023-04-01
CmpDate: 2023-03-03

Lasota M, Šidlof P, Maurerlehner P, et al (2023)

Anisotropic minimum dissipation subgrid-scale model in hybrid aeroacoustic simulations of human phonation.

The Journal of the Acoustical Society of America, 153(2):1052.

This article deals with large-eddy simulations of three-dimensional incompressible laryngeal flow followed by acoustic simulations of human phonation of five cardinal English vowels, /ɑ, æ, i, o, u/. The flow and aeroacoustic simulations were performed in OpenFOAM and the in-house code openCFS, respectively. Given the large variety of scales in the flow and acoustics, the simulation is separated into two steps: (1) computing the flow in the larynx using the finite volume method on a fine moving grid with 2.2 million elements, followed by (2) computing the sound sources separately and the wave propagation to the radiation zone around the mouth using the finite element method on a coarse static grid with 33 000 elements. The numerical results showed that the anisotropic minimum dissipation model, which is not well known since it is not available in common CFD software, predicted stronger sound pressure levels at higher harmonics, and especially at the first two formants, than the wall-adapting local eddy-viscosity model. Employing this model for the turbulent flow in the larynx had a positive impact on the quality of the simulated vowels.

RevDate: 2023-03-30
CmpDate: 2023-03-28

Huang Z, Lobbezoo F, Vanhommerig JW, et al (2023)

Effects of demographic and sleep-related factors on snoring sound parameters.

Sleep medicine, 104:3-10.

OBJECTIVE: To investigate the effect of frequently reported between-individual (viz., age, gender, body mass index [BMI], and apnea-hypopnea index [AHI]) and within-individual (viz., sleep stage and sleep position) snoring sound-related factors on snoring sound parameters in temporal, intensity, and frequency domains.

METHODS: This study included 83 adult snorers (mean ± SD age: 42.2 ± 11.3 yrs; male gender: 59%) who underwent an overnight polysomnography (PSG) and simultaneous sound recording, from which a total of 131,745 snoring events were extracted and analyzed. Data on both between-individual and within-individual factors were extracted from the participants' PSG reports.

RESULTS: Gender did not have any significant effect on snoring sound parameters. The fundamental frequency (FF; coefficient = -0.31; P = 0.02) and dominant frequency (DF; coefficient = -12.43; P < 0.01) of snoring sounds decreased with the increase of age, and the second formant increased (coefficient = 22.91; P = 0.02) with the increase of BMI. Severe obstructive sleep apnea (OSA; AHI ≥30 events/hour), non-rapid eye movement sleep stage 3 (N3), and supine position were all associated with more, longer, and louder snoring events (P < 0.05). Supine position was associated with higher FF and DF, and lateral decubitus positions were associated with higher formants.

CONCLUSIONS: Within the limitations of the current patient profile and included factors, AHI was found to have greater effects on snoring sound parameters than the other between-individual factors. The included within-individual factors were found to have greater effects on snoring sound parameters than the between-individual factors under study.

RevDate: 2023-03-07
CmpDate: 2023-02-28

Wang L, Z Jiang (2023)

Tidal Volume Level Estimation Using Respiratory Sounds.

Journal of healthcare engineering, 2023:4994668.

Respiratory sounds have been used as a noninvasive and convenient means of estimating respiratory flow and tidal volume. However, current methods need calibration, making them difficult to use in a home environment. A respiratory sound analysis method is proposed to qualitatively estimate tidal volume levels during sleep. Respiratory sounds are filtered and segmented into one-minute clips, and all clips are clustered into three categories (normal breathing/snoring/uncertain) with agglomerative hierarchical clustering (AHC). Formant parameters are extracted to classify snoring clips into simple snoring and obstructive snoring with the K-means algorithm. For simple snoring clips, the tidal volume level is calculated from the snoring duration. For obstructive snoring clips, the tidal volume level is calculated from the maximum breathing pause interval. The performance of the proposed method is evaluated on an open dataset, PSG-Audio, in which full-night polysomnography (PSG) and tracheal sound were recorded simultaneously. The calculated tidal volume levels are compared with the corresponding lowest nocturnal oxygen saturation (LoO2) data. Experiments show that the proposed method calculates tidal volume levels with high accuracy and robustness.
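The two-way K-means split described above (simple versus obstructive snoring from formant-derived parameters) can be sketched in one dimension; real systems cluster multi-dimensional feature vectors, and the scalar feature values below are hypothetical.

```python
# Two-cluster 1-D K-means on a hypothetical formant-derived feature per clip.

def kmeans_1d(values, c0, c1, iters=20):
    """Lloyd's algorithm with two centroids; returns final centroids and labels."""
    for _ in range(iters):
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return (c0, c1), labels

# Hypothetical first-formant-like feature (Hz) for ten snoring clips:
feats = [420, 450, 430, 440, 900, 950, 880, 910, 460, 930]
(c0, c1), labels = kmeans_1d(feats, min(feats), max(feats))
```

The two recovered centroids then serve as prototypes for the two snore classes, and each new clip is labeled by its nearer centroid.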

RevDate: 2023-02-24

Aldamen H, M Al-Deaibes (2023)

Arabic emphatic consonants as produced by English speakers: An acoustic study.

Heliyon, 9(2):e13401.

This study examines the production of emphatic consonants as produced by American L2 learners of Arabic. To this end, 19 participants, 5 native speakers and 14 L2 learners, participated in a production experiment in which they produced monosyllabic CVC pairs that contrasted in whether the initial consonant was plain or emphatic. The acoustic parameters investigated were the VOT of voiceless stops, the COG of fricatives, and the first three formant frequencies of the target vowels. The results of the native speakers showed that VOT is a reliable acoustic correlate of emphasis in Modern Standard Arabic (MSA). The results also showed that vowels in the emphatic context have higher F1 and F3 and lower F2. The L2 learners produced VOT values comparable to those of native Arabic speakers. Further, L2 learners produced a significantly lower F2 of the vowels in the emphatic context than in the plain context. Proficiency in Arabic played a role in the F2 measure; the intermediate learners tended to be more native-like than the beginning learners. As for F3, the results of the L2 learners unexpectedly showed that the beginning learners produced a higher F3 in the context of fricatives only. This suggests that the relationship between emphasis and proficiency depends on whether the preceding consonant is a stop or a fricative.

RevDate: 2023-02-24

Ali IE, Sumita Y, N Wakabayashi (2023)

Comparison of Praat and Computerized Speech Lab for formant analysis of five Japanese vowels in maxillectomy patients.

Frontiers in neuroscience, 17:1098197.

INTRODUCTION: Speech impairment is a common complication after surgical resection of maxillary tumors. Maxillofacial prosthodontists play a critical role in restoring this function so that affected patients can enjoy better lives. For that purpose, several acoustic software packages have been used for speech evaluation, among which Computerized Speech Lab (CSL) and Praat are widely used in clinical and research contexts. Although CSL is a commercial product, Praat is freely available on the internet and can be used by patients and clinicians to practice several therapy goals. Therefore, this study aimed to determine whether the two programs produce comparable results for the first two formant frequencies (F1 and F2) and their respective formant ranges, obtained from the same voice samples of Japanese participants with maxillectomy defects.

METHODS: CSL was used as a reference to evaluate the accuracy of Praat with both the default and newly proposed adjusted settings. Thirty-seven participants were enrolled in this study for formant analysis of the five Japanese vowels (a/i/u/e/o) using CSL and Praat. Spearman's rank correlation coefficient was used to judge the correlation between the analysis results of both programs regarding F1 and F2 and their respective formant ranges.

RESULTS: The findings showed highly positive correlations between the two programs for all acoustic features and all Praat settings.

DISCUSSION: The strong correlations between the results of both CSL and Praat suggest that both programs may have similar decision strategies for atypical speech and for both sexes. This study highlights that the default settings in Praat can be used for formant analysis in maxillectomy patients with predictable accuracy. The proposed adjusted settings in Praat can yield more accurate results for formant analysis of atypical speech in maxillectomy cases when the examiner cannot precisely locate the formant frequencies using the default settings or confirm analysis results obtained using CSL.

RevDate: 2023-02-08
CmpDate: 2023-02-08

Zhang C, Hou Q, Guo TT, et al (2023)

[The effect of Wendler Glottoplasty to elevate vocal pitch in transgender women].

Zhonghua er bi yan hou tou jing wai ke za zhi = Chinese journal of otorhinolaryngology head and neck surgery, 58(2):139-144.

Objective: To evaluate the effect of Wendler Glottoplasty on elevating vocal pitch in transgender women. Methods: The pre- and 3-month post-surgery voice parameters of 29 transgender women who underwent Wendler Glottoplasty in the Department of Otorhinolaryngology Head and Neck Surgery of Beijing Friendship Hospital from January 2017 to October 2020 were retrospectively analyzed. The 29 transgender women ranged in age from 19 to 47 (27.0±6.3) years. Subjective evaluation was performed using the Transsexual Voice Questionnaire for Male to Female (TVQ[MtF]). Objective parameters included fundamental frequency (F0), highest pitch, lowest pitch, habitual volume, Jitter, Shimmer, maximal phonation time (MPT), noise-to-harmonic ratio (NHR) and formant frequencies (F1, F2, F3, F4). SPSS 25.0 software was used for statistical analysis. Results: Three months after surgery, the score of the TVQ[MtF] was significantly decreased [(89.9±14.7) vs. (50.4±13.6), t=11.49, P<0.001]. The F0 was significantly elevated [(152.7±23.3) Hz vs. (207.7±45.9) Hz, t=-6.03, P<0.001]. The frequencies of F1, F2 and F3 were significantly elevated. No statistical difference was observed in the frequency of F4. The highest pitch was not significantly altered, while the lowest pitch was significantly elevated [(96.8±17.7) Hz vs. (120.0±28.9) Hz, t=-3.71, P=0.001]. Habitual speech volume was significantly increased [(60.0±5.2) dB vs. (63.6±9.6) dB, t=-2.12, P=0.043]. Jitter, Shimmer, NHR and MPT were not markedly altered (P>0.05). Conclusions: Wendler Glottoplasty can notably elevate the vocal pitch, formant frequencies and degree of vocal femininity in transgender women without affecting phonation ability and voice quality. It can be an effective treatment modality for voice feminization.

RevDate: 2023-02-07

Gunjawate DR, Ravi R, Tauro JP, et al (2022)

Spectral and Temporal Characteristics of Vowels in Konkani.

Indian journal of otolaryngology and head and neck surgery : official publication of the Association of Otolaryngologists of India, 74(Suppl 3):4870-4879.

The present study was undertaken to examine the acoustic characteristics of vowels using spectrographic analysis in the Mangalorean Catholic Konkani dialect of Konkani spoken in Mangalore, Karnataka, India. Recordings of CVC words were made from 11 males and 19 females aged 18-55 years. The CVC words consisted of combinations of vowels (/i, i:, e, ɵ, ə, u, o, ɐ, ӓ, ɔ/) and consonants (/m, k, w, s, ʅ, h, l, r, p, ʤ, g, n, Ɵ, ṭ, ḷ, b, dh/). Recordings were made in a sound-treated room, spectrographic analysis was performed using PRAAT software, and spectral and temporal characteristics such as fundamental frequency (F0), formants (F1, F2, F3) and vowel duration were measured. The results showed higher fundamental frequency values for short, high and back vowels. Higher F1 values were noted for open vowels, and F2 was higher for front vowels. Long vowels had longer durations than short vowels, and females had longer vowel durations than males. This acoustic information, in terms of spectral and temporal cues, helps in better understanding the production and perception of languages and dialects.

RevDate: 2023-02-07

Prakash P, Boominathan P, S Mahalingam (2022)

Acoustic Description of Bhramari Pranayama.

Indian journal of otolaryngology and head and neck surgery : official publication of the Association of Otolaryngologists of India, 74(Suppl 3):4738-4747.

UNLABELLED: The study's aims were (1) to describe the acoustic characteristics of Bhramari pranayama, and (2) to compare the acoustic features of the nasal consonant /m/ and the sound of Bhramari pranayama produced by yoga trainers. Cross-sectional study design. Thirty-three adult male yoga trainers performed five repetitions of the nasal consonant /m/ and of Bhramari pranayama. These samples were recorded into Computerized Speech Lab (Kay Pentax model 4500b) using an SM48 microphone. Formant frequencies (fF1, fF2, fF3, & fF4), formant bandwidths (BF1, BF2, BF3, & BF4), anti-formant, and alpha and beta ratios were analyzed. The nasal consonant /m/ had a higher fF2 and anti-formant compared with Bhramari pranayama. Statistically significant differences were noted in fF2, BF3, and anti-formants. Bhramari pranayama revealed a lower alpha ratio and a higher beta ratio than /m/; however, these differences were not statistically significant. Findings are discussed from acoustic and physiological perspectives. Bhramari pranayama was assumed to be produced with a larger pharyngeal cavity and a narrower velar passage compared with the nasal consonant /m/. Verification at the level of the glottis and with aerodynamic parameters may ascertain these propositions.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s12070-021-03054-1.

RevDate: 2023-02-15
CmpDate: 2023-02-06

Kondaurova MV, Zheng Q, Donaldson CW, et al (2023)

Effect of telepractice on pediatric cochlear implant users and provider vowel space: A preliminary report.

The Journal of the Acoustical Society of America, 153(1):467.

Clear speaking styles are goal-oriented modifications in which talkers adapt acoustic-phonetic characteristics of speech to compensate for communication challenges. Do children with hearing loss and a clinical provider modify speech characteristics during telepractice to adjust for remote communication? The study examined the effect of telepractice (tele-) on vowel production in seven (mean age 4:11 years, SD 1:2 years) children with cochlear implants (CIs) and a provider. The first (F1) and second (F2) formant frequencies of /i/, /ɑ/, and /u/ vowels were measured in child and provider speech during one in-person and one tele-speech-language intervention, order counterbalanced. Child and provider vowel space areas (VSA) were calculated. The results demonstrated an increase in F2 formant frequency for /i/ vowel in child and provider speech and an increase in F1 formant frequency for /ɑ/ vowel in the provider speech during tele- compared to in-person intervention. An expansion of VSA was found in child and provider speech in tele- compared to in-person intervention. In children, the earlier age of CI activation was associated with larger VSA in both tele- and in-person intervention. The results suggest that the children and the provider adjust vowel articulation in response to remote communication during telepractice.
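The vowel space area (VSA) metric used in the study above is conventionally the area of the polygon spanned by the mean (F1, F2) points of the corner vowels, here /i/, /ɑ/, /u/. A minimal sketch with the shoelace formula, using invented formant values rather than the study's data:

```python
# Shoelace-formula sketch of vowel space area from corner-vowel formants.

def vowel_space_area(points):
    """Area of the polygon given [(F1, F2), ...] vertices in order (Hz^2)."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# Hypothetical child means (Hz) for /i/, /ɑ/, /u/:
corner_vowels = [(300.0, 2800.0), (900.0, 1300.0), (350.0, 900.0)]
area_hz2 = vowel_space_area(corner_vowels)
```

An "expansion of VSA", as reported for the telepractice condition, would show up as a larger return value for the same speaker's corner vowels.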

RevDate: 2023-05-02
CmpDate: 2023-04-04

Kirby J, Pittayaporn P, M Brunelle (2022)

Transphonologization of onset voicing: revisiting Northern and Eastern Kmhmu'.

Phonetica, 79(6):591-629.

Phonation and vowel quality are often thought to play a vital role at the initial stage of tonogenesis. This paper investigates the production of voicing and tones in a tonal Northern Kmhmu' dialect spoken in Nan Province, Thailand, and a non-tonal Eastern Kmhmu' dialect spoken in Vientiane, Laos, from both acoustic and electroglottographic perspectives. Large and consistent VOT differences between voiced and voiceless stops are preserved in Eastern Kmhmu', but are not found in Northern Kmhmu', consistent with previous reports. With respect to pitch, f0 is clearly a secondary property of the voicing contrast in Eastern Kmhmu', but unquestionably the primary contrastive property in Northern Kmhmu'. Crucially, no evidence is found to suggest that either phonation type or formant differences act as significant cues to voicing in Eastern Kmhmu' or tones in Northern Kmhmu'. These results suggest that voicing contrasts can also be transphonologized directly into f0-based contrasts, skipping a registral stage based primarily on phonation and/or vowel quality.

RevDate: 2023-02-02

Viegas F, Camargo Z, Viegas D, et al (2023)

Acoustic Measurements of Speech and Voice in Men with Angle Class II, Division 1, Malocclusion.

International archives of otorhinolaryngology, 27(1):e10-e15.

Introduction The acoustic analysis of speech (measurements of the fundamental frequency and formant frequencies) of different vowels produced by speakers with the Angle class II, division 1, malocclusion can provide information about the relationship between articulatory and phonatory mechanisms in this type of maxillomandibular disproportion. Objectives To investigate acoustic measurements related to the fundamental frequency (F0) and formant frequencies (F1 and F2) of the oral vowels of Brazilian Portuguese (BP) produced by male speakers with Angle class II, division 1, malocclusion (study group) and to compare them with men with Angle class I malocclusion (control group). Methods In total, 60 men (20 with class II, 40 with class I) aged between 18 and 40 years were included in the study. Measurements of F0, F1 and F2 of the seven oral vowels of BP were estimated from audio samples containing repetitions of carrier sentences. The statistical analysis was performed using the Student t-test, and the effect size was calculated. Results Significant differences were detected in F0 values for five vowels ([e], [i], [ᴐ], [o] and [u]), and in F1 for the vowels [a] and [ᴐ], with higher values for class II, division 1. Conclusion Statistical differences were found in the F0 measurements, with higher values in five of the seven vowels analysed in subjects with Angle class II, division 1. The formant frequencies showed differences only in F1 for two vowels, with higher values in the study group. The data suggest that voice and speech production data must be included in the assessment protocol for patients with malocclusion.

RevDate: 2023-02-02

Freeman V (2023)

Production and perception of prevelar merger: Two-dimensional comparisons using Pillai scores and confusion matrices.

Journal of phonetics, 97:.

Vowel merger production is quantified with gradient acoustic measures, while phonemic perception methods are often coarser, complicating comparisons within mergers in progress. This study implements a perception experiment in two-dimensional formant space (F1 × F2), allowing unified plotting, quantification, and statistics with production data. Production and perception are compared within 20 speakers for a two-part prevelar merger in progress in Pacific Northwest English, where mid-front /ɛ, e/ approximate or merge before voiced velar /ɡ/ (leg-vague merger), and low-front prevelar /æɡ/ raises toward them (bag-raising). Distributions are visualized with kernel density plots, and overlap is quantified with Pillai scores and confusion matrices from linear discriminant analysis models. Results suggest that the leg-vague merger is perceived as more complete than it is produced (in both the sample and the community), while bag-raising is highly variable in production but rejected in perception. Relationships between production and perception varied by age: raising and merger progressed across two generations in production but not in perception, and younger adults perceived the leg-vague merger but did not produce it, varying both in (minimal) raising perception and in bag-raising production. Thus, prevelar raising/merger may be progressing among some social groups but reversing in others.
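The Pillai score used above to quantify vowel-class overlap in F1 × F2 space is the Pillai trace of a one-way MANOVA; for two classes it is trace(H(H + E)⁻¹), where H and E are the between- and within-class sums-of-squares-and-cross-products (SSCP) matrices. A pure-Python sketch (not the study's statistical pipeline), with invented formant tokens; scores near 0 indicate merged classes, near 1 distinct classes:

```python
# Pillai trace for two vowel classes in (F1, F2) space, done by hand in 2x2.

def _sscp(groups):
    """Between-class (H) and within-class (E) 2x2 SSCP matrices."""
    allpts = [p for g in groups for p in g]
    gx = sum(p[0] for p in allpts) / len(allpts)
    gy = sum(p[1] for p in allpts) / len(allpts)
    H = [[0.0, 0.0], [0.0, 0.0]]
    E = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        mx = sum(p[0] for p in g) / len(g)
        my = sum(p[1] for p in g) / len(g)
        dx, dy = mx - gx, my - gy
        H[0][0] += len(g) * dx * dx; H[0][1] += len(g) * dx * dy
        H[1][0] += len(g) * dy * dx; H[1][1] += len(g) * dy * dy
        for x, y in g:
            cx, cy = x - mx, y - my
            E[0][0] += cx * cx; E[0][1] += cx * cy
            E[1][0] += cy * cx; E[1][1] += cy * cy
    return H, E

def pillai(g1, g2):
    H, E = _sscp([g1, g2])
    T = [[H[i][j] + E[i][j] for j in range(2)] for i in range(2)]
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    Tinv = [[T[1][1] / det, -T[0][1] / det],
            [-T[1][0] / det, T[0][0] / det]]
    return (H[0][0] * Tinv[0][0] + H[0][1] * Tinv[1][0] +
            H[1][0] * Tinv[0][1] + H[1][1] * Tinv[1][1])  # trace(H @ Tinv)

# Hypothetical (F1, F2) tokens in Hz for two well-separated vowel classes:
leg   = [(560, 1900), (540, 1950), (580, 1880), (555, 1920)]
vague = [(470, 2150), (450, 2180), (490, 2120), (465, 2160)]
overlap_score = pillai(leg, vague)  # close to 1 for distinct classes
```

In sociophonetic practice this is usually obtained from a MANOVA fit in R or statsmodels rather than computed by hand, but the quantity is the same.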

RevDate: 2023-03-15
CmpDate: 2023-02-14

Holmes E, IS Johnsrude (2023)

Intelligibility benefit for familiar voices is not accompanied by better discrimination of fundamental frequency or vocal tract length.

Hearing research, 429:108704.

Speech is more intelligible when it is spoken by familiar than unfamiliar people. If this benefit arises because key voice characteristics like perceptual correlates of fundamental frequency or vocal tract length (VTL) are more accurately represented for familiar voices, listeners may be able to discriminate smaller manipulations to such characteristics for familiar than unfamiliar voices. We measured participants' (N = 17) thresholds for discriminating pitch (correlate of fundamental frequency, or glottal pulse rate) and formant spacing (correlate of VTL; 'VTL-timbre') for voices that were familiar (participants' friends) and unfamiliar (other participants' friends). As expected, familiar voices were more intelligible. However, discrimination thresholds were no smaller for the same familiar voices. The size of the intelligibility benefit for a familiar over an unfamiliar voice did not relate to the difference in discrimination thresholds for the same voices. Also, the familiar-voice intelligibility benefit was just as large following perceptible manipulations to pitch and VTL-timbre. These results are more consistent with cognitive accounts of speech perception than traditional accounts that predict better discrimination.
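The formant-spacing/VTL relationship the study above relies on is usually motivated by the uniform-tube approximation: a tube closed at the glottis and open at the lips has resonances Fn = (2n − 1)c/4L, so adjacent formants are spaced c/2L apart and L = c/2ΔF. A minimal sketch with illustrative numbers (not the study's stimuli):

```python
# Uniform-tube approximation linking formant spacing to vocal tract length.

SPEED_OF_SOUND = 35000.0  # cm/s, approximate value in warm moist air

def formant(n, length_cm):
    """n-th resonance (Hz) of a uniform tube closed at one end."""
    return (2 * n - 1) * SPEED_OF_SOUND / (4.0 * length_cm)

def vtl_from_spacing(delta_f_hz):
    """Estimate vocal tract length (cm) from average formant spacing (Hz)."""
    return SPEED_OF_SOUND / (2.0 * delta_f_hz)

# A 17.5 cm tract gives the textbook F1=500, F2=1500, F3=2500 Hz pattern,
# i.e. 1000 Hz spacing; inverting the spacing recovers the length.
f1, f2, f3 = formant(1, 17.5), formant(2, 17.5), formant(3, 17.5)
estimated_vtl = vtl_from_spacing(f2 - f1)
```

Scaling all formants up (wider spacing) thus corresponds to a shorter tract, which is the "VTL-timbre" dimension manipulated in the discrimination task.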

RevDate: 2023-04-27

Ettore E, Müller P, Hinze J, et al (2023)

Digital Phenotyping for Differential Diagnosis of Major Depressive Episode: Narrative Review.

JMIR mental health, 10:e37225.

BACKGROUND: Major depressive episode (MDE) is a common clinical syndrome. It can be found in different pathologies such as major depressive disorder (MDD), bipolar disorder (BD), and posttraumatic stress disorder (PTSD), or it can even occur in the context of psychological trauma. However, only 1 syndrome is described in international classifications (Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition [DSM-5]/International Classification of Diseases 11th Revision [ICD-11]), which do not take into account the underlying pathology at the origin of the MDE. Clinical interviews are currently the best source of information for obtaining the etiological diagnosis of MDE. Nevertheless, they do not allow early diagnosis, and there are no objective measures of the extracted clinical information. To remedy this, the use of digital tools and their correlation with clinical symptomatology could be useful.

OBJECTIVE: We aimed to review the current application of digital tools for MDE diagnosis while highlighting shortcomings for further research. In addition, our work was focused on digital devices easy to use during clinical interview and mental health issues where depression is common.

METHODS: We conducted a narrative review of the use of digital tools during clinical interviews for MDE by searching papers published in PubMed/MEDLINE, Web of Science, and Google Scholar databases since February 2010. The search was conducted from June to September 2021. Potentially relevant papers were then compared against a checklist for relevance and reviewed independently for inclusion, with focus on 4 allocated topics of (1) automated voice analysis, behavior analysis by (2) video and physiological measures, (3) heart rate variability (HRV), and (4) electrodermal activity (EDA). For this purpose, we were interested in 4 frequently found clinical conditions in which MDE can occur: (1) MDD, (2) BD, (3) PTSD, and (4) psychological trauma.

RESULTS: A total of 74 relevant papers on the subject were qualitatively analyzed and the information was synthesized. A digital phenotype of MDE seems to emerge, consisting of modifications in speech features (namely, temporal, prosodic, spectral, source, and formant features) and in speech content, modifications in nonverbal behavior (head, hand, body and eye movement, facial expressivity, and gaze), and a decrease in physiological measurements (HRV and EDA). We found not only similarities but also differences when MDE occurs in MDD, BD, PTSD, or psychological trauma. However, comparative studies were rare for the BD and PTSD conditions, which does not allow us to identify clear and distinct digital phenotypes.

CONCLUSIONS: Our search identified markers from several modalities that hold promise for helping with a more objective diagnosis of MDE. To validate their potential, further longitudinal and prospective studies are needed.

RevDate: 2023-01-21

Aoyama K, Hong L, Flege JE, et al (2023)

Relationships Between Acoustic Characteristics and Intelligibility Scores: A Reanalysis of Japanese Speakers' Productions of American English Liquids.

Language and speech [Epub ahead of print].

The primary purpose of this research report was to investigate the relationships between acoustic characteristics and perceived intelligibility for native Japanese speakers' productions of American English liquids. This report was based on a reanalysis of intelligibility scores and acoustic analyses that were reported in two previous studies. We examined which acoustic parameters were associated with higher perceived intelligibility scores for their productions of /l/ and /ɹ/ in American English, and whether Japanese speakers' productions of the two liquids were acoustically differentiated from each other. Results demonstrated that the second formant (F2) was strongly correlated with the perceived intelligibility scores for the Japanese adults' productions. Results also demonstrated that the Japanese adults' and children's productions of /l/ and /ɹ/ were indeed differentiated by some acoustic parameters including the third formant (F3). In addition, some changes occurred in the Japanese children's productions over the course of 1 year. Overall, the present report shows that Japanese speakers of American English may be making a distinction between /l/ and /ɹ/ in production, although the distinctions are made in a different way compared with native English speakers' productions. These findings have implications for setting realistic goals for improving intelligibility of English /l/ and /ɹ/ for Japanese speakers, as well as theoretical advancement of second-language speech learning.

RevDate: 2023-01-11
CmpDate: 2023-01-10

Sahin S, B Sen Yilmaz (2023)

Effects of the Orthognathic Surgery on the Voice Characteristics of Skeletal Class III Patients.

The Journal of craniofacial surgery, 34(1):253-257.

OBJECTIVES: To analyze the effects of the bimaxillary orthognathic surgery on the voice characteristics of skeletal Class III cases, and to evaluate correlations between acoustic and skeletal changes.

METHOD: Skeletal Class III adult patients (7 male, 18 female) were asked to sustain the sounds "[a], [ɛ], [ɯ], [i], [ɔ], [œ], [u], [y]" for 3 seconds. Voice recordings and lateral cephalometric x-rays were taken before surgery (T0) and 6 months after (T1). Voice recordings were also taken for the control group at a 6-month interval (n=20). Fundamental frequency (F0), the formant frequencies (F1, F2, and F3), Shimmer, Jitter, and Noise-to-Harmonic Ratio (NHR) were measured with Praat version 6.0.43.

RESULTS: In the surgery group, significant differences were observed in the F1 of [ɛ], the F2 and Shimmer of [ɯ], the F1 and F2 of [œ], and the F1 of [y]; the post-surgery values were lower. The F3 of [u] was higher. In comparison with the control group, the ΔF3 of [ɔ], the ΔF3 of [u] and the ΔF1 of [y], the ΔShimmer of [ɛ], [ɯ], [i], [ɔ], [u] and [y], and the ΔNHR of [ɔ] changed significantly. Pearson correlation analysis showed some correlations: ΔF2 with ΔSNA for the [ɯ] and [œ] sounds, and ΔF1 with ΔHBV for the [y] sound.

CONCLUSION: Bimaxillary orthognathic surgery changed some voice parameters in skeletal Class III patients. Some correlations were found between skeletal and acoustic parameters. We advise clinicians to consider these findings and inform their patients.

RevDate: 2023-01-11

Kim S, Choi J, T Cho (2023)

Data on English coda voicing contrast under different prosodic conditions produced by American English speakers and Korean learners of English.

Data in brief, 46:108816.

This data article provides acoustic data for individual speakers' production of coda voicing contrast between stops in English, which are based on laboratory speech recorded by twelve native speakers of American English and twenty-four Korean learners of English. There were four pairs of English monosyllabic target words with voicing contrast in the coda position (bet-bed, pet-ped, bat-bad, pat-pad). The words were produced in carrier sentences in which they were placed in two different prosodic boundary conditions (Intonational Phrase-initial and Intonational Phrase-medial), two pitch accent conditions (nuclear-pitch accented and unaccented), and three focus conditions (lexical focus, phonological focus and no focus). The raw acoustic measurement values included in a CSV-formatted file are F0, F1, F2 and duration of each vowel preceding a coda consonant, and Voice Onset Time of word-initial stops. This article also provides figures that exemplify individual speaker variation of vowel duration, F0, F1 and F2 as a function of focus conditions. The data can thus be reused to observe individual variation in the phonetic encoding of coda voicing contrast as a function of the aforementioned prosodically conditioned factors (i.e., prosodic boundary, pitch accent, focus) in native vs. non-native English. Some theoretical aspects of the data are discussed in the full-length article entitled "Phonetic encoding of coda voicing contrast under different focus conditions in L1 vs. L2 English" [1].

RevDate: 2023-01-11
CmpDate: 2023-01-03

Herbst CT, BH Story (2022)

Computer simulation of vocal tract resonance tuning strategies with respect to fundamental frequency and voice source spectral slope in singing.

The Journal of the Acoustical Society of America, 152(6):3548.

A well-known concept of singing voice pedagogy is "formant tuning," where the lowest two vocal tract resonances (fR1, fR2) are systematically tuned to harmonics of the laryngeal voice source to maximize the level of radiated sound. A comprehensive evaluation of this resonance tuning concept is still needed. Here, the effect of fR1, fR2 variation was systematically evaluated in silico across the entire fundamental frequency range of classical singing for three voice source characteristics with spectral slopes of -6, -12, and -18 dB/octave. Respective vocal tract transfer functions were generated with a previously introduced low-dimensional computational model, and resultant radiated sound levels were expressed in dB(A). Two distinct strategies for optimized sound output emerged for low vs high voices. At low pitches, spectral slope was the predominant factor for sound level increase, and resonance tuning only had a marginal effect. In contrast, resonance tuning strategies became more prevalent and voice source strength played an increasingly marginal role as fundamental frequency increased to the upper limits of the soprano range. This suggests that different voice classes (e.g., low male vs high female) likely have fundamentally different strategies for optimizing sound output, which has fundamental implications for pedagogical practice.
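The resonance-tuning effect evaluated above can be illustrated with a toy source-filter calculation (this is not the paper's low-dimensional vocal tract model): a harmonic source with a fixed spectral slope is passed through two second-order resonances, and the radiated level rises when a harmonic of f0 lands on a resonance. All parameter values here are invented for the demonstration.

```python
import math

def resonance_gain(f, fr, q=10.0):
    """Magnitude response of a simple second-order resonator peaked at fr."""
    r = f / fr
    return 1.0 / math.sqrt((1.0 - r * r) ** 2 + (r / q) ** 2)

def radiated_level_db(f0, fr1=500.0, fr2=1500.0, slope_db_oct=-12.0, n_harm=20):
    """Overall level (dB, arbitrary reference) of the filtered harmonic series."""
    power = 0.0
    for n in range(1, n_harm + 1):
        f = n * f0
        source_amp = 10.0 ** (slope_db_oct * math.log2(n) / 20.0)
        amp = source_amp * resonance_gain(f, fr1) * resonance_gain(f, fr2)
        power += amp * amp
    return 10.0 * math.log10(power)

# Placing the fundamental on fR1 ("formant tuning") boosts the output level
# relative to a detuned fundamental:
tuned = radiated_level_db(500.0)
detuned = radiated_level_db(450.0)
```

Varying `slope_db_oct` in this toy reproduces the qualitative point of the study: with a steep source slope the low harmonics dominate, so at low f0 the slope matters more than exact resonance placement, while at high f0 a single strong harmonic-resonance alignment dominates the level.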

RevDate: 2023-01-03

Ji Y, Hu Y, X Jiang (2022)

Segmental and suprasegmental encoding of speaker confidence in Wuxi dialect vowels.

Frontiers in psychology, 13:1028106.

INTRODUCTION: Wuxi dialect is a variety of Wu dialect spoken in eastern China and is characterized by a rich tonal system. Compared with standard Mandarin speakers, native speakers of the Wuxi dialect can be more efficient in varying vocal cues to encode communicative meanings in speech communication. While the literature has demonstrated that speakers encode high vs. low confidence in global prosodic cues at the sentence level, it is unknown how speakers' intended confidence is encoded at a more local, phonetic level. This study aimed to explore the effects of speakers' intended confidence on both prosodic and formant features of vowels in two lexical tones (the flat tone and the contour tone) of Wuxi dialect.

METHODS: Words of a single vowel were spoken in confident, unconfident, or neutral tone of voice by native Wuxi dialect speakers using a standard elicitation procedure. Linear-mixed effects modeling and parametric bootstrapping testing were performed.

RESULTS: The results showed that (1) the speakers raised both F1 and F2 in the confident level (compared with the neutral-intending expression). Additionally, F1 can distinguish between the confident and unconfident expressions; (2) Compared with the neutral-intending expression, the speakers raised mean f0, had a greater variation of f0 and prolonged pronunciation time in the unconfident level while they raised mean intensity, had a greater variation of intensity and prolonged pronunciation time in the confident level. (3) The speakers modulated mean f0 and mean intensity to a larger extent on the flat tone than the contour tone to differentiate between levels of confidence in the voice, while they modulated f0 and intensity range more only on the contour tone.

DISCUSSION: These findings shed new light on the mechanisms of segmental and suprasegmental encoding of speaker confidence and lack of confidence at the vowel level, highlighting the interplay of lexical tone and vocal expression in speech communication.

RevDate: 2023-08-04
CmpDate: 2022-12-27

Grawunder S, Uomini N, Samuni L, et al (2023)

Expression of concern: 'Chimpanzee vowel-like sounds and voice quality suggest formant space expansion through the hominoid lineage' (2022) by Grawunder et al.

Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 378(1870):20220476.

RevDate: 2022-12-12

Moya-Galé G, Wisler AA, Walsh SJ, et al (2022)

Acoustic Predictors of Ease of Understanding in Spanish Speakers With Dysarthria Associated With Parkinson's Disease.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: The purpose of this study was to examine selected baseline acoustic features of hypokinetic dysarthria in Spanish speakers with Parkinson's disease (PD) and identify potential acoustic predictors of ease of understanding in Spanish.

METHOD: Seventeen Spanish-speaking individuals with mild-to-moderate hypokinetic dysarthria secondary to PD and eight healthy controls were recorded reading a translation of the Rainbow Passage. Acoustic measures of vowel space area, as indicated by the formant centralization ratio (FCR), envelope modulation spectra (EMS), and articulation rate were derived from the speech samples. Additionally, 15 healthy adults rated ease of understanding of the recordings on a visual analogue scale. A multiple linear regression model was implemented to investigate the predictive value of the selected acoustic parameters on ease of understanding.

RESULTS: Listeners' ease of understanding was significantly lower for speakers with dysarthria than for healthy controls. The FCR, EMS from the first 10 s of the reading passage, and the difference in EMS between the end and the beginning sections of the passage differed significantly between the two groups of speakers. Findings indicated that 67.7% of the variability in ease of understanding was explained by the predictive model, suggesting a moderately strong relationship between the acoustic and perceptual domains.

CONCLUSIONS: Measures of envelope modulation spectra were found to be highly significant model predictors of ease of understanding of Spanish-speaking individuals with hypokinetic dysarthria associated with PD. Articulation rate was also found to be important (albeit to a lesser degree) in the predictive model. The formant centralization ratio should be further examined with a larger sample size and more severe dysarthria to determine its efficacy in predicting ease of understanding.
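The formant centralization ratio (FCR) used in the study above is commonly defined (following Sapir and colleagues) as FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a), computed from the corner vowels /i/, /a/, /u/; values rise toward and above 1 as articulation becomes more centralized. A minimal sketch with hypothetical formant means in Hz:

```python
# Formant centralization ratio from corner-vowel formant means (Hz).

def fcr(f1i, f2i, f1a, f2a, f1u, f2u):
    """Higher values indicate more centralized (reduced) vowel articulation."""
    return (f2u + f2a + f1i + f1u) / (f2i + f1a)

# Hypothetical speakers: a healthy control vs. centralized (dysarthric-like)
# productions, where /i/ and /a/ drift toward the middle of the vowel space.
healthy = fcr(f1i=300, f2i=2300, f1a=800, f2a=1300, f1u=350, f2u=900)
reduced = fcr(f1i=400, f2i=1900, f1a=650, f2a=1350, f1u=450, f2u=1100)
```

Centralization shrinks the denominator (F2i, F1a) and grows the numerator, so `reduced > healthy` in this sketch; the FCR is the reciprocal-style counterpart of vowel space area that is less sensitive to inter-speaker variability.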

RevDate: 2023-06-26
CmpDate: 2023-06-23

Peng H, Li S, Xing J, et al (2023)

Surface plasmon resonance of Au/Ag metals for the photoluminescence enhancement of lanthanide ion Ln[3+] doped upconversion nanoparticles in bioimaging.

Journal of materials chemistry. B, 11(24):5238-5250.

Deep tissue penetration, chemical inertness and biocompatibility give UCNPs a competitive edge over traditional fluorescent materials like organic dyes or quantum dots. However, the low quantum efficiency of UCNPs is an obstacle. Among the many methods and strategies currently used to address this issue, surface plasmon resonance (SPR) of noble metals is of great use because of the agreement between the SPR peak of the metals and the absorption band of UCNPs. A key challenge of this match is that the structures and sizes of noble metals have significant influences on the peak of SPR formants, so an explicit elucidation of the relationships between the physical properties of noble metals and their SPR formants is of great importance. This review aims to clarify the mechanism of the SPR effect of noble metals on the optical performance of UCNPs. Furthermore, novel research studies in which Au, Ag or Au/Ag composites of various structures and sizes are combined with UCNPs through different synthetic methods are summarized. We provide an overview of the improved photoluminescence for bioimaging exhibited by different composite nanoparticles, with UCNPs acting as both cores and shells, taking Au@UCNPs, Ag@UCNPs and Au/Ag@UCNPs into account. Finally, there are remaining shortcomings and latent opportunities that deserve further research. This review will provide directions for the bioimaging applications of UCNPs through the introduction of the SPR effect of noble metals.

RevDate: 2022-12-02

Wang Y, Hattori M, Liu R, et al (2022)

Digital acoustic analysis of the first three formant frequencies in patients with a prosthesis after maxillectomy.

The Journal of prosthetic dentistry pii:S0022-3913(22)00654-0 [Epub ahead of print].

STATEMENT OF PROBLEM: Prosthetic rehabilitation with an obturator can help to restore or improve the intelligibility of speech in patients after maxillectomy. The frequency of formants 1 and 2 as well as their ranges were initially reported in patients with maxillary defects in 2002, and the evaluation method that was used is now applied in clinical evaluation. However, the details of formant 3 are not known and warrant investigation because, according to speech science, formant 3 is related to the pharyngeal volume. Clarifying the formant frequency values of formant 3 in patients after maxillectomy would enable prosthodontists to refer to these data when planning treatment and when assessing the outcome of an obturator.

PURPOSE: The purpose of this clinical study was to determine the acoustic characteristics of formant 3, together with those of formants 1 and 2, by using a digital acoustic analysis during maxillofacial prosthetic treatment. The utility of determining formant 3 in the evaluation of speech in patients after maxillectomy was also evaluated.

MATERIAL AND METHODS: Twenty-six male participants after a maxillectomy (mean age, 63 years; range, 20 to 93 years) were included, and the 5 Japanese vowels /a/, /e/, /i/, /o/, and /u/ produced with and without a definitive obturator prosthesis were recorded. The frequencies of the 3 formants were determined, and their ranges were calculated by using a speech analysis system (Computerized Speech Lab CSL 4400). The Wilcoxon signed rank test was used to compare the formants between the 2 use conditions (α=0.05).

RESULTS: Significant differences were found in the frequencies and ranges of all 3 formants between the use conditions. The ranges of all 3 formants produced with the prosthesis were significantly greater than those produced without it.

CONCLUSIONS: Based on the findings, both the first 2 formants and the third formant were changed by wearing an obturator prosthesis. Because formant 3 is related to the volume of the pharynx, evaluation of this formant and its range can reflect the effectiveness of the prosthesis to seal the oronasal communication and help reduce hypernasality, suggesting the utility of formant 3 analysis in prosthodontic rehabilitation.
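The paired comparison used in this study (Wilcoxon signed-rank test on formants with vs. without the prosthesis) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis; the F3 values below are hypothetical, and in practice one would use a statistics package such as scipy.stats.wilcoxon.

```python
def wilcoxon_signed_rank(before, after):
    """Paired Wilcoxon signed-rank statistic W = min(W+, W-).

    Zero differences are dropped; tied absolute differences
    receive the average of the ranks they span.
    """
    diffs = [b - a for a, b in zip(before, after) if b != a]
    absd = sorted(abs(d) for d in diffs)

    def avg_rank(v):
        # average of the 1-based positions of v among sorted |differences|
        idx = [i + 1 for i, x in enumerate(absd) if x == v]
        return sum(idx) / len(idx)

    w_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
    w_minus = sum(avg_rank(abs(d)) for d in diffs if d < 0)
    return min(w_plus, w_minus)

# Hypothetical F3 values (Hz) for five vowels, without vs. with an obturator
f3_without = [2400, 2500, 2450, 2600, 2550]
f3_with = [2600, 2700, 2500, 2800, 2650]
print(wilcoxon_signed_rank(f3_without, f3_with))  # prints 0 (all diffs positive)
```

A small statistic relative to n(n+1)/2 indicates a consistent one-directional shift, which is what a significant change in formant frequency between the two wearing conditions would look like.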

RevDate: 2023-01-09
CmpDate: 2022-12-05

Voeten CC, Heeringa W, H Van de Velde (2022)

Normalization of nonlinearly time-dynamic vowels.

The Journal of the Acoustical Society of America, 152(5):2692.

This study compares 16 vowel-normalization methods for purposes of sociophonetic research. Most of the previous work in this domain has focused on the performance of normalization methods on steady-state vowels. By contrast, this study explicitly considers dynamic formant trajectories, using generalized additive models to model these nonlinearly. Normalization methods were compared using a hand-corrected dataset from the Flemish-Dutch Teacher Corpus, which contains 160 speakers from 8 geographical regions, who spoke regionally accented versions of Netherlandic/Flemish Standard Dutch. Normalization performance was assessed by comparing the methods' abilities to remove anatomical variation, retain vowel distinctions, and explain variation in the normalized F0-F3. In addition, it was established whether normalization competes with by-speaker random effects or complements them, by comparing how much between-speaker variance remained to be apportioned to random effects after normalization. The results partly reproduce the good performance of Lobanov, Gerstman, and Nearey 1 found earlier and generally favor log-mean and centroid methods. However, newer methods achieve higher effect sizes (i.e., explain more variance) at only marginally worse performances. Random effects were found to be equally useful before and after normalization, showing that they complement it. The findings are interpreted in light of the way that the different methods handle formant dynamics.
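Of the methods compared, Lobanov normalization has a particularly simple form: each speaker's formant values are z-scored per formant, removing anatomical (vocal-tract-length) differences between speakers. A minimal sketch, with illustrative F1 values rather than corpus data:

```python
from statistics import mean, stdev

def lobanov(formant_hz):
    """Lobanov normalization: z-score one speaker's values for one formant."""
    m, s = mean(formant_hz), stdev(formant_hz)
    return [(f - m) / s for f in formant_hz]

# Illustrative F1 values (Hz) for one speaker's /i/, /e/, /a/
print(lobanov([300, 500, 700]))  # prints [-1.0, 0.0, 1.0]
```

After normalization, each speaker's formant distribution has mean 0 and unit variance, so vowels from speakers with different vocal-tract sizes become directly comparable, which is exactly the "remove anatomical variation" criterion the study evaluates.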

RevDate: 2023-02-07
CmpDate: 2023-01-13

Leyns C, Daelman J, Adriaansen A, et al (2023)

Short-Term Acoustic Effects of Speech Therapy in Transgender Women: A Randomized Controlled Trial.

American journal of speech-language pathology, 32(1):145-168.

PURPOSE: This study measured and compared the acoustic short-term effects of pitch elevation training (PET) and articulation-resonance training (ART) and the combination of both programs, in transgender women.

METHOD: A randomized controlled study with cross-over design was used. Thirty transgender women were included and received 14 weeks of speech training. All participants started with 4 weeks of sham training; after which they were randomly assigned to one of two groups: One group continued with PET (5 weeks), followed by ART (5 weeks); the second group received both trainings in opposite order. Participants were recorded 4 times, in between the training blocks: pre, post 1 (after sham), post 2 (after training 1), and post 3 (after training 2). Speech samples included a sustained vowel, continuous speech during reading, and spontaneous speech and were analyzed using Praat software. Fundamental frequency (fo), intensity, voice range profile, vowel formant frequencies (F1-F2-F3-F4-F5 of /a/-/i/-/u/), formant contrasts, vowel space, and vocal quality (Acoustic Voice Quality Index) were determined.

RESULTS AND CONCLUSIONS: Fundamental frequencies increased after both the PET and ART program, with a higher increase after PET. The combination of both interventions showed a mean increase of the fo of 49 Hz during a sustained vowel, 49 Hz during reading, and 29 Hz during spontaneous speech. However, the lower limit (percentile 5) of the fo during spontaneous speech did not change. Higher values were detected for F1-F2 of /a/, F3 of /u/, and vowel space after PET and ART separately. F1-F2-F3 of /a/, F1-F3-F4 of /u/, vowel space, and formant contrasts increased after the combination of PET and ART; hence, the combination induced more increases in formant frequencies. Intensity and voice quality measurements did not change. No order effect was detected; that is, starting with PET or ART did not change the outcome.
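The vowel space measure reported here is commonly computed as the area of the polygon spanned by the corner vowels in the F1-F2 plane. A sketch using the shoelace formula; the three formant pairs are hypothetical, not values from the study:

```python
def vowel_space_area(points):
    """Area (Hz^2) of the polygon spanned by (F2, F1) corner-vowel points,
    via the shoelace formula; points must be listed in order around the polygon."""
    n = len(points)
    s = sum(points[i][0] * points[(i + 1) % n][1]
            - points[(i + 1) % n][0] * points[i][1]
            for i in range(n))
    return abs(s) / 2

# Hypothetical (F2, F1) pairs in Hz for /a/, /i/, /u/
corners = [(1300, 800), (2300, 300), (800, 350)]
print(vowel_space_area(corners))  # prints 350000.0
```

A larger area means the corner vowels are more acoustically dispersed, which is why an expansion of the vowel space after articulation-resonance training is read as clearer, more distinct vowel production.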

RevDate: 2022-11-26

Chen S, Han C, Wang S, et al (2022)

Hearing the physical condition: The relationship between sexually dimorphic vocal traits and underlying physiology.

Frontiers in psychology, 13:983688.

A growing body of research has shown associations between sexually dimorphic vocal traits and physiological conditions related to reproductive advantage. This paper presents a review of the literature on the relationship between sexually dimorphic vocal traits and sex hormones, body size, and physique. These physiological conditions are important in reproductive success and mate selection. Regarding sex hormones, there are associations between sex-specific hormones and sexually dimorphic vocal traits; regarding body size, formant frequencies are more reliable predictors of human body size than pitch/fundamental frequency; regarding physique, there is a possible but still controversial association between the human voice and strength and combat power, while pitch is more often used as a signal of aggressive intent in conflict. Future research should consider demographic, cross-cultural, cognitive interaction, and emotional motivation influences, in order to more accurately assess the relationship between voice and physiology. Moreover, neurological studies are recommended to gain a deeper understanding of the evolutionary origins and adaptive functions of voice modulation.

RevDate: 2022-12-09
CmpDate: 2022-11-21

Eichner ACO, Donadon C, Skarżyński PH, et al (2022)

A Systematic Review of the Literature Between 2009 and 2019 to Identify and Evaluate Publications on the Effects of Age-Related Hearing Loss on Speech Processing.

Medical science monitor : international medical journal of experimental and clinical research, 28:e938089.

Changes in central auditory processing due to aging in normal-hearing elderly patients, as well as age-related hearing loss, are often associated with difficulties in speech processing, especially in unfavorable acoustic environments. Speech processing depends on the perception of temporal and spectral features, and for this reason can be assessed by recordings of phase-locked neural activity synchronized to transient and periodic sound stimuli, known as frequency-following responses (FFRs). An electronic search of the PubMed and Web of Science databases was carried out in July 2019. Studies that evaluated the effects of age-related hearing loss on components of FFRs were included. Studies that were not in English, studies performed on animals, studies with cochlear implant users, literature reviews, letters to the editor, and case studies were excluded. Our search yielded 6 studies, each of which included 30 to 94 subjects aged between 18 and 80 years. Latency increases and significant amplitude reductions of the onset, offset, and V/A slope components of FFRs were observed. Latency and amplitude impairment of the fundamental frequency, first formant, and higher formants were related to peripheral sensorineural hearing loss in the elderly population. Conclusions: Temporal changes in FFR tracing were related to the aging process. Hearing loss also impacts the envelope fine structure, producing poorer speech comprehension in noisy environments. More research is needed to understand aspects related to hearing loss and cognitive aspects common to the elderly.

RevDate: 2022-11-14

Raveendran R, K Yeshoda (2022)

Effects of Resonant Voice Therapy on Perceptual and Acoustic Source and Tract Parameters - A Preliminary Study on Indian Carnatic Classical Singers.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(22)00299-5 [Epub ahead of print].

PURPOSE: The aim of the study was to examine the effects of resonant voice therapy (RVT) on the vocal resonance of trained Carnatic singers. The specific objectives were to evaluate the effects of resonant voice therapy on the auditory perceptual judgments and acoustic source and tract parameters before and after RVT on phonation and sung voice samples.

METHOD: Six vocally healthy trained Carnatic singers, three males and three females aged 18-25 years (M = 23; SD = 2.09), participated in the study. All participants completed a 21-day resonant voice therapy (RVT) training program. The participants' pre- and post-training phonation and sung samples were subjected to auditory perceptual analysis and acoustic analysis.

RESULTS: The results revealed that the post-training auditory perceptual ratings of the phonation task showed a statistically significant difference from the pre-training scores (Z = 2.35; P = 0.019), while for the singing task, the post-training perceptual ratings were not significantly different from the pre-training perceptual rating scores (Z = 2.66; P = 0.08). A significant difference was observed between the pre- and post-training values for all the measured acoustic parameters of the phonation task. In the singing task, though the fundamental frequency and the third and fourth formant frequencies showed no significant difference between the pre- and post-training conditions (P > 0.05), the difference between the first formant frequency and the fundamental frequency showed a significant decrease (P = 0.028).

CONCLUSION: The effects of resonant voice production led to a high vocal economy, as evidenced by the improved source and filter acoustic parameters. These results suggest formant tuning through vocal tract modifications, probably an enlarged pharyngeal area, resulting in increased resonant voice quality in both the phonation and singing tasks.

RevDate: 2023-01-09
CmpDate: 2022-11-15

Rocchesso D, Andolina S, Ilardo G, et al (2022)

A perceptual sound space for auditory displays based on sung-vowel synthesis.

Scientific reports, 12(1):19370.

When designing displays for the human senses, perceptual spaces are of great importance to give intuitive access to physical attributes. Similar to how perceptual spaces based on hue, saturation, and lightness were constructed for visual color, research has explored perceptual spaces for sounds of a given timbral family based on timbre, brightness, and pitch. To promote an embodied approach to the design of auditory displays, we introduce the Vowel-Type-Pitch (VTP) space, a cylindrical sound space based on human sung vowels, whose timbres can be synthesized by the composition of acoustic formants and can be categorically labeled. Vowels are arranged along the circular dimension, while voice type and pitch of the vowel correspond to the remaining two axes of the cylindrical VTP space. The decoupling and perceptual effectiveness of the three dimensions of the VTP space are tested through a vowel labeling experiment, whose results are visualized as maps on circular slices of the VTP cylinder. We discuss implications for the design of auditory and multi-sensory displays that account for human perceptual capabilities.

RevDate: 2022-11-26

Yoon TJ, S Ha (2022)

Adults' Perception of Children's Vowel Production.

Children (Basel, Switzerland), 9(11):.

The study examined the link between Korean-speaking children's vowel production and its perception by inexperienced adults and also observed whether ongoing vowel changes in mid-back vowels affect adults' perceptions when the vowels are produced by children. This study analyzed vowels in monosyllabic words produced by 20 children, ranging from 2 to 6 years old, with a focus on gender distinction, and used them as perceptual stimuli for word perception by 20 inexperienced adult listeners. Acoustic analyses indicated that F0 was not a reliable cue for distinguishing gender, but the first two formants served as reliable cues for gender distinction. The results confirmed that the spacing of the two low formants is linguistically and para-linguistically important in identifying vowel types and gender. However, a pair of non-low back vowels caused difficulties in correct vowel identification. The close proximity of these vowels may account for the highest mismatch between children's production and adults' perception occurring for the two non-low back vowels in Korean. We attribute the source of this mismatch to the ongoing sound change observed in high and mid-back vowels in adult speech. The ongoing vowel change is also observed in the children's vowel space, which may well be shaped after the caregivers, whose non-low back vowels are close to each other.

RevDate: 2022-11-17

Guo S, Wu W, Liu Y, et al (2022)

Effects of Valley Topography on Acoustic Communication in Birds: Why Do Birds Avoid Deep Valleys in Daqinggou Nature Reserve?.

Animals : an open access journal from MDPI, 12(21):.

To investigate the effects of valley topography on the acoustic transmission of avian vocalisations, we carried out playback experiments in Daqinggou valley, Inner Mongolia, China. During the experiments, we recorded the vocalisations of five avian species, the large-billed crow (Corvus macrorhynchos Wagler, 1827), common cuckoo (Cuculus canorus Linnaeus, 1758), Eurasian magpie (Pica pica Linnaeus, 1758), Eurasian tree sparrow (Passer montanus Linnaeus, 1758), and meadow bunting (Emberiza cioides Brand, 1843), at transmission distances of 30 m and 50 m in the upper and lower parts of the valley and analysed the intensity, the fundamental frequency (F0), and the first three formant frequencies (F1/F2/F3) of the sounds. We also investigated bird species diversity in the upper and lower valley. We found that: (1) at a distance of 30 m, there were significant differences in F0/F1/F2/F3 in the Eurasian magpie, significant differences in F1/F2/F3 in the meadow bunting and Eurasian tree sparrow, and partially significant differences in sound frequency between the upper and lower valley in the other two species; (2) at a distance of 50 m, there were significant differences in F0/F1/F2/F3 in two avian species (large-billed crow and common cuckoo) between the upper and lower valley and partially significant differences in sound frequency in the other three species; (3) there were significant differences in the acoustic intensities of crow, cuckoo, magpie, and bunting calls between the upper and lower valley; and (4) species number and richness were significantly higher in the upper valley than in the lower valley. We suggest that the structure of valley habitats may degrade acoustic signals and disrupt communication in birds to varying degrees. The effect of valley topography on acoustic communication could be one reason that animal species avoid deep valleys.


RJR Experience and Expertise


Robbins holds BS, MS, and PhD degrees in the life sciences. He served as a tenured faculty member in the Zoology and Biological Science departments at Michigan State University. He is currently exploring the intersection between genomics, microbial ecology, and biodiversity — an area that promises to transform our understanding of the biosphere.


Robbins has extensive experience in college-level education: At MSU he taught introductory biology, genetics, and population genetics. At JHU, he was an instructor for a special course on biological database design. At FHCRC, he team-taught a graduate-level course on the history of genetics. At Bellevue College he taught medical informatics.


Robbins has been involved in science administration at both the federal and the institutional levels. At NSF he was a program officer for database activities in the life sciences, at DOE he was a program officer for information infrastructure in the human genome project. At the Fred Hutchinson Cancer Research Center, he served as a vice president for fifteen years.


Robbins has been involved with information technology since writing his first Fortran program as a college student. At NSF he was the first program officer for database activities in the life sciences. At JHU he held an appointment in the CS department and served as director of the informatics core for the Genome Data Base. At the FHCRC he was VP for Information Technology.


While still at Michigan State, Robbins started his first publishing venture, founding a small company that addressed the short-run publishing needs of instructors in very large undergraduate classes. For more than 20 years, Robbins has been operating The Electronic Scholarly Publishing Project, a web site dedicated to the digital publishing of critical works in science, especially classical genetics.


Robbins is well-known for his speaking abilities and is often called upon to provide keynote or plenary addresses at international meetings. For example, in July, 2012, he gave a well-received keynote address at the Global Biodiversity Informatics Congress, sponsored by GBIF and held in Copenhagen. The slides from that talk can be seen HERE.


Robbins is a skilled meeting facilitator. He prefers a participatory approach, with part of the meeting involving dynamic breakout groups, created by the participants in real time: (1) individuals propose breakout groups; (2) everyone signs up for one (or more) groups; (3) the groups with the most interested parties then meet, with reports from each group presented and discussed in a subsequent plenary session.


Robbins has been engaged with photography and design since the 1960s, when he worked for a professional photography laboratory. He now prefers digital photography and tools for their precision and reproducibility. He designed his first web site more than 20 years ago and he personally designed and implemented this web site. He engages in graphic design as a hobby.


This is a must read book for anyone with an interest in invasion biology. The full title of the book lays out the author's premise — The New Wild: Why Invasive Species Will Be Nature's Salvation. Not only is species movement not bad for ecosystems, it is the way that ecosystems respond to perturbation — it is the way ecosystems heal. Even if you are one of those who is absolutely convinced that invasive species are actually "a blight, pollution, an epidemic, or a cancer on nature", you should read this book to clarify your own thinking. True scientific understanding never comes from just interacting with those with whom you already agree. R. Robbins

963 Red Tail Lane
Bellingham, WA 98226


E-mail: RJR8222@gmail.com

Collection of publications by R J Robbins

Reprints and preprints of publications, slide presentations, instructional materials, and data compilations written or prepared by Robert Robbins. Most papers deal with computational biology, genome informatics, using information technology to support biomedical research, and related matters.

Research Gate page for R J Robbins

ResearchGate is a social networking site for scientists and researchers to share papers, ask and answer questions, and find collaborators. According to a study by Nature and an article in Times Higher Education, it is the largest academic social network in terms of active users.

Curriculum Vitae for R J Robbins

short personal version

Curriculum Vitae for R J Robbins

long standard version

RJR Picks from Around the Web (updated 11 MAY 2018)