QUERY RUN:
04 Mar 2024 at 01:50
HITS:
3027

Bibliography on: Formants: Modulators of Communication

Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology.

RJR: Recommended Bibliography 04 Mar 2024 at 01:50 Created: 

Formants: Modulators of Communication

Wikipedia: A formant, as defined by James Jeans, is a harmonic of a note that is augmented by a resonance. In speech science and phonetics, however, a formant is also sometimes used to mean an acoustic resonance of the human vocal tract. Thus, in phonetics, formant can mean either a resonance or the spectral maximum that the resonance produces. Formants are often measured as amplitude peaks in the frequency spectrum of the sound, using a spectrogram or a spectrum analyzer and, in the case of the voice, this gives an estimate of the vocal tract resonances. In vowels spoken with a high fundamental frequency, as in a female or child voice, however, the frequency of the resonance may lie between the widely spaced harmonics and hence no corresponding peak is visible. Because formants are a product of resonance and resonance is affected by the shape and material of the resonating structure, and because all animals (humans included) have unique morphologies, formants can add generic (sounds big) and specific (that's Towser barking) information to animal vocalizations.

Created with PubMed® Query: formant NOT pmcbook NOT ispreviousversion

Citations The Papers (from PubMed®)

RevDate: 2024-02-28

Fletcher MD, Akis E, Verschuur CA, et al (2024)

Improved tactile speech perception using audio-to-tactile sensory substitution with formant frequency focusing.

Scientific reports, 14(1):4889.

Haptic hearing aids, which provide speech information through tactile stimulation, could substantially improve outcomes for both cochlear implant users and for those unable to access cochlear implants. Recent advances in wide-band haptic actuator technology have made new audio-to-tactile conversion strategies viable for wearable devices. One such strategy filters the audio into eight frequency bands, which are evenly distributed across the speech frequency range. The amplitude envelopes from the eight bands modulate the amplitudes of eight low-frequency tones, which are delivered through vibration to a single site on the wrist. This tactile vocoder strategy effectively transfers some phonemic information, but vowels and obstruent consonants are poorly portrayed. In 20 participants with normal touch perception, we tested (1) whether focusing the audio filters of the tactile vocoder more densely around the first and second formant frequencies improved tactile vowel discrimination, and (2) whether focusing filters at mid-to-high frequencies improved obstruent consonant discrimination. The obstruent-focused approach was found to be ineffective. However, the formant-focused approach improved vowel discrimination by 8%, without changing overall consonant discrimination. The formant-focused tactile vocoder strategy, which can readily be implemented in real time on a compact device, could substantially improve speech perception for haptic hearing aid users.

RevDate: 2024-02-21

Maya Lastra N, Rangel Negrín A, Coyohua Fuentes A, et al (2024)

Mantled howler monkey males assess their rivals through formant spacing of long-distance calls.

Primates; journal of primatology [Epub ahead of print].

Formant frequency spacing of long-distance vocalizations is allometrically related to body size and could represent an honest signal of fighting potential. There is, however, only limited evidence that primates use formant spacing to assess the competitive potential of rivals during interactions with extragroup males, a risky context. We hypothesized that if formant spacing of long-distance calls is inversely related to the fighting potential of male mantled howler monkeys (Alouatta palliata), then males should: (1) be more likely and (2) faster to display vocal responses to calling rivals; (3) be more likely and (4) faster to approach calling rivals; and have higher fecal (5) glucocorticoid and (6) testosterone metabolite concentrations in response to rivals calling at intermediate and high formant spacing than to those with low formant spacing. We studied the behavioral responses of 11 adult males to playback experiments of long-distance calls from unknown individuals with low (i.e., emulating large individuals), intermediate, and high (i.e., small individuals) formant spacing (n = 36 experiments). We assayed fecal glucocorticoid and testosterone metabolite concentrations (n = 174). Playbacks always elicited vocal responses, but males responded quicker to intermediate than to low formant spacing playbacks. Low formant spacing calls were less likely to elicit approaches whereas high formant spacing calls resulted in quicker approaches. Males showed stronger hormonal responses to low than to both intermediate and high formant spacing calls. It is possible that males do not escalate conflicts with rivals with low formant spacing calls if these are perceived as large, and against whom winning probabilities should decrease and confrontation costs increase; but are willing to escalate conflicts with rivals of high formant spacing. Formant spacing may therefore be an important signal for rival assessment in this species.

RevDate: 2024-02-16

Merritt B, Bent T, Kilgore R, et al (2024)

Auditory free classification of gender diverse speakers.

The Journal of the Acoustical Society of America, 155(2):1422-1436.

Auditory attribution of speaker gender has historically been assumed to operate within a binary framework. The prevalence of gender diversity and its associated sociophonetic variability motivates an examination of how listeners perceptually represent these diverse voices. Utterances from 30 transgender (1 agender individual, 15 non-binary individuals, 7 transgender men, and 7 transgender women) and 30 cisgender (15 men and 15 women) speakers were used in an auditory free classification paradigm, in which cisgender listeners classified the speakers on perceived general similarity and gender identity. Multidimensional scaling of listeners' classifications revealed two-dimensional solutions as the best fit for general similarity classifications. The first dimension was interpreted as masculinity/femininity, where listeners organized speakers from high to low fundamental frequency and first formant frequency. The second was interpreted as gender prototypicality, where listeners separated speakers with fundamental frequency and first formant frequency at upper and lower extreme values from more intermediate values. Listeners' classifications for gender identity collapsed into a one-dimensional space interpreted as masculinity/femininity. Results suggest that listeners engage in fine-grained analysis of speaker gender that cannot be adequately captured by a gender dichotomy. Further, varying terminology used in instructions may bias listeners' gender judgements.

RevDate: 2024-02-15

Almurashi W, Al-Tamimi J, G Khattab (2024)

Dynamic specification of vowels in Hijazi Arabic.

Phonetica [Epub ahead of print].

Research on various languages shows that dynamic approaches to vowel acoustics - in particular Vowel-Inherent Spectral Change (VISC) - can play a vital role in characterising and classifying monophthongal vowels compared with a static model. This study's aim was to investigate whether dynamic cues also allow for better description and classification of the Hijazi Arabic (HA) vowel system, a phonological system based on both temporal and spectral distinctions. Along with static and dynamic F1 and F2 patterns, we evaluated the extent to which vowel duration, F0, and F3 contribute to increased/decreased discriminability among vowels. Data were collected from 20 native HA speakers (10 females and 10 males) producing eight HA monophthongal vowels in a word list with varied consonantal contexts. Results showed that dynamic cues provide further insights regarding HA vowels that are not normally gleaned from static measures alone. Using discriminant analysis, the dynamic cues (particularly the seven-point model) had relatively higher classification rates, and vowel duration was found to play a significant role as an additional cue. Our results are in line with dynamic approaches and highlight the importance of looking beyond static cues and beyond the first two formants for further insights into the description and classification of vowel systems.

RevDate: 2024-02-13

Simeone PJ, Green JR, Tager-Flusberg H, et al (2024)

Vowel distinctiveness as a concurrent predictor of expressive language function in autistic children.

Autism research : official journal of the International Society for Autism Research [Epub ahead of print].

Speech ability may limit spoken language development in some minimally verbal autistic children. In this study, we aimed to determine whether an acoustic measure of speech production, vowel distinctiveness, is concurrently related to expressive language (EL) for autistic children. Syllables containing the vowels [i] and [a] were recorded remotely from 27 autistic children (4;1-7;11) with a range of spoken language abilities. Vowel distinctiveness was calculated using automatic formant tracking software. Robust hierarchical regressions were conducted with receptive language (RL) and vowel distinctiveness as predictors of EL. Hierarchical regressions were also conducted within a High EL and a Low EL subgroup. Vowel distinctiveness accounted for 29% of the variance in EL for the entire group, RL for 38%. For the Low EL group, only vowel distinctiveness was significant, accounting for 38% of variance in EL. Conversely, in the High EL group, only RL was significant and accounted for 26% of variance in EL. Replicating previous results, speech production and RL significantly predicted concurrent EL in autistic children, with speech production being the sole significant predictor for the Low EL group and RL the sole significant predictor for the High EL group. Further work is needed to determine whether vowel distinctiveness longitudinally, as well as concurrently, predicts EL. Findings have important implications for the early identification of language impairment and in developing language interventions for autistic children.

RevDate: 2024-02-11

Shadle CH, Fulop SA, Chen WR, et al (2024)

Assessing accuracy of resonances obtained with reassigned spectrograms from the "ground truth" of physical vocal tract models.

The Journal of the Acoustical Society of America, 155(2):1253-1263.

The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). "Comparing measurement errors for formants in synthetic and natural vowels," J. Acoust. Soc. Am. 139(2), 713-727]. To date, validating its accuracy has depended on formant synthesis for ground truth values of these resonances. Synthesis is easily controlled, but it has many intrinsic assumptions that do not necessarily accurately realize the acoustics in the way that physical resonances would. Here, we show that physical models of the vocal tract with derivable resonance values allow a separate approach to the ground truth, with a different range of limitations. Our three-dimensional printed vocal tract models were excited by white noise, allowing an accurate determination of the resonance frequencies. Then, sources with a range of fundamental frequencies were implemented, allowing a direct assessment of whether RS avoided the systematic bias towards the nearest strong harmonic to which other analysis techniques are prone. RS was indeed accurate at fundamental frequencies up to 300 Hz; above that, accuracy was somewhat reduced. Future directions include testing mechanical models with the dimensions of children's vocal tracts and making RS more broadly useful by automating the detection of resonances.

RevDate: 2024-02-06

Saghiri MA, Vakhnovetsky J, Amanabi M, et al (2024)

Exploring the impact of type II diabetes mellitus on voice quality.

European archives of oto-rhino-laryngology : official journal of the European Federation of Oto-Rhino-Laryngological Societies (EUFOS) : affiliated with the German Society for Oto-Rhino-Laryngology - Head and Neck Surgery [Epub ahead of print].

PURPOSE: This cross-sectional study aimed to investigate the potential of voice analysis as a prescreening tool for type II diabetes mellitus (T2DM) by examining the differences in voice recordings between non-diabetic and T2DM participants.

METHODS: Sixty participants diagnosed as non-diabetic (n = 30) or T2DM (n = 30) were recruited on the basis of specific inclusion and exclusion criteria in Iran between February 2020 and September 2023. Participants were matched according to their year of birth and then placed into six age categories. Using the WhatsApp application, participants recorded the translated versions of speech elicitation tasks. Seven acoustic features [fundamental frequency, jitter, shimmer, harmonic-to-noise ratio (HNR), cepstral peak prominence (CPP), voice onset time (VOT), and formant (F1-F2)] were extracted from each recording and analyzed using Praat software. Data were analyzed with Kolmogorov-Smirnov, two-way ANOVA, post hoc Tukey, binary logistic regression, and Student's t tests.

RESULTS: The comparison between groups showed significant differences in fundamental frequency, jitter, shimmer, CPP, and HNR (p < 0.05), while there were no significant differences in formant and VOT (p > 0.05). Binary logistic regression showed that shimmer was the most significant predictor of the disease group. There was also a significant difference between diabetes status and age, in the case of CPP.

CONCLUSIONS: Participants with type II diabetes exhibited significant vocal variations compared to non-diabetic controls.

RevDate: 2024-02-01

Benway NR, Preston JL, Salekin A, et al (2024)

Evaluating acoustic representations and normalization for rhoticity classification in children with speech sound disorders.

JASA express letters, 4(2):.

The effects of different acoustic representations and normalizations were compared for classifiers predicting perception of children's rhotic versus derhotic /ɹ/. Formant and Mel frequency cepstral coefficient (MFCC) representations for 350 speakers were z-standardized, either relative to values in the same utterance or age-and-sex data for typical /ɹ/. Statistical modeling indicated age-and-sex normalization significantly increased classifier performances. Clinically interpretable formants performed similarly to MFCCs and were endorsed for deep neural network engineering, achieving mean test-participant-specific F1-score = 0.81 after personalization and replication (σx = 0.10, med = 0.83, n = 48). Shapley additive explanations analysis indicated the third formant most influenced fully rhotic predictions.
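As a rough illustration of the two normalization schemes named in this abstract, the sketch below z-standardizes a formant value either against the other values in the same utterance or against external reference norms. The function names and every number are hypothetical and are not taken from the paper; the authors' actual pipeline may differ.

```python
import statistics


def z_within_utterance(values_hz):
    """Z-standardize each value against the mean and sample SD of the same utterance."""
    mean = statistics.mean(values_hz)
    sd = statistics.stdev(values_hz)  # sample standard deviation (n - 1)
    return [(v - mean) / sd for v in values_hz]


def z_against_norms(value_hz, norm_mean_hz, norm_sd_hz):
    """Z-standardize one value against reference norms (e.g. for an age-and-sex group)."""
    return (value_hz - norm_mean_hz) / norm_sd_hz


# Illustrative F3 values only; not data from the study.
print(z_within_utterance([1500.0, 2000.0, 2500.0]))
print(z_against_norms(2200.0, norm_mean_hz=1800.0, norm_sd_hz=200.0))
```

The first function describes a token relative to its own utterance, while the second places it on a scale shared across speakers of a given reference group.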

RevDate: 2024-01-23

Hou Y, Li Q, Wang Z, et al (2024)

Study on a Pig Vocalization Classification Method Based on Multi-Feature Fusion.

Sensors (Basel, Switzerland), 24(2): pii:s24020313.

To improve the classification of pig vocalization using vocal signals and improve recognition accuracy, a pig vocalization classification method based on multi-feature fusion is proposed in this study. With the typical vocalization of pigs in large-scale breeding houses as the research object, short-time energy, frequency centroid, formant frequency and first-order difference, and Mel frequency cepstral coefficient and first-order difference were extracted as the fusion features. These fusion features were improved using principal component analysis. A pig vocalization classification model with a BP neural network optimized based on the genetic algorithm was constructed. The results showed that using the improved features to recognize pig grunting, squealing, and coughing, the average recognition accuracy was 93.2%; the recognition precisions were 87.9%, 98.1%, and 92.7%, respectively, with an average of 92.9%; and the recognition recalls were 92.0%, 99.1%, and 87.4%, respectively, with an average of 92.8%, which indicated that the proposed pig vocalization classification method had good recognition precision and recall, and could provide a reference for pig vocalization information feedback and automatic recognition.

RevDate: 2024-01-22

Nagamine T (2024)

Formant dynamics in second language speech: Japanese speakers' production of English liquids.

The Journal of the Acoustical Society of America, 155(1):479-495.

This article reports an acoustic study analysing the time-varying spectral properties of word-initial English liquids produced by 31 first-language (L1) Japanese and 14 L1 English speakers. While it is widely accepted that L1 Japanese speakers have difficulty in producing English /l/ and /ɹ/, the temporal characteristics of L2 English liquids are not well-understood, even in light of previous findings that English liquids show dynamic properties. In this study, the distance between the first and second formants (F2-F1) and the third formant (F3) are analysed dynamically over liquid-vowel intervals in three vowel contexts using generalised additive mixed models (GAMMs). The results demonstrate that L1 Japanese speakers produce word-initial English liquids with stronger vocalic coarticulation than L1 English speakers. L1 Japanese speakers may have difficulty in dissociating F2-F1 between the liquid and the vowel to a varying degree, depending on the vowel context, which could be related to perceptual factors. This article shows that dynamic information uncovers specific challenges that L1 Japanese speakers have in producing L2 English liquids accurately.

RevDate: 2024-01-16

Ghaemi H, Grillo R, Alizadeh O, et al (2023)

What Is the Effect of Maxillary Impaction Orthognathic Surgery on Voice Characteristics? A Quasi-Experimental Study.

World journal of plastic surgery, 12(3):44-56.

BACKGROUND: Regarding the impact of orthognathic surgery on the airway and voice, this study was carried out to investigate the effects of maxillary impaction surgery on patients' voices through acoustic analysis and articulation assessment.

METHODS: This quasi-experimental, before-and-after, double-blind study aimed at examining the effects of maxillary impaction surgery on the voice of orthognathic surgery patients. Before the surgery, a speech therapist conducted acoustic analysis, which included fundamental frequency (F0), Jitter, Shimmer, and the harmonic-to-noise ratio (HNR), as well as first, second, and third formants (F1, F2, and F3). The patient's age, sex, degree of maxillary deformity, and impaction were documented in a checklist. Voice analysis was repeated during follow-up appointments at one and six months after the surgery in a blinded manner. The data were statistically analyzed using SPSS 23, and the significance level was set at 0.05.

RESULTS: Twenty-two patients (18 females, 4 males) were examined, with ages ranging from 18 to 40 years and an average age of 25.54 years. F2, F3, HNR, and Shimmer demonstrated a significant increase over the investigation period compared to the initial phase of the study (P < 0.001 for each). Conversely, the Jitter variable exhibited a significant decrease during the follow-up assessments in comparison to the initial phase of the study (P < 0.001).

CONCLUSION: Following maxillary impaction surgery, improvements in voice quality were observed compared to the preoperative condition. However, further studies with larger samples are needed to confirm the relevancy.

RevDate: 2024-01-12

Hedrick M, K Thornton (2024)

Reaction time for correct identification of vowels in consonant-vowel syllables and of vowel segments.

JASA express letters, 4(1):.

Reaction times for correct vowel identification were measured to determine the effects of intertrial intervals, vowel, and cue type. Thirteen adults with normal hearing, aged 20-38 years old, participated. Stimuli included three naturally produced syllables (/ba/ /bi/ /bu/) presented whole or segmented to isolate the formant transition or static formant center. Participants identified the vowel presented via loudspeaker by mouse click. Results showed a significant effect of intertrial intervals, no significant effect of cue type, and a significant vowel effect, suggesting that feedback occurs, vowel identification may depend on cue duration, and vowel bias may stem from focal structure.

RevDate: 2024-01-04

Sathe NC, Kain A, LAJ Reiss (2024)

Fusion of dichotic consonants in normal-hearing and hearing-impaired listeners.

The Journal of the Acoustical Society of America, 155(1):68-77.

Hearing-impaired (HI) listeners have been shown to exhibit increased fusion of dichotic vowels, even with different fundamental frequency (F0), leading to binaural spectral averaging and interference. To determine if similar fusion and averaging occurs for consonants, four natural and synthesized stop consonants (/pa/, /ba/, /ka/, /ga/) at three F0s of 74, 106, and 185 Hz were presented dichotically, with ΔF0 varied, to normal-hearing (NH) and HI listeners. Listeners identified the one or two consonants perceived, and response options included /ta/ and /da/ as fused percepts. As ΔF0 increased, both groups showed decreases in fusion and increases in percent correct identification of both consonants, with HI listeners displaying similar fusion but poorer identification. Both groups exhibited spectral averaging (psychoacoustic fusion) of place of articulation but phonetic feature fusion for differences in voicing. With synthetic consonants, NH subjects showed increased fusion and decreased identification. Most HI listeners were unable to discriminate the synthetic consonants. The findings suggest smaller differences between groups in consonant fusion than vowel fusion, possibly due to the presence of more cues for segregation in natural speech or reduced reliance on spectral cues for consonant perception. The inability of HI listeners to discriminate synthetic consonants suggests a reliance on cues other than formant transitions for consonant discrimination.

RevDate: 2024-01-02

Wang L, Liu R, Wang Y, et al (2024)

Effectiveness of a Biofeedback Intervention Targeting Mental and Physical Health Among College Students Through Speech and Physiology as Biomarkers Using Machine Learning: A Randomized Controlled Trial.

Applied psychophysiology and biofeedback [Epub ahead of print].

Biofeedback therapy is mainly based on the analysis of physiological features to improve an individual's affective state. There are insufficient objective indicators to assess symptom improvement after biofeedback. In addition to psychological and physiological features, speech features can precisely convey information about emotions. The use of speech features can improve the objectivity of psychiatric assessments. Therefore, biofeedback based on subjective symptom scales, objective speech, and physiological features to evaluate efficacy provides a new approach for early screening and treatment of emotional problems in college students. A 4-week, randomized, controlled, parallel biofeedback therapy study was conducted with college students with symptoms of anxiety or depression. Speech samples, physiological samples, and clinical symptoms were collected at baseline and at the end of treatment, and the extracted speech features and physiological features were used for between-group comparisons and correlation analyses between the biofeedback and wait-list groups. Based on the speech features with differences between the biofeedback intervention and wait-list groups, an artificial neural network was used to predict the therapeutic effect and response after biofeedback therapy. Through biofeedback therapy, improvements in depression (p = 0.001), anxiety (p = 0.001), insomnia (p = 0.013), and stress (p = 0.004) severity were observed in college-going students (n = 52). The speech and physiological features in the biofeedback group also changed significantly compared to the waitlist group (n = 52) and were related to the change in symptoms. The energy parameters and Mel-Frequency Cepstral Coefficients (MFCC) of speech features can predict whether biofeedback intervention effectively improves anxiety and insomnia symptoms and treatment response. The accuracy of the classification model built using the artificial neural network (ANN) for treatment response and non-response was approximately 60%. The results of this study provide valuable information about biofeedback in improving the mental health of college-going students. The study identified speech features, such as the energy parameters, and MFCC as more accurate and objective indicators for tracking biofeedback therapy response and predicting efficacy. Trial Registration ClinicalTrials.gov ChiCTR2100045542.

RevDate: 2023-12-29

Anikin A, Barreda S, D Reby (2023)

A practical guide to calculating vocal tract length and scale-invariant formant patterns.

Behavior research methods [Epub ahead of print].

Formants (vocal tract resonances) are increasingly analyzed not only by phoneticians in speech but also by behavioral scientists studying diverse phenomena such as acoustic size exaggeration and articulatory abilities of non-human animals. This often involves estimating vocal tract length acoustically and producing scale-invariant representations of formant patterns. We present a theoretical framework and practical tools for carrying out this work, including open-source software solutions included in R packages soundgen and phonTools. Automatic formant measurement with linear predictive coding is error-prone, but formant_app provides an integrated environment for formant annotation and correction with visual and auditory feedback. Once measured, formants can be normalized using a single recording (intrinsic methods) or multiple recordings from the same individual (extrinsic methods). Intrinsic speaker normalization can be as simple as taking formant ratios and calculating the geometric mean as a measure of overall scale. The regression method implemented in the function estimateVTL calculates the apparent vocal tract length assuming a single-tube model, while its residuals provide a scale-invariant vowel space based on how far each formant deviates from equal spacing (the schwa function). Extrinsic speaker normalization provides more accurate estimates of speaker- and vowel-specific scale factors by pooling information across recordings with simple averaging or mixed models, which we illustrate with example datasets and R code. The take-home messages are to record several calls or vowels per individual, measure at least three or four formants, check formant measurements manually, treat uncertain values as missing, and use the statistical tools best suited to each modeling context.
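The single-tube regression idea mentioned in this abstract can be sketched as follows. This is a minimal Python illustration, not the code of soundgen's estimateVTL. It assumes a uniform tube closed at one end, so that the n-th resonance is F_n = (2n - 1) c / (4 L), and fits a least-squares line through the origin. The function name, the speed of sound (35000 cm/s), and the example formants are all assumptions made for illustration.

```python
def estimate_vtl(formants_hz, speed_of_sound_cm_s=35000.0):
    """Estimate apparent vocal tract length (cm) and formant spacing (Hz).

    formants_hz: [F1, F2, F3, ...] in Hz; use None for an uncertain
    (missing) formant so that it is skipped but the numbering is kept.
    """
    # Pair each measured formant with its predictor (2n - 1).
    pairs = [(2 * n - 1, f)
             for n, f in enumerate(formants_hz, start=1)
             if f is not None]
    if not pairs:
        raise ValueError("at least one measured formant is required")
    # Least-squares slope of a regression through the origin: F_n = slope * (2n - 1).
    slope = sum(x * f for x, f in pairs) / sum(x * x for x, _ in pairs)
    spacing_hz = 2.0 * slope                       # distance between neighbouring formants
    vtl_cm = speed_of_sound_cm_s / (4.0 * slope)   # L = c / (4 * slope)
    return vtl_cm, spacing_hz


# Illustrative, perfectly evenly spaced formants (a schwa-like pattern).
print(estimate_vtl([500.0, 1500.0, 2500.0, 3500.0]))
# An uncertain F2 treated as missing, as the abstract recommends.
print(estimate_vtl([500.0, None, 2500.0, 3500.0]))
```

In this sketch, the residuals of each measured formant from the fitted line would correspond to the deviation from equal spacing that the abstract describes as a scale-invariant representation.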

RevDate: 2023-12-23

Kraxberger F, Näger C, Laudato M, et al (2023)

On the Alignment of Acoustic and Coupled Mechanic-Acoustic Eigenmodes in Phonation by Supraglottal Duct Variations.

Bioengineering (Basel, Switzerland), 10(12): pii:bioengineering10121369.

Sound generation in human phonation and the underlying fluid-structure-acoustic interaction that describes the sound production mechanism are not fully understood. A previous experimental study, with a silicone made vocal fold model connected to a straight vocal tract pipe of fixed length, showed that vibroacoustic coupling can cause a deviation in the vocal fold vibration frequency. This occurred when the fundamental frequency of the vocal fold motion was close to the lowest acoustic resonance frequency of the pipe. What is not fully understood is how the vibroacoustic coupling is influenced by a varying vocal tract length. Presuming that this effect is a pure coupling of the acoustical effects, a numerical simulation model is established based on the computation of the mechanical-acoustic eigenvalue. With varying pipe lengths, the lowest acoustic resonance frequency was adjusted in the experiments and so in the simulation setup. In doing so, the evolution of the vocal folds' coupled eigenvalues and eigenmodes is investigated, which confirms the experimental findings. Finally, it was shown that for normal phonation conditions, the mechanical mode is the most efficient vibration pattern whenever the acoustic resonance of the pipe (lowest formant) is far away from the vocal folds' vibration frequency. Whenever the lowest formant is slightly lower than the mechanical vocal fold eigenfrequency, the coupled vocal fold motion pattern at the formant frequency dominates.

RevDate: 2023-12-12

Pah ND, Motin MA, Oliveira GC, et al (2023)

The Change of Vocal Tract Length in People with Parkinson's Disease.

Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference, 2023:1-4.

Hypokinetic dysarthria is one of the early symptoms of Parkinson's disease (PD) and has been proposed for early detection and also for monitoring of the progression of the disease. PD reduces the control of vocal tract muscles such as the tongue and lips and, therefore the length of the active vocal tract is altered. However, the change in the vocal tract length due to the disease has not been investigated. The aim of this study was to determine the difference in the apparent vocal tract length (AVTL) between people with PD and age-matched control healthy people. The phoneme, /a/ from the UCI Parkinson's Disease Classification Dataset and the Italian Parkinson's Voice and Speech Dataset were used and AVTL was calculated based on the first four formants of the sustained phoneme (F1-F4). The results show a correlation between Parkinson's disease and an increase in vocal tract length. The most sensitive feature was the AVTL calculated using the first formants of sustained phonemes (F1). The other significant finding reported in this article is that the difference is significant and only appeared in the male participants. However, the size of the database is not sufficiently large to identify the possible confounding factors such as the severity and duration of the disease, medication, age, and comorbidity factors. Clinical relevance: The outcomes of this research have the potential to improve the identification of early Parkinsonian dysarthria and monitor PD progression.
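The abstract does not state the formula used for AVTL. One common way to obtain a length estimate from a single formant is the quarter-wavelength relation for a uniform tube closed at one end, L = (2n - 1) c / (4 F_n). The sketch below uses that relation as an assumption; the function names, the speed of sound, and the example values are hypothetical and may not match the authors' method.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, humid air


def avtl_from_formant(n, freq_hz, c=SPEED_OF_SOUND_CM_S):
    """Apparent vocal tract length (cm) implied by the n-th formant alone."""
    if n < 1 or freq_hz <= 0:
        raise ValueError("n must be >= 1 and freq_hz must be positive")
    return (2 * n - 1) * c / (4.0 * freq_hz)


def avtl_per_formant(formants_hz):
    """Return one length estimate per formant for [F1, F2, F3, ...]."""
    return [avtl_from_formant(n, f)
            for n, f in enumerate(formants_hz, start=1)]


# Illustrative values only; not data from the study.
print(avtl_from_formant(1, 700.0))
print(avtl_per_formant([500.0, 1500.0, 2500.0, 3500.0]))
```

Under this model, a lower formant frequency implies a longer apparent tract, which is the direction of change the abstract reports for PD.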

RevDate: 2023-12-07

Orekhova EV, Fadeev KA, Goiaeva DE, et al (2023)

Different hemispheric lateralization for periodicity and formant structure of vowels in the auditory cortex and its changes between childhood and adulthood.

Cortex; a journal devoted to the study of the nervous system and behavior, 171:287-307 pii:S0010-9452(23)00281-2 [Epub ahead of print].

The spectral formant structure and periodicity pitch are the major features that determine the identity of vowels and the characteristics of the speaker. However, very little is known about how the processing of these features in the auditory cortex changes during development. To address this question, we independently manipulated the periodicity and formant structure of vowels while measuring auditory cortex responses using magnetoencephalography (MEG) in children aged 7-12 years and adults. We analyzed the sustained negative shift of source current associated with these vowel properties, which was present in the auditory cortex in both age groups despite differences in the transient components of the auditory response. In adults, the sustained activation associated with formant structure was lateralized to the left hemisphere early in the auditory processing stream requiring neither attention nor semantic mapping. This lateralization was not yet established in children, in whom the right hemisphere contribution to formant processing was strong and decreased during or after puberty. In contrast to the formant structure, periodicity was associated with a greater response in the right hemisphere in both children and adults. These findings suggest that left-lateralization for the automatic processing of vowel formant structure emerges relatively late in ontogenesis and pose a serious challenge to current theories of hemispheric specialization for speech processing.

RevDate: 2023-12-07

Alain C, Göke K, Shen D, et al (2023)

Neural alpha oscillations index context-driven perception of ambiguous vowel sequences.

iScience, 26(12):108457.

Perception of bistable stimuli is influenced by prior context. In some cases, the interpretation matches with how the preceding stimulus was perceived; in others, it tends to be the opposite of the previous stimulus percept. We measured high-density electroencephalography (EEG) while participants were presented with a sequence of vowels that varied in formant transition, promoting the perception of one or two auditory streams followed by an ambiguous bistable sequence. For the bistable sequence, participants were more likely to report hearing the opposite percept of the one heard immediately before. This auditory contrast effect coincided with changes in alpha power localized in the left angular gyrus and left sensorimotor and right sensorimotor/supramarginal areas. The latter correlated with participants' perception. These results suggest that the contrast effect for a bistable sequence of vowels may be related to neural adaptation in posterior auditory areas, which influences participants' perceptual construal level of ambiguous stimuli.

RevDate: 2023-12-05

Shellikeri S, Cho S, Ash S, et al (2023)

Digital markers of motor speech impairments in spontaneous speech of patients with ALS-FTD spectrum disorders.

Amyotrophic lateral sclerosis & frontotemporal degeneration [Epub ahead of print].

OBJECTIVE: To evaluate automated digital speech measures, derived from spontaneous speech (picture descriptions), in assessing bulbar motor impairments in patients with ALS-FTD spectrum disorders (ALS-FTSD).

METHODS: Automated vowel algorithms were employed to extract two vowel acoustic measures: vowel space area (VSA), and mean second formant slope (F2 slope). Vowel measures were compared between ALS with and without clinical bulbar symptoms (ALS + bulbar (n = 49, ALSFRS-r bulbar subscore: x̄ = 9.8 (SD = 1.7)) vs. ALS-nonbulbar (n = 23)), behavioral variant frontotemporal dementia (bvFTD, n = 25) without a motor syndrome, and healthy controls (HC, n = 32). Correlations with bulbar motor clinical scales, perceived listener effort, and MRI cortical thickness of the orobuccal primary motor cortex (oral PMC) were examined. We compared vowel measures to speaking rate, a conventional metric for assessing bulbar dysfunction.

RESULTS: ALS + bulbar had significantly lower VSA and F2 slope than ALS-nonbulbar (|d|=0.94 and |d|=1.04, respectively), bvFTD (|d|=0.89 and |d|=1.47), and HC (|d|=0.73 and |d|=0.99). These reductions correlated with worse bulbar clinical scores (VSA: R = 0.33, p = 0.043; F2 slope: R = 0.38, p = 0.011), greater listener effort (VSA: R=-0.43, p = 0.041; F2 slope: p > 0.05), and cortical thinning in oral PMC (F2 slope: β = 0.0026, p = 0.017). Vowel measures demonstrated greater sensitivity and specificity for bulbar impairment than speaking rate, while showing independence from cognitive and respiratory impairments.

CONCLUSION: Automatic vowel measures are easily derived from a brief spontaneous speech sample, are sensitive to mild-moderate stage of bulbar disease in ALS-FTSD, and may present better sensitivity to bulbar impairment compared to traditional assessments such as speaking rate.
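
The abstract does not describe how its automated algorithm computes VSA. A common way to obtain a vowel space area is to treat the corner vowels as polygon vertices in the F1-F2 plane and apply the shoelace formula. The sketch below assumes that approach; the formant values are hypothetical and chosen only for illustration.

```python
def polygon_area(points):
    """Area of a simple polygon from its (x, y) vertices, via the shoelace formula.

    Vertices must be listed in order around the polygon (either direction).
    """
    n = len(points)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical (F1, F2) values in Hz for the corner vowels /i/, /a/, /u/.
corner_vowels = [(300.0, 2300.0), (750.0, 1300.0), (350.0, 800.0)]
print(polygon_area(corner_vowels))  # vowel space area in Hz^2
```

A smaller area means the corner vowels sit closer together, which is the kind of reduction the abstract reports for the ALS + bulbar group.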

RevDate: 2023-11-30

Heeringa AN, Jüchter C, Beutelmann R, et al (2023)

Altered neural encoding of vowels in noise does not affect behavioral vowel discrimination in gerbils with age-related hearing loss.

Frontiers in neuroscience, 17:1238941.

INTRODUCTION: Understanding speech in a noisy environment, as opposed to speech in quiet, becomes increasingly more difficult with increasing age. Using the quiet-aged gerbil, we studied the effects of aging on speech-in-noise processing. Specifically, behavioral vowel discrimination and the encoding of these vowels by single auditory-nerve fibers were compared, to elucidate some of the underlying mechanisms of age-related speech-in-noise perception deficits.

METHODS: Young-adult and quiet-aged Mongolian gerbils, of either sex, were trained to discriminate a deviant naturally-spoken vowel in a sequence of vowel standards against a speech-like background noise. In addition, we recorded responses from single auditory-nerve fibers of young-adult and quiet-aged gerbils while presenting the same speech stimuli.

RESULTS: Behavioral vowel discrimination was not significantly affected by aging. For both young-adult and quiet-aged gerbils, the behavioral discrimination between /eː/ and /iː/ was more difficult to make than /eː/ vs. /aː/ or /iː/ vs. /aː/, as evidenced by longer response times and lower d' values. In young-adults, spike timing-based vowel discrimination agreed with the behavioral vowel discrimination, while in quiet-aged gerbils it did not. Paradoxically, discrimination between vowels based on temporal responses was enhanced in aged gerbils for all vowel comparisons. Representation schemes, based on the spectrum of the inter-spike interval histogram, revealed stronger encoding of both the fundamental and the lower formant frequencies in fibers of quiet-aged gerbils, but no qualitative changes in vowel encoding. Elevated thresholds in combination with a fixed stimulus level, i.e., lower sensation levels of the stimuli for old individuals, can explain the enhanced temporal coding of the vowels in noise.

DISCUSSION: These results suggest that the altered auditory-nerve discrimination metrics in old gerbils may mask age-related deterioration in the central (auditory) system to the extent that behavioral vowel discrimination matches that of the young adults.
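
The d' values mentioned in the results are a sensitivity index from signal detection theory. The abstract does not say how hit and false-alarm rates were obtained or corrected, so the following is only a sketch of the standard definition, d' = z(hit rate) - z(false-alarm rate), with a hypothetical function name.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(H) - z(FA), using the standard normal quantile.

    Rates of exactly 0 or 1 have no finite z-score, so they are rejected here;
    real analyses usually apply a correction before this step.
    """
    if not (0.0 < hit_rate < 1.0 and 0.0 < false_alarm_rate < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates: an easier contrast yields a larger d' than a harder one.
print(d_prime(0.90, 0.10), d_prime(0.70, 0.30))
```

Lower d' for a vowel pair, as reported for /eː/ vs. /iː/, indicates that the pair was harder to tell apart.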

RevDate: 2023-11-29

Mohn JL, Baese-Berk MM, S Jaramillo (2023)

Selectivity to acoustic features of human speech in the auditory cortex of the mouse.

Hearing research, 441:108920 pii:S0378-5955(23)00232-0 [Epub ahead of print].

A better understanding of the neural mechanisms of speech processing can have a major impact in the development of strategies for language learning and in addressing disorders that affect speech comprehension. Technical limitations in research with human subjects hinder a comprehensive exploration of these processes, making animal models essential for advancing the characterization of how neural circuits make speech perception possible. Here, we investigated the mouse as a model organism for studying speech processing and explored whether distinct regions of the mouse auditory cortex are sensitive to specific acoustic features of speech. We found that mice can learn to categorize frequency-shifted human speech sounds based on differences in formant transitions (FT) and voice onset time (VOT). Moreover, neurons across various auditory cortical regions were selective to these speech features, with a higher proportion of speech-selective neurons in the dorso-posterior region. Last, many of these neurons displayed mixed-selectivity for both features, an attribute that was most common in dorsal regions of the auditory cortex. Our results demonstrate that the mouse serves as a valuable model for studying the detailed mechanisms of speech feature encoding and neural plasticity during speech-sound learning.

RevDate: 2023-11-27

Anikin A, Valente D, Pisanski K, et al (2023)

The role of loudness in vocal intimidation.

Journal of experimental psychology. General pii:2024-28586-001 [Epub ahead of print].

Across many species, a major function of vocal communication is to convey formidability, with low voice frequencies traditionally considered the main vehicle for projecting large size and aggression. Vocal loudness is often ignored, yet it might explain some puzzling exceptions to this frequency code. Here we demonstrate, through acoustic analyses of over 3,000 human vocalizations and four perceptual experiments, that vocalizers produce low frequencies when attempting to sound large, but loudness is prioritized for displays of strength and aggression. Our results show that, although being loud is effective for signaling strength and aggression, it poses a physiological trade-off with low frequencies because a loud voice is achieved by elevating pitch and opening the mouth wide into a-like vowels. This may explain why aggressive vocalizations are often high-pitched and why open vowels are considered "large" in sound symbolism despite their high first formant. Callers often compensate by adding vocal harshness (nonlinear vocal phenomena) to undesirably high-pitched loud vocalizations, but a combination of low and loud remains an honest predictor of both perceived and actual physical formidability. The proposed notion of a loudness-frequency trade-off thus adds a new dimension to the widely accepted frequency code and requires a fundamental rethinking of the evolutionary forces shaping the form of acoustic signals. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

RevDate: 2023-11-24

Barrientos E, E Cataldo (2023)

Estimating Formant Frequencies of Vowels Sung by Sopranos Using Weighted Linear Prediction.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00322-3 [Epub ahead of print].

This study introduces the weighted linear prediction adapted to high-pitched singing voices (WLP-HPSV) method for accurately estimating formant frequencies of vowels sung by lyric sopranos. The WLP-HPSV method employs a variant of the WLP analysis combined with the zero-frequency filtering (ZFF) technique to address specific challenges in formant estimation from singing signals. Evaluation of the WLP-HPSV method compared to the LPC method demonstrated its superior performance in accurately capturing the spectral characteristics of synthetic /u/ vowels and the /a/ and /u/ natural singing vowels. The QCP parameters used in the WLP-HPSV method varied with pitch, revealing insights into the interplay between the vocal tract and glottal characteristics during vowel production. The comparison between the LPC and WLP-HPSV methods highlighted the robustness of the WLP-HPSV method in accurately estimating formant frequencies across different pitches.

RevDate: 2023-11-22

Punamäki RL, Diab SY, Drosos K, et al (2023)

The role of acoustic features of maternal infant-directed singing in enhancing infant sensorimotor, language and socioemotional development.

Infant behavior & development, 74:101908 pii:S0163-6383(23)00100-5 [Epub ahead of print].

The quality of infant-directed speech (IDS) and infant-directed singing (IDSi) are considered vital to children, but empirical studies on protomusical qualities of the IDSi influencing infant development are rare. The current prospective study examines the role of IDSi acoustic features, such as pitch variability, shape and movement, and vocal amplitude vibration, timbre, and resonance, in associating with infant sensorimotor, language, and socioemotional development at six and 18 months. The sample consists of 236 Palestinian mothers from the Gaza Strip singing to their six-month-olds a song of their own choice. Maternal IDSi was recorded and analyzed with the OpenSMILE tool to depict main acoustic features of pitch frequencies, variations, and contours, vocal intensity, resonance formants, and power. The results are based on 219 completed maternal IDSi recordings. Mothers reported about their infants' sensorimotor, language-vocalization, and socioemotional skills at six months, and psychologists tested these skills by Bayley Scales for Infant Development at 18 months. Results show that maternal IDSi characterized by wide pitch variability and rich and high vocal amplitude and vibration were associated with infants' optimal sensorimotor, language vocalization, and socioemotional skills at six months, and rich and high vocal amplitude and vibration predicted these optimal developmental skills also at 18 months. High resonance and rhythmicity formants were associated with optimal language and vocalization skills at six months. To conclude, the IDSi is considered important in enhancing newborn and risk infants' wellbeing, and the current findings argue that favorable acoustic singing qualities are crucial for optimal multidomain development across infancy.

RevDate: 2023-11-22

Levin M, Y Zaltz (2023)

Voice Discrimination in Quiet and in Background Noise by Simulated and Real Cochlear Implant Users.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: Cochlear implant (CI) users demonstrate poor voice discrimination (VD) in quiet conditions based on the speaker's fundamental frequency (fo) and formant frequencies (i.e., vocal-tract length [VTL]). Our purpose was to examine the effect of background noise at levels that allow good speech recognition thresholds (SRTs) on VD via acoustic CI simulations and CI hearing.

METHOD: Forty-eight normal-hearing (NH) listeners who listened via noise-excited (n = 20) or sinewave (n = 28) vocoders and 10 prelingually deaf CI users (i.e., whose hearing loss began before language acquisition) participated in the study. First, the signal-to-noise ratio (SNR) that yields 70.7% correct SRT was assessed using an adaptive sentence-in-noise test. Next, the CI simulation listeners performed 12 adaptive VDs: six in quiet conditions, two with each cue (fo, VTL, fo + VTL), and six amid speech-shaped noise. The CI participants performed six VDs: one with each cue, in quiet and amid noise. SNR at VD testing was 5 dB higher than the individual's SRT in noise (SRTn +5 dB).

RESULTS: Results showed the following: (a) Better VD was achieved via the noise-excited than the sinewave vocoder, with the noise-excited vocoder better mimicking CI VD; (b) background noise had a limited negative effect on VD, only for the CI simulation listeners; and (c) there was a significant association between SNR at testing and VTL VD only for the CI simulation listeners.

CONCLUSIONS: For NH listeners who listen to CI simulations, noise that allows good SRT can nevertheless impede VD, probably because VD depends more on bottom-up sensory processing. Conversely, for prelingually deaf CI users, noise that allows good SRT hardly affects VD, suggesting that they rely strongly on bottom-up processing for both VD and speech recognition.
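
The 70.7% correct point in the method is the value an adaptive two-down, one-up staircase converges on. The abstract does not name the tracking rule, so treating it as two-down, one-up is an assumption. For an n-down, one-up rule the track settles where n consecutive correct responses are as likely as not, that is p^n = 0.5. A small sketch with a hypothetical function name:

```python
def staircase_target(n_down):
    """Proportion correct targeted by an n-down, one-up adaptive rule.

    The track is in equilibrium when the chance of n correct responses in a
    row equals 0.5, so p = 0.5 ** (1 / n).
    """
    if n_down < 1:
        raise ValueError("n_down must be at least 1")
    return 0.5 ** (1.0 / n_down)


for n in (1, 2, 3):
    print(n, round(staircase_target(n), 3))
```

With n = 2 this gives roughly 0.707, matching the percentage quoted in the abstract.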

RevDate: 2023-11-22

Kapsner-Smith MR, Abur D, Eadie TL, et al (2023)

Test-Retest Reliability of Behavioral Assays of Feedforward and Feedback Auditory-Motor Control of Voice and Articulation.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: Behavioral assays of feedforward and feedback auditory-motor control of voice and articulation frequently are used to make inferences about underlying neural mechanisms and to study speech development and disorders. However, no studies have examined the test-retest reliability of such measures, which is critical for rigorous study of auditory-motor control. Thus, the purpose of the present study was to assess the reliability of assays of feedforward and feedback control in voice versus articulation domains.

METHOD: Twenty-eight participants (14 cisgender women, 12 cisgender men, one transgender man, one transmasculine/nonbinary) who denied any history of speech, hearing, or neurological impairment were measured for responses to predictable versus unexpected auditory feedback perturbations of vocal (fundamental frequency, fo) and articulatory (first formant, F1) acoustic parameters twice, with 3-6 weeks between sessions. Reliability was measured with intraclass correlations.

RESULTS: Opposite patterns of reliability were observed for fo and F1; fo reflexive responses showed good reliability and fo adaptive responses showed poor reliability, whereas F1 reflexive responses showed poor reliability and F1 adaptive responses showed moderate reliability. However, a criterion-referenced categorical measurement of fo adaptive responses as typical versus atypical showed substantial test-retest agreement.

CONCLUSIONS: Individual responses to some behavioral assays of auditory-motor control of speech should be interpreted with caution, which has implications for several fields of research. Additional research is needed to establish reliable criterion-referenced measures of F1 adaptive responses as well as fo and F1 reflexive responses. Furthermore, the opposite patterns of test-retest reliability observed for voice versus articulation add to growing evidence for differences in underlying neural control mechanisms.

RevDate: 2023-11-21

Zhang W, M Clayards (2023)

Contribution of acoustic cues to prominence ratings for four Mandarin vowels.

The Journal of the Acoustical Society of America, 154(5):3364-3373.

The acoustic cues for prosodic prominence have been explored extensively, but one open question is to what extent they differ by context. This study investigates the extent to which vowel type affects how acoustic cues are related to prominence ratings provided in a corpus of spoken Mandarin. In the corpus, each syllable was rated as either prominent or non-prominent. We predicted prominence ratings using Bayesian mixed-effect regression models for each of four Mandarin vowels (/a, i, ɤ, u/), using fundamental frequency (F0), intensity, duration, the first and second formants, and tone type as predictors. We compared the role of each cue within and across the four models. We found that overall duration was the best predictor of prominence ratings and that formants were the weakest, but the role of each cue differed by vowel. We did not find credible evidence that F0 was relevant for /a/, or that intensity was relevant for /i/. We also found evidence that duration was more important for /ɤ/ than for /i/. The results suggest that vowel type credibly affects prominence ratings, which may reflect differences in the coordination of acoustic cues in prominence marking.

RevDate: 2023-11-17

Jasim M, Nayana VG, Nayaka H, et al (2023)

Effect of Adenotonsillectomy on Spectral and Acoustic Characteristics.

Indian journal of otolaryngology and head and neck surgery : official publication of the Association of Otolaryngologists of India, 75(4):3467-3475.

Acoustic analysis and perceptual analysis have been extensively used to assess the speech and voice of individuals with voice disorders. These methods provide objective, quantitative and precise information on the speech and voice characteristics in any given disorder, help in monitoring any recovery, deterioration, or improvement in an individual's speech, and also differentiate between normal and abnormal speech and voice characteristics. The present study was carried out to investigate changes in spectral characteristics (formant frequency parameters and formant centralization ratios) and voice characteristics (acoustic parameters of voice) in individuals following adenotonsillectomy. A total of 34 participants with a history of adenotonsillar hypertrophy took part in the study. Spectral and acoustic voice parameters were analyzed at three time points: before surgery (T0), 30 days (T1), and 90 days (T2) after surgery. Data were analyzed statistically using SPSS software version 28.0.0.0. Descriptive statistics were used to find the mean and standard deviation. Repeated-measures ANOVA was used to compare the pre- and post-operative measures of the spectral and acoustic voice parameters. The derived parameter of acoustic vowel space (formant centralization ratio 3) was compared across the three time points. The results revealed that the acoustic vowel space measure and the formant frequency measures increased significantly from the pre-operative to the post-operative conditions across the three time points. A significant difference was also obtained in the acoustic parameters across the time points. Adenotonsillectomy has been proved to be an efficient surgical procedure in treating children with chronic adenotonsillitis. The results obtained indicate an overall improvement in the spectral and acoustic voice parameters, thereby highlighting the need for adenotonsillectomy at the right time and at the right age.
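
The abstract does not define the "formant centralization ratio 3" variant it uses. As a reference point only, here is a sketch of the commonly cited formant centralization ratio, FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a), computed from the corner vowels /i/, /a/, /u/. The formant values are hypothetical, and the study's own variant may differ.

```python
def formant_centralization_ratio(f1_i, f2_i, f1_a, f2_a, f1_u, f2_u):
    """FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a), all values in Hz.

    The ratio rises as vowels centralize (articulatory undershoot) and falls
    as the vowel space expands.
    """
    denominator = f2_i + f1_a
    if denominator <= 0:
        raise ValueError("F2 of /i/ plus F1 of /a/ must be positive")
    return (f2_u + f2_a + f1_i + f1_u) / denominator


# Hypothetical peripheral vs. centralized productions of /i/, /a/, /u/.
peripheral = formant_centralization_ratio(300, 2300, 750, 1300, 350, 800)
centralized = formant_centralization_ratio(400, 2000, 650, 1400, 400, 1000)
print(round(peripheral, 3), round(centralized, 3))
```

Because the ratio moves in the opposite direction to vowel space area, the two measures are usually read together rather than interchangeably.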

RevDate: 2023-11-16

Noffs G, Cobler-Lichter M, Perera T, et al (2023)

Plug-and-play microphones for recording speech and voice with smart devices.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000535152 [Epub ahead of print].

INTRODUCTION: Smart devices are widely available and capable of quickly recording and uploading speech segments for health-related analysis. The switch from laboratory recordings with professional-grade microphone setups to remote, smart device-based recordings offers immense potential for the scalability of voice assessment. Yet, a growing body of literature points to a wide heterogeneity among acoustic metrics in their robustness to variation in recording devices. The addition of consumer-grade plug-and-play microphones has been proposed as a possible solution. Our aim was to assess whether the addition of consumer-grade plug-and-play microphones increases the acoustic measurement agreement between ultra-portable devices and a reference microphone.

METHODS: Speech was simultaneously recorded by a reference high-quality microphone commonly used in research, and by two configurations with plug-and-play microphones. Twelve speech-acoustic features were calculated using recordings from each microphone to determine the agreement intervals in measurements between microphones. Agreement intervals were then compared to expected deviations in speech in various neurological conditions. Each microphone's response to speech and to silence was characterized through acoustic analysis to explore possible reasons for differences in acoustic measurements between microphones. The statistical differentiation of two groups, neurotypical people and people with Multiple Sclerosis, using metrics from each tested microphone was compared to that of the reference microphone.

RESULTS: The two consumer-grade plug-and-play microphones favoured high frequencies (mean centre of gravity difference ≥ +175.3 Hz) and recorded more noise (mean difference in signal-to-noise ≤ -4.2 dB) when compared to the reference microphone. Between consumer-grade microphones, differences in relative noise were closely related to distance between the microphone and the speaker's mouth. Agreement intervals between the reference and consumer-grade microphones remained under disease-expected deviations only for fundamental frequency (f0, agreement interval ≤0.06 Hz), f0 instability (f0 CoV, agreement interval ≤0.05%) and for tracking of second formant movement (agreement interval ≤1.4 Hz/millisecond). Agreement between microphones was poor for other metrics, particularly for fine timing metrics (mean pause length and pause length variability for various tasks). The statistical difference between the two groups of speakers was smaller with the plug-and-play than with the reference microphone.

CONCLUSION: Measurements of f0 and F2 slope were robust to variation in recording equipment while other acoustic metrics were not. Thus, the tested plug-and-play microphones should not be used interchangeably with professional-grade microphones for speech analysis. Plug-and-play microphones may assist in equipment standardization within speech studies, including remote or self-recording, possibly with small loss in accuracy and statistical power as observed in this study.

RevDate: 2023-11-09

Ribas-Prats T, Cordero G, Lip-Sosa DL, et al (2023)

Developmental Trajectory of the Frequency-Following Response During the First 6 Months of Life.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: The aim of the present study is to characterize the maturational changes during the first 6 months of life in the neural encoding of two speech sound features relevant for early language acquisition: the stimulus fundamental frequency (fo), related to stimulus pitch, and the vowel formant composition, particularly F1. The frequency-following response (FFR) was used as a snapshot into the neural encoding of these two stimulus attributes.

METHOD: FFRs to a consonant-vowel stimulus /da/ were retrieved from electroencephalographic recordings in a sample of 80 healthy infants (45 at birth and 35 at the age of 1 month). Thirty-two infants (16 recorded at birth and 16 recorded at 1 month) returned for a second recording at 6 months of age.

RESULTS: Stimulus fo and F1 encoding showed improvements from birth to 6 months of age. Most remarkably, a significant improvement in the F1 neural encoding was observed during the first month of life.

CONCLUSION: Our results highlight the rapid and sustained maturation of the basic neural machinery necessary for the phoneme discrimination ability during the first 6 months of age.

RevDate: 2023-11-09

Mračková M, Mareček R, Mekyska J, et al (2023)

Levodopa may modulate specific speech impairment in Parkinson's disease: an fMRI study.

Journal of neural transmission (Vienna, Austria : 1996) [Epub ahead of print].

Hypokinetic dysarthria (HD) is a difficult-to-treat symptom affecting quality of life in patients with Parkinson's disease (PD). Levodopa may partially alleviate some symptoms of HD in PD, but the neural correlates of these effects are not fully understood. The aim of our study was to identify neural mechanisms by which levodopa affects articulation and prosody in patients with PD. Altogether 20 PD patients participated in a task fMRI study (overt sentence reading). Using a single dose of levodopa after an overnight withdrawal of dopaminergic medication, levodopa-induced BOLD signal changes within the articulatory pathway (in regions of interest; ROIs) were studied. We also correlated levodopa-induced BOLD signal changes with the changes in acoustic parameters of speech. We observed no significant changes in acoustic parameters due to acute levodopa administration. After levodopa administration as compared to the OFF dopaminergic condition, patients showed task-induced BOLD signal decreases in the left ventral thalamus (p = 0.0033). The changes in thalamic activation were associated with changes in pitch variation (R = 0.67, p = 0.006), while the changes in caudate nucleus activation were related to changes in the second formant variability which evaluates precise articulation (R = 0.70, p = 0.003). The results are in line with the notion that levodopa does not have a major impact on HD in PD, but it may induce neural changes within the basal ganglia circuitries that are related to changes in speech prosody and articulation.

RevDate: 2023-11-08

Liu W, Wang Y, C Liang (2023)

Formant and Voice Source Characteristics of Vowels in Chinese National Singing and Bel Canto. A Pilot Study.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00323-5 [Epub ahead of print].

BACKGROUND: There have been numerous reports on the acoustic characteristics of singers' vowel articulation and phonation, and these studies cover many phonetic dimensions, such as fundamental frequency (F0), intensity, formant frequency, and voice quality.

METHOD: Taking the three representative vowels (/a/, /i/, /u/) in Chinese National Singing and Bel Canto as the research object, the present study investigates the differences and associations in vowel articulation and phonation between Chinese National Singing and Bel Canto using acoustic measures, for example, F0, formant frequency, long-term average spectrum (LTAS).

RESULTS: The relationship between F0 and formant indicates that F1 is proportional to F0, in which the female has a significant variation in vowel /a/. Compared with the male, the formant structure of the female singing voice differs significantly from that of the speech voice. Regarding the relationship between intensity and formant, LTAS shows that the Chinese National Singing tenor and Bel Canto baritone have the singer's formant cluster when singing vowels, while the two sopranos do not.

CONCLUSIONS: The systematic changes of formant frequencies with voice source are observed. (i) F1 of the female vowel /a/ has undergone a significant tuning change in the register transition, reflecting the characteristics of singing genres. (ii) Female singers utilize the intrinsic pitch of vowels when adopting the register transition strategy. This finding can be assumed to facilitate understanding the theory of intrinsic vowel pitch and revise Sundberg's hypothesis that F1 rises with F0. A non-linear relationship exists between F1 and F0, which adds to the non-linear interaction of the formant and vocal source. (iii) The singer's formant is affected by voice classification, gender, and singing genres.

RevDate: 2023-11-07

Keller PE, Lee J, König R, et al (2023)

Sex-related communicative functions of voice spectral energy in human chorusing.

Biology letters, 19(11):20230326.

Music is a human communicative art whose evolutionary origins may lie in capacities that support cooperation and/or competition. A mixed account favouring simultaneous cooperation and competition draws on analogous interactive displays produced by collectively signalling non-human animals (e.g. crickets and frogs). In these displays, rhythmically coordinated calls serve as a beacon whereby groups of males 'cooperatively' attract potential female mates, while the likelihood of each male competitively attracting an actual mate depends on the precedence of his signal. Human behaviour consistent with the mixed account was previously observed in a renowned boys choir, where the basses (the oldest boys with the deepest voices) boosted their acoustic prominence by increasing energy in a high-frequency band of the vocal spectrum when girls were in an otherwise male audience. The current study tested female and male sensitivity and preferences for this subtle vocal modulation in online listening tasks. Results indicate that while female and male listeners are similarly sensitive to enhanced high-spectral energy elicited by the presence of girls in the audience, only female listeners exhibit a reliable preference for it. Findings suggest that human chorusing is a flexible form of social communicative behaviour that allows simultaneous group cohesion and sexually motivated competition.

RevDate: 2023-11-04

Baker CP, Brockmann-Bauser M, Purdy SC, et al (2023)

High and Wide: An In Silico Investigation of Frequency, Intensity, and Vibrato Effects on Widely Applied Acoustic Voice Perturbation and Noise Measures.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00316-8 [Epub ahead of print].

OBJECTIVES: This in silico study explored the effects of a wide range of fundamental frequency (fo), source-spectrum tilt (SST), and vibrato extent (VE) on commonly used frequency and amplitude perturbation and noise measures.

METHOD: Using 53 synthesized tones produced in Madde, the effects of stepwise increases in fo, intensity (modeled by decreasing SST), and VE on the PRAAT parameters jitter % (local), relative average perturbation (RAP) %, shimmer % (local), amplitude perturbation quotient 3 (APQ3) %, and harmonics-to-noise ratio (HNR) dB were investigated. A secondary experiment was conducted to determine whether any fo effects on jitter, RAP, shimmer, APQ3, and HNR were stable. A total of 10 sinewaves were synthesized in Sopran from 100 to 1000 Hz using formant frequencies for /a/, /i/, and /u/-like vowels, respectively. All effects were statistically assessed with Kendall's tau-b and partial correlation.

RESULTS: Increasing fo resulted in an overall increase in jitter, RAP, shimmer, and APQ3 values, respectively (P < 0.01). Oscillations of the data across the explored fo range were observed in all measurement outputs. In the Sopran tests, the oscillatory pattern seen in the Madde fo condition remained and showed differences between vowel conditions. Increasing intensity (decreasing SST) led to reduced pitch and amplitude perturbation and HNR (P < 0.05). Increasing VE led to lower HNR and an almost linear increase of all other measures (P < 0.05).

CONCLUSION: These novel data offer a controlled demonstration for the behavior of jitter (local) %, RAP %, shimmer (local) %, APQ3 %, and HNR (dB) when varying fo, SST, and VE in synthesized tones. Since humans will vary in all of these aspects in spoken language and vowel phonation, researchers should take potential resonance-harmonics type effects into account when comparing intersubject or preintervention and postintervention data using these measures.
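The perturbation measures named in this abstract can be illustrated with a minimal sketch. It assumes the commonly documented Praat-style definitions: the local measure is the mean absolute difference between consecutive cycles divided by the mean, and the three-point measure compares each cycle with the average of itself and its two neighbours. The function names and the sample values are hypothetical and are not taken from the study.

```python
from statistics import mean


def local_perturbation_percent(values):
    """Mean absolute difference between consecutive cycles, relative to the mean, in percent.

    Applied to glottal periods this corresponds to jitter (local);
    applied to cycle peak amplitudes it corresponds to shimmer (local).
    """
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


def three_point_perturbation_percent(values):
    """Mean absolute deviation of each cycle from its three-point average, in percent.

    Applied to periods this corresponds to RAP; applied to amplitudes, to APQ3.
    """
    deviations = [
        abs(values[i] - (values[i - 1] + values[i] + values[i + 1]) / 3.0)
        for i in range(1, len(values) - 1)
    ]
    return 100.0 * mean(deviations) / mean(values)


# Hypothetical cycle-by-cycle measurements (illustrative only).
periods_ms = [5.00, 5.02, 4.98, 5.01, 4.99, 5.00]
amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80, 0.80]

jitter_local = local_perturbation_percent(periods_ms)
rap = three_point_perturbation_percent(periods_ms)
shimmer_local = local_perturbation_percent(amplitudes)
apq3 = three_point_perturbation_percent(amplitudes)
```

Because every measure is normalised by the mean period or amplitude, a change in fo alters the denominator as well as the cycle-to-cycle differences, which is one reason comparisons across very different fo values need care.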

RevDate: 2023-10-31

Song J, Kim M, J Park (2023)

Acoustic correlates of perceived personality from Korean utterances in a formal communicative setting.

PloS one, 18(10):e0293222 pii:PONE-D-23-04761.

The aim of the present study was to find acoustic correlates of perceived personality from the speech produced in a formal communicative setting, specifically that of Korean customer service employees. This work extended previous research on voice personality impressions to a different sociocultural and linguistic context in which speakers are expected to speak politely in a formal register. To use naturally produced speech rather than read speech, we devised a new method that successfully elicited spontaneous speech from speakers who were role-playing as customer service employees, while controlling for the words and sentence structures they used. We then examined a wide range of acoustic properties in the utterances, including voice quality and global acoustic and segmental properties using Principal Component Analysis. Subjects of the personality rating task listened to the utterances and rated perceived personality in terms of the Big-Five personality traits. While replicating some previous findings, we discovered several acoustic variables that exclusively accounted for the personality judgments of female speakers; a more modal voice quality increased perceived conscientiousness and neuroticism, and less dispersed formants reflecting a larger body size increased the perceived levels of extraversion and openness. These biases in personality perception likely reflect gender and occupation-related stereotypes that exist in South Korea. Our findings can also serve as a basis for developing and evaluating synthetic speech for Voice Assistant applications in future studies.

RevDate: 2023-10-31

Ealer C, Niemczak CE, Nicol T, et al (2023)

Auditory neural processing in children living with HIV uncovers underlying central nervous system dysfunction.

AIDS (London, England) pii:00002030-990000000-00380 [Epub ahead of print].

OBJECTIVE: Central nervous system (CNS) damage from HIV infection or treatment can lead to developmental delays and poor educational outcomes in children living with HIV (CLWH). Early markers of central nervous system dysfunction are needed to target interventions and prevent life-long disability. The Frequency Following Response (FFR) is an auditory electrophysiology test that can reflect the health of the central nervous system. In this study, we explore whether the FFR reveals auditory central nervous system dysfunction in CLWH.

STUDY DESIGN: Cross-sectional analysis of an ongoing cohort study. Data were from the child's first visit in the study.

SETTING: The infectious disease center in Dar es Salaam, Tanzania.

METHODS: We collected the FFR from 151 CLWH and 151 HIV-negative children. To evoke the FFR, three speech syllables (/da/, /ba/, /ga/) were played monaurally to the child's right ear. Response measures included neural timing (peak latencies), strength of frequency encoding (fundamental frequency and first formant amplitude), encoding consistency (inter-response consistency), and encoding precision (stimulus-to-response correlation).

RESULTS: CLWH showed smaller first formant amplitudes (p < .0001), weaker inter-response consistencies (p < .0001) and smaller stimulus-to-response correlations (p < .0001) than HIV-negative children. These findings generalized across the three speech stimuli with moderately strong effect sizes (partial η² ranged from 0.061 to 0.094).

CONCLUSION: The FFR shows auditory central nervous system dysfunction in CLWH. Neural encoding of auditory stimuli was less robust, more variable, and less accurate. Since the FFR is a passive and objective test, it may offer an effective way to assess and detect central nervous system function in CLWH.

RevDate: 2023-10-30

Mutlu A, Celik S, MA Kilic (2023)

Effects of Personal Protective Equipment on Speech Acoustics.

Sisli Etfal Hastanesi tip bulteni, 57(3):434-439.

OBJECTIVES: The transmission of severe acute respiratory syndrome coronavirus-2 occurs primarily through droplets, which highlights the importance of protecting the oral, nasal, and conjunctival mucosas using personal protective equipment (PPE). The use of PPE can lead to communication difficulties between healthcare workers and patients. This study aimed to investigate changes in the acoustic parameters of speech sounds when different types of PPE are used.

METHODS: A cross-sectional study was conducted, enrolling 18 healthy male and female participants. They were instructed to produce a sustained [ɑː] vowel for at least 3 s to estimate voice quality. In addition, all Turkish vowels were produced for a minimum of 200 ms. Finally, three Turkish fricative consonants ([f], [s], and [ʃ]) were produced in a consonant/vowel/consonant format with different vowel contexts within a carrier sentence. Recordings were repeated under the following conditions: no PPE, surgical mask, N99 mask, face shield, surgical mask + face shield, and N99 mask + face shield. All recordings were subjected to analysis.

RESULTS: Frequency perturbation parameters did not show significant differences. However, in males, all vowels except [u] in the first formant (F1), except [ɔ] and [u] in the second formant (F2), except [ɛ] and [ɔ] in the third formant (F3), and only [i] in the fourth formant (F4) were significant. In females, all vowels except [i] in F1, except [u] in F2, all vowels in F3, and except [u] and [ɯ] in F4 were significant. Spectral moment values exhibited significance in both groups.

CONCLUSION: The use of different types of PPE resulted in variations in speech acoustic features. These findings may be attributed to the filtering effects of PPE on specific frequencies and the potential chamber effect in front of the face. Understanding the impact of PPE on speech acoustics contributes to addressing communication challenges in healthcare settings.

RevDate: 2023-10-25

Steffman J, W Zhang (2023)

Vowel perception under prominence: Examining the roles of F0, duration, and distributional information.

The Journal of the Acoustical Society of America, 154(4):2594-2608.

This study investigates how prosodic prominence mediates the perception of American English vowels, testing the effects of F0 and duration. In Experiment 1, the perception of four vowel continua varying in duration and formants (high: /i-ɪ/, /u-ʊ/, non-high: /ɛ-æ/, /ʌ-ɑ/) was examined under changes in F0-based prominence. Experiment 2 tested if cue usage varies as the distributional informativity of duration as a cue to prominence is manipulated. Both experiments show that duration is a consistent vowel-intrinsic cue. F0-based prominence affected perception of vowels via compensation for peripheralization of prominent vowels in the vowel space. Longer duration and F0-based prominence further enhanced the perception of formant cues. The distributional manipulation in Experiment 2 exerted a minimal impact. Findings suggest that vowel perception is mediated by prominence in a height-dependent manner which reflects patterns in the speech production literature. Further, duration simultaneously serves as an intrinsic cue and serves a prominence-related function in enhancing perception of formant cues.

RevDate: 2023-10-24

Wang H, Ali Y, L Max (2023)

Perceptual formant discrimination during speech movement planning.

bioRxiv : the preprint server for biology pii:2023.10.11.561423.

Evoked potential studies have shown that speech planning modulates auditory cortical responses. The phenomenon's functional relevance is unknown. We tested whether, during this time window of cortical auditory modulation, there is an effect on speakers' perceptual sensitivity for vowel formant discrimination. Participants made same/different judgments for pairs of stimuli consisting of a pre-recorded, self-produced vowel and a formant-shifted version of the same production. Stimuli were presented prior to a "go" signal for speaking, prior to passive listening, and during silent reading. The formant discrimination stimulus /uh/ was tested with a congruent productions list (words with /uh/) and an incongruent productions list (words without /uh/). Logistic curves were fitted to participants' responses, and the just-noticeable difference (JND) served as a measure of discrimination sensitivity. We found a statistically significant effect of condition (worst discrimination before speaking) without congruency effect. Post-hoc pairwise comparisons revealed that JND was significantly greater before speaking than during silent reading. Thus, formant discrimination sensitivity was reduced during speech planning regardless of the congruence between discrimination stimulus and predicted acoustic consequences of the planned speech movements. This finding may inform ongoing efforts to determine the functional relevance of the previously reported modulation of auditory processing during speech planning.
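The logistic-curve step described in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' procedure: it assumes a two-parameter logistic model of the probability of a "different" response as a function of formant-shift size, fits it by Newton's method, and reads the JND off at a 50% criterion. The criterion, the shift units (cents), the function names, and the response counts are all assumptions made for the example.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(shifts, n_different, n_trials, iterations=25):
    """Maximum-likelihood fit of P(different) = logistic(b0 + b1 * shift) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in zip(shifts, n_different, n_trials):
            p = logistic(b0 + b1 * x)
            w = n * p * (1.0 - p)      # binomial variance weight
            g0 += k - n * p            # gradient with respect to b0
            g1 += x * (k - n * p)      # gradient with respect to b1
            h00 += w
            h01 += x * w
            h11 += x * x * w
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def jnd(b0, b1, criterion=0.5):
    """Shift size at which the fitted curve reaches the criterion probability."""
    return (math.log(criterion / (1.0 - criterion)) - b0) / b1


# Hypothetical data: shift sizes in cents, "different" responses out of 20 trials each.
shifts = [0, 10, 20, 30, 40, 50]
n_different = [1, 3, 8, 13, 18, 19]
n_trials = [20] * 6

b0, b1 = fit_logistic(shifts, n_different, n_trials)
jnd_cents = jnd(b0, b1)
```

A larger JND means a larger formant shift is needed before the listener reliably reports a difference, which is the sense in which the abstract describes discrimination as worst before speaking.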

RevDate: 2023-10-18

Miller HE, Kearney E, Nieto-Castañón A, et al (2023)

Do Not Cut Off Your Tail: A Mega-Analysis of Responses to Auditory Perturbation Experiments.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: The practice of removing "following" responses from speech perturbation analyses is increasingly common, despite no clear evidence as to whether these responses represent a unique response type. This study aimed to determine if the distribution of responses to auditory perturbation paradigms represents a bimodal distribution, consisting of two distinct response types, or a unimodal distribution.

METHOD: This mega-analysis pooled data from 22 previous studies to examine the distribution and magnitude of responses to auditory perturbations across four tasks: adaptive pitch, adaptive formant, reflexive pitch, and reflexive formant. Data included at least 150 unique participants for each task, with studies comprising younger adult, older adult, and Parkinson's disease populations. A Silverman's unimodality test followed by a smoothed bootstrap resampling technique was performed for each task to evaluate the number of modes in each distribution. Wilcoxon signed-ranks tests were also performed for each distribution to confirm significant compensation in response to the perturbation.

RESULTS: Modality analyses were not significant (p > .05) for any group or task, indicating unimodal distributions. Our analyses also confirmed compensatory reflexive responses to pitch and formant perturbations across all groups, as well as adaptive responses to sustained formant perturbations. However, analyses of sustained pitch perturbations only revealed evidence of adaptation in studies with younger adults.

CONCLUSION: The demonstration of a clear unimodal distribution across all tasks suggests that following responses do not represent a distinct response pattern, but rather the tail of a unimodal distribution.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.24282676.

RevDate: 2023-10-16

Chu M, Wang J, Fan Z, et al (2023)

A Multidomain Generative Adversarial Network for Hoarse-to-Normal Voice Conversion.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00274-6 [Epub ahead of print].

Hoarse voice affects the efficiency of communication between people. However, surgical treatment may result in patients with poorer voice quality, and voice repair techniques can only repair vowels. In this paper, we propose a novel multidomain generative adversarial voice conversion method to achieve hoarse-to-normal voice conversion and personalize voices for patients with hoarseness. The proposed method aims to improve the speech quality of hoarse voices through a multidomain generative adversarial network. The proposed method is evaluated on subjective and objective evaluation metrics. According to the findings of the spectrum analysis, the suggested method converts hoarse voice formants more effectively than variational auto-encoder (VAE), Auto-VC (voice conversion), StarGAN-VC (Generative Adversarial Network-Voice Conversion), and CycleVAE. For the word error rate, the suggested method obtains absolute gains of 35.62, 37.97, 45.42, and 50.05 compared to CycleVAE, StarGAN-VC, Auto-VC, and VAE, respectively. The suggested method outperforms CycleVAE, VAE, StarGAN-VC, and Auto-VC in terms of naturalness by 42.49%, 51.60%, 69.37%, and 77.54%, respectively. The suggested method outperforms VAE, CycleVAE, StarGAN-VC, and Auto-VC, respectively, in terms of intelligibility, with absolute gains of 0.87, 0.93, 1.08, and 1.13. In terms of content similarity, the proposed method obtains 43.48%, 75.52%, 76.21%, and 108.62% improvements compared to CycleVAE, StarGAN-VC, Auto-VC, and VAE, respectively. ABX results show that the suggested method can personalize the voice for patients with hoarseness. This study demonstrates the feasibility of voice conversion methods in improving the speech quality of hoarse voices.

RevDate: 2023-10-14

Santos SS, Christmann MK, CA Cielo (2023)

Spectrographic Vocal Characteristics in Female Teachers: Finger Kazoo Intensive Short-term Vocal Therapy.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00270-9 [Epub ahead of print].

OBJECTIVE: To verify the effects of intensive short-term vocal therapy using the Finger Kazoo technique on the spectrographic vocal measurements of teachers.

METHODS: Controlled and randomized trial. Spectrographic vocal assessment was performed by judges before and after intensive short-term vocal therapy with Finger Kazoo. The sample was composed of 41 female teachers. There were two study groups (with vocal nodules and without structural affection of the vocal folds) and the respective control groups. For the statistical analysis of the data, nonparametric tests were used (Mann-Whitney test and Wilcoxon test).

RESULTS: After intensive short-term vocal therapy with Finger Kazoo, improvement in voice spectral parameters was observed, such as improvement in tracing (color intensity and regularity), greater definition of formants and harmonics, increased replacement of harmonics by noise, and a greater number of harmonics, mainly in the group without structural affection of the vocal folds.

CONCLUSION: There was an improvement in the spectrographic vocal parameters, showing greater stability, quality, and projection of the emission, especially in female teachers without structural affection of the vocal folds.

RevDate: 2023-10-13

Kim JA, Jang H, Choi Y, et al (2023)

Subclinical articulatory changes of vowel parameters in Korean amyotrophic lateral sclerosis patients with perceptually normal voices.

PloS one, 18(10):e0292460 pii:PONE-D-23-09560.

The available quantitative methods for evaluating bulbar dysfunction in patients with amyotrophic lateral sclerosis (ALS) are limited. We aimed to characterize vowel properties in Korean ALS patients, investigate associations between vowel parameters and clinical features of ALS, and analyze subclinical articulatory changes of vowel parameters in those with perceptually normal voices. Forty-three patients with ALS (27 with dysarthria and 16 without dysarthria) and 20 healthy controls were prospectively enrolled in the study. Dysarthria was assessed using the ALS Functional Rating Scale-Revised (ALSFRS-R) speech subscores, with any loss of 4 points indicating the presence of dysarthria. The structured speech samples were recorded and analyzed using Praat software. For three corner vowels (/a/, /i/, and /u/), data on the vowel duration, fundamental frequency, frequencies of the first two formants (F1 and F2), harmonics-to-noise ratio, vowel space area (VSA), and vowel articulation index (VAI) were extracted from the speech samples. Corner vowel durations were significantly longer in ALS patients with dysarthria than in healthy controls. The F1 frequency of /a/, F2 frequencies of /i/ and /u/, the VSA, and the VAI showed significant differences between ALS patients with dysarthria and healthy controls. The area under the curve (AUC) was 0.912. The F1 frequency of /a/ and the VSA were the major determinants for differentiating ALS patients who had not yet developed apparent dysarthria from healthy controls (AUC 0.887). In linear regression analyses, as the ALSFRS-R speech subscore decreased, both the VSA and VAI were reduced. In contrast, vowel durations were found to be rather prolonged. The analyses of vowel parameters provided a useful metric correlated with disease severity for detecting subclinical bulbar dysfunction in ALS patients.
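The two summary metrics in this abstract, VSA and VAI, can be illustrated with a short sketch. The abstract does not give the formulas, so the code assumes the commonly used definitions: VSA as the area of the triangle spanned by the corner vowels in the F1-F2 plane, and VAI as (F2/i/ + F1/a/) / (F1/i/ + F1/u/ + F2/u/ + F2/a/). The formant values are hypothetical round numbers, not data from the study.

```python
def vowel_space_area(corners):
    """Area (Hz^2) of the polygon whose vertices are (F1, F2) pairs, via the shoelace formula."""
    n = len(corners)
    twice_area = 0.0
    for i in range(n):
        f1_a, f2_a = corners[i]
        f1_b, f2_b = corners[(i + 1) % n]
        twice_area += f1_a * f2_b - f1_b * f2_a
    return abs(twice_area) / 2.0


def vowel_articulation_index(a, i, u):
    """VAI from (F1, F2) pairs of the corner vowels /a/, /i/, /u/ (assumed definition)."""
    (f1_a, f2_a), (f1_i, f2_i), (f1_u, f2_u) = a, i, u
    return (f2_i + f1_a) / (f1_i + f1_u + f2_u + f2_a)


# Hypothetical corner-vowel formants in Hz as (F1, F2).
vowel_a = (800.0, 1300.0)
vowel_i = (300.0, 2300.0)
vowel_u = (350.0, 800.0)

vsa = vowel_space_area([vowel_a, vowel_i, vowel_u])
vai = vowel_articulation_index(vowel_a, vowel_i, vowel_u)
```

Under these definitions, vowel centralization moves the three corners toward each other, which shrinks the triangle and lowers the VAI, consistent with the direction of change the abstract reports as the speech subscore decreases.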

RevDate: 2023-10-13

Cai X, Ouyang M, Yin Y, et al (2023)

Sensorimotor Adaptation to Formant-Shifted Auditory Feedback Is Predicted by Language-Specific Factors in L1 and L2 Speech Production.

Language and speech [Epub ahead of print].

Auditory feedback plays an important role in the long-term updating and maintenance of speech motor control; thus, the current study explored the unresolved question of how sensorimotor adaptation is predicted by language-specific and domain-general factors in first-language (L1) and second-language (L2) production. Eighteen English-L1 speakers and 22 English-L2 speakers performed the same sensorimotor adaptation experiments and tasks, which measured language-specific and domain-general abilities. The experiment manipulated the language groups (English-L1 and English-L2) and experimental conditions (baseline, early adaptation, late adaptation, and end). Linear mixed-effects model analyses indicated that auditory acuity was significantly associated with sensorimotor adaptation in L1 and L2 speakers. Analysis of vocal responses showed that L1 speakers exhibited significant sensorimotor adaptation under the early adaptation, late adaptation, and end conditions, whereas L2 speakers exhibited significant sensorimotor adaptation only under the late adaptation condition. Furthermore, the domain-general factors of working memory and executive control were not associated with adaptation/aftereffects in either L1 or L2 production, except for the role of working memory in aftereffects in L2 production. Overall, the study empirically supported the hypothesis that sensorimotor adaptation is predicted by language-specific factors such as auditory acuity and language experience, whereas general cognitive abilities do not play a major role in this process.

RevDate: 2023-10-12

Geng P, Fan N, Ling R, et al (2023)

Acoustic Characteristics of Mandarin Speech in Male Drug Users.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00269-2 [Epub ahead of print].

AIM: Drug use/addiction has a profound impact on the physical and mental health of individuals. Previous studies have indicated that drug users may experience speech perception disorders, including speech illusion and challenges in recognizing emotional speech. However, the influence of drugs on speech production, as another crucial aspect of speech communication, has not been thoroughly examined. Therefore, the current study aimed to investigate how drugs affect the acoustic characteristics of speech in Chinese male drug users.

METHOD: Speech recordings were collected from a total of 160 male drug users (including 106 heroin users, 23 ketamine users, and 31 methamphetamine users) and 55 male healthy controls with no history of drug use. Acoustic analysis was conducted on the collected speech data from these groups, and classification analysis was performed using five supervised learning algorithms.

RESULTS: The results demonstrated that drug users exhibited smaller F0 standard deviation, reduced loudness, cepstral peak prominence, and formant relative energies, as well as higher H1-A3, longer unvoiced segments, and fewer voiced segments per second compared to the control group. The classification analyses yielded good performance in classifying drug users and non-drug users, with an accuracy above 86%. Moreover, the identification of the three groups of drug users achieved an accuracy of approximately 70%. Additionally, the study revealed different effects on speech production among the three types of drugs.

CONCLUSION: The above findings indicate the presence of speech disorders, such as vocal hoarseness, in drug users, thus confirming the assumption that the acoustic characteristics of speech in drug users deviate from the norm. This study not only fills the knowledge gap regarding the effects of drugs on the speech production of Chinese male drug users but also provides a more comprehensive understanding of how drugs impact human behaviors. Furthermore, this research provides theoretical foundations of detoxification and speech rehabilitation for drug users.

RevDate: 2023-10-11

Favaro L, Zanoli A, Ludynia K, et al (2023)

Vocal tract shape variation contributes to individual vocal identity in African penguins.

Proceedings. Biological sciences, 290(2008):20231029.

Variation in formant frequencies has been shown to affect social interactions and sexual competition in a range of avian species. Yet, the anatomical bases of this variation are poorly understood. Here, we investigated the morphological correlates of formant production in the vocal apparatus of African penguins. We modelled the geometry of the supra-syringeal vocal tract of 20 specimens to generate a population of virtual vocal tracts with varying dimensions. We then estimated the acoustic response of these virtual vocal tracts and extracted the centre frequency of the first four predicted formants. We demonstrate that: (i) variation in length and cross-sectional area of vocal tracts strongly affects the formant pattern, (ii) the tracheal region determines most of this variation, and (iii) the skeletal size of penguins does not correlate with the trachea length and consequently has relatively little effect on formants. We conclude that in African penguins, while the variation in vocal tract geometry generates variation in resonant frequencies supporting the discrimination of conspecifics, such variation does not provide information on the emitter's body size. Overall, our findings advance our understanding of the role of formant frequencies in bird vocal communication.
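The dependence of formants on vocal tract length can be illustrated with the textbook quarter-wavelength approximation. This is a deliberately simplified sketch and not the geometric model used in the study: it assumes a uniform tube closed at the source end and open at the other, so it ignores the cross-sectional area variation that the abstract reports as important. The speed of sound and the tube lengths are assumed values.

```python
def uniform_tube_formants(length_m, n_formants=4, speed_of_sound_m_s=350.0):
    """Resonances (Hz) of a uniform tube closed at one end: F_k = (2k - 1) * c / (4 * L)."""
    return [
        (2 * k - 1) * speed_of_sound_m_s / (4.0 * length_m)
        for k in range(1, n_formants + 1)
    ]


# Hypothetical tract lengths in metres (illustrative only).
short_tract = uniform_tube_formants(0.175)
long_tract = uniform_tube_formants(0.200)
```

In this idealized tube every formant scales inversely with length, so a longer tract lowers all four values together; real tracts with non-uniform cross-sections shift individual formants by different amounts.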

RevDate: 2023-10-09

de Boer MM, WFL Heeren (2023)

The language dependency of /m/ in native Dutch and non-native English.

The Journal of the Acoustical Society of America, 154(4):2168-2176.

In forensic speaker comparisons, the current practice is to try to avoid comparisons between speech fragments in different languages. However, globalization requires an exploration of individual speech features that may show phonetic consistency across a speaker's languages. We predicted that the bilabial nasal /m/ may be minimally affected by the language spoken due to the involvement of the rigid nasal cavity in combination with a lack of fixed oral articulatory targets. The results show that indeed, L1 Dutch speakers (N = 53) had similar nasal formants and formant bandwidths when speaking in their L2 English as in their native language, suggesting language-independency of /m/ within speakers. In fact, acoustics seemed to rely more on the phonetic context than on the language spoken. Nevertheless, caution should still be exercised when sampling across languages when the languages' phoneme inventories and phonotactics show substantial differences.

RevDate: 2023-10-09

Meng Z, Liu H, AC Ma (2023)

Optimizing Voice Recognition Informatic Robots for Effective Communication in Outpatient Settings.

Cureus, 15(9):e44848.

AIM/OBJECTIVE: Within the dynamic healthcare technology landscape, this research aims to explore patient inquiries within outpatient clinics, elucidating the interplay between technology and healthcare intricacies. Building upon the initial intelligent guidance robot implementation shortcomings, this investigation seeks to enhance informatic robots with voice recognition technology. The objective is to analyze users' vocal patterns, discern age-associated vocal attributes, and facilitate age differentiation through subtle vocal nuances to enhance the efficacy of human-robot communication within outpatient clinical settings.

METHODS: This investigation employs a multi-faceted approach. It leverages voice recognition technology to analyze users' vocal patterns. A diverse dataset of voice samples from various age groups was collected. Acoustic features encompassing pitch, formant frequencies, spectral characteristics, and vocal tract length are extracted from the audio samples. The Mel Filterbank and Mel-Frequency Cepstral Coefficients (MFCCs) are employed for speech and audio processing tasks alongside machine learning algorithms to assess and match vocal patterns to age-related traits.

RESULTS: The research reveals compelling outcomes. The incorporation of voice recognition technology contributes to a significant improvement in human-robot communication within outpatient clinical settings. Through accurate analysis of vocal patterns and age-related traits, informatic robots can differentiate age through nuanced verbal cues. This augmentation leads to enhanced contextual understanding and tailored responses, significantly advancing the efficiency of patient interactions with the robots.

CONCLUSION: Integrating voice recognition technology into informatic robots presents a noteworthy advancement in outpatient clinic settings. By enabling age differentiation through vocal nuances, this augmentation enhances the precision and relevance of responses. The study contributes to the ongoing discourse on the dynamic evolution of healthcare technology, underscoring the complex synergy between technological progression and the intricate realities within healthcare infrastructure. As healthcare continues to metamorphose, the seamless integration of voice recognition technology marks a pivotal stride in optimizing human-robot communication and elevating patient care within outpatient settings.
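The mel filterbank mentioned in this abstract rests on a frequency warping that can be sketched briefly. The abstract does not say which mel formula or filter layout was used, so the code assumes one widely used convention, mel = 2595 * log10(1 + f / 700), and places filter centres at equal mel spacing. The function names, the number of filters, and the frequency range are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (one common convention; others exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centres(n_filters, f_min_hz, f_max_hz):
    """Centre frequencies (Hz) of n_filters bands spaced evenly on the mel scale."""
    mel_lo, mel_hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (mel_hi - mel_lo) / (n_filters + 1)
    return [mel_to_hz(mel_lo + step * i) for i in range(1, n_filters + 1)]


# Hypothetical configuration: 10 filters covering 0 to 8000 Hz.
centres = mel_filter_centres(10, 0.0, 8000.0)
```

The centres are densely packed at low frequencies and spread out at high ones, so a filterbank built this way resolves the region of the lower formants more finely than the upper spectrum.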

RevDate: 2023-10-04

Mohn JL, Baese-Berk MM, S Jaramillo (2023)

Selectivity to acoustic features of human speech in the auditory cortex of the mouse.

bioRxiv : the preprint server for biology pii:2023.09.20.558699.

A better understanding of the neural mechanisms of speech processing can have a major impact in the development of strategies for language learning and in addressing disorders that affect speech comprehension. Technical limitations in research with human subjects hinder a comprehensive exploration of these processes, making animal models essential for advancing the characterization of how neural circuits make speech perception possible. Here, we investigated the mouse as a model organism for studying speech processing and explored whether distinct regions of the mouse auditory cortex are sensitive to specific acoustic features of speech. We found that mice can learn to categorize frequency-shifted human speech sounds based on differences in formant transitions (FT) and voice onset time (VOT). Moreover, neurons across various auditory cortical regions were selective to these speech features, with a higher proportion of speech-selective neurons in the dorso-posterior region. Last, many of these neurons displayed mixed-selectivity for both features, an attribute that was most common in dorsal regions of the auditory cortex. Our results demonstrate that the mouse serves as a valuable model for studying the detailed mechanisms of speech feature encoding and neural plasticity during speech-sound learning.

RevDate: 2023-10-03

Sant'Anna LIDA, Miranda E Paulo D, Baião FCS, et al (2023)

Can rapid maxillary expansion affect speech sound production in growing patients? A systematic review.

Orthodontics & craniofacial research [Epub ahead of print].

Rapid maxillary expansion (RME) may change speech sound parameters due to the enlargement of oral and nasal cavities. This study aimed to systematically review the current evidence on speech changes as a side effect of RME. An electronic search was conducted in nine databases, and two of them accessed the 'grey literature'. The eligibility criteria included clinical studies assessing orthodontic patients with maxillary transverse deficiency and the relationship with speech alterations without restricting publication year or language. Only interventional studies were included. The JBI Critical Appraisal Tool assessed the risk of bias. The initial search provided 4853 studies. Seven articles (n = 200 patients) met the inclusion criteria and were analysed. The primary source of bias was the absence of a control group in four studies. RME altered speech production by changing vowel fundamental frequency and fricative phoneme formant frequency. Shimmer and jitter rates changed in one and two studies, respectively. Two studies presented deterioration during orthodontic treatment, but speech improved after appliance removal. Despite the limited evidence, RME affects speech during and after treatment.

RevDate: 2023-10-01

Grawunder S, Uomini N, Samuni L, et al (2023)

Correction: 'Chimpanzee vowel-like sounds and voice quality suggest formant space expansion through the hominoid lineage' (2021), by Grawunder et al.

Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 378(1890):20230319.

RevDate: 2023-09-28

van Brenk F, Lowit A, K Tjaden (2023)

Effects of Speaking Rate on Variability of Second Formant Frequency Transitions in Dysarthria.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000534337 [Epub ahead of print].

INTRODUCTION: This study examined the utility of multiple second formant (F2) slope metrics to capture differences in speech production for individuals with dysarthria and healthy controls as a function of speaking rate. In addition, the utility of F2 slope metrics for predicting severity of intelligibility impairment in dysarthria was examined.

METHODS: 23 speakers with Parkinson's disease and mild to moderate hypokinetic dysarthria (HD), 9 speakers with various neurological diseases and mild to severe ataxic dysarthria (AD), and 26 age-matched healthy control speakers (CON) participated in a sentence repetition task. Sentences were produced at habitual, fast, and slow speaking rate. A variety of metrics were derived from the rising second formant (F2) transition portion of the diphthong /ai/. To obtain measures of intelligibility for the two clinical speaker groups, 15 undergraduate SLP students participated in a transcription experiment.

RESULTS: Significantly shallower slopes were found for the speakers with hypokinetic dysarthria compared to control speakers. Steeper F2 slopes were associated with increased speaking rate for all groups. Higher variability in F2 slope metrics was found for the speakers with ataxic dysarthria compared to the two other speaker groups. For both clinical speaker groups, there was a negative association between intelligibility and F2 slope variability metrics, indicating lower variability in speech production was associated with higher intelligibility.

DISCUSSION: F2 slope metrics were sensitive to dysarthria presence, dysarthria type and speaking rate. The current study provided evidence that the use of F2 slope variability measures has additional value to F2 slope averaged measures for predicting severity of intelligibility impairment in dysarthria.
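The distinction drawn here between averaged and variability-based F2 slope measures can be made concrete with a small sketch. The abstract does not list the specific metrics, so the code assumes two plausible ones: slope as the F2 change over the transition divided by its duration (Hz/ms), and variability as the coefficient of variation of that slope across repetitions. The function names and the measurements are hypothetical.

```python
from statistics import mean, stdev


def f2_slope_hz_per_ms(f2_onset_hz, f2_offset_hz, duration_ms):
    """Average F2 slope across a transition, in Hz per millisecond."""
    return (f2_offset_hz - f2_onset_hz) / duration_ms


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean (unitless)."""
    return stdev(values) / abs(mean(values))


# Hypothetical repetitions of the /ai/ transition: (F2 onset Hz, F2 offset Hz, duration ms).
repetitions = [
    (1200.0, 2000.0, 100.0),
    (1250.0, 2000.0, 100.0),
    (1180.0, 2020.0, 100.0),
    (1300.0, 1990.0, 100.0),
]

slopes = [f2_slope_hz_per_ms(on, off, dur) for on, off, dur in repetitions]
mean_slope = mean(slopes)
slope_variability = coefficient_of_variation(slopes)
```

Two speakers can share the same mean slope while differing in how consistent their repetitions are, which is why a variability metric can carry information that the averaged slope does not.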

RevDate: 2023-09-27

Liu W, Wang T, X Huang (2023)

The influences of forward context on stop-consonant perception: The combined effects of contrast and acoustic cue activation?

The Journal of the Acoustical Society of America, 154(3):1903-1920.

The perception of the /da/-/ga/ series, distinguished primarily by the third formant (F3) transition, is affected by many nonspeech and speech sounds. Previous studies mainly investigated the influences of context stimuli with frequency bands located in the F3 region and proposed the account of spectral contrast effects. This study examined the effects of context stimuli with bands not in the F3 region. The results revealed that these non-F3-region stimuli (whether with bands higher or lower than the F3 region) mainly facilitated the identification of /ga/; for example, the stimuli (including frequency-modulated glides, sine-wave tones, filtered sentences, and natural vowels) in the low-frequency band (500-1500 Hz) led to more /ga/ responses than those in the low-F3 region (1500-2500 Hz). It is suggested that in the F3 region, context stimuli may act through spectral contrast effects, while in non-F3 regions, context stimuli might activate the acoustic cues of /g/ and further facilitate the identification of /ga/. The combination of contrast and acoustic cue effects can explain more results concerning the forward context influences on the perception of the /da/-/ga/ series, including the effects of non-F3-region stimuli and the imbalanced influences of context stimuli on /da/ and /ga/ perception.

RevDate: 2023-09-25

Toppo R, S Sinha (2023)

The Acoustics of Gender in Indian English: Toward Forensic Profiling in a Multilingual Context.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00239-4 [Epub ahead of print].

The present study is an acoustic analysis of Indian English, specifically examining the speech patterns and characteristics of three different groups with different native languages. This study investigates fundamental frequency (fo), fo range, fo variance, formant frequencies, and vowel space size in 42 native male and female speakers of Odia, Bangla, and Hindi. Furthermore, it investigated the potential correlation between fundamental frequency and vowel space, examining whether variations in vowel space size could be influenced by gender-specific perceptual factors. The paper emphasizes that in a multilingual context, gender identification can be efficiently correlated with both fo and formant frequencies. To measure a range of acoustic characteristics, speech samples were collected from the recording task. Analysis was done on PRAAT. The study revealed significant differences between genders for the examined acoustic characteristics. Results indicate differences in the size of gender-specific variations among the language groups, with females exhibiting more significant differences in fo, formant frequencies, and vowel space than males. The findings show no significant correlation between fo and vowel space area, indicating that other features are responsible for large vowel space for females. These findings display significant potential toward creating a robust empirical framework for gender profiling that can be utilized in a wide range of forensic linguistics investigations.

RevDate: 2023-09-22

Osiecka AN, Briefer EF, Kidawa D, et al (2023)

Social calls of the little auk (Alle alle) reflect body size and possibly partnership, but not sex.

Royal Society open science, 10(9):230845.

Source-filter theory posits that an individual's size and vocal tract length are reflected in the parameters of their calls. In species that mate assortatively, this could result in vocal similarity. In the context of mate selection, this would mean that animals could listen in to find a partner that sounds, and therefore is, similar to them. We investigated the social calls of the little auk (Alle alle), a highly vocal seabird mating assortatively, using vocalizations produced inside 15 nests by known individuals. Source- and filter-related acoustic parameters were used in linear mixed models testing the possible impact of body size. A principal component analysis followed by a permuted discriminant function analysis tested the effect of sex. Additionally, randomization procedures tested whether partners are more vocally similar than random birds. There was a significant effect of size on the mean fundamental frequency of a simple call, but not on parameters of a multisyllable call with apparent formants. Neither sex nor partnership influenced the calls; there was, however, a tendency to match certain parameters between partners. This indicates that vocal cues are at best weak indicators of size, and other factors likely play a role in mate selection.

RevDate: 2023-09-20

Georgiou GP (2023)

Comparison of the prediction accuracy of machine learning algorithms in crosslinguistic vowel classification.

Scientific reports, 13(1):15594.

Machine learning algorithms can be used for the prediction of nonnative sound classification based on crosslinguistic acoustic similarity. To date, very few linguistic studies have compared the classification accuracy of different algorithms. This study aims to assess how well machines align with human speech perception by assessing the ability of three machine learning algorithms, namely, linear discriminant analysis (LDA), decision tree (C5.0), and neural network (NNET), to predict the classification of second language (L2) sounds in terms of first language (L1) categories. The models were trained using the first three formants and duration of L1 vowels and fed with the same acoustic features of L2 vowels. To validate their accuracy, adult L2 speakers completed a perceptual classification task. The results indicated that NNET predicted with success the classification of all L2 vowels with the highest proportion in terms of L1 categories, while LDA and C5.0 missed only one vowel each. Furthermore, NNET exhibited superior accuracy in predicting the full range of above chance responses, followed closely by LDA. C5.0 did not meet the anticipated performance levels. The findings can hold significant implications for advancing both the theoretical and practical frameworks of speech acquisition.

RevDate: 2023-09-17

Zhang T, Liu X, Liu G, et al (2023)

PVR-AFM: A Pathological Voice Repair System based on Non-linear Structure.

Journal of voice : official journal of the Voice Foundation, 37(5):648-662.

OBJECTIVE: Speech signal processing has become an important technique to ensure that the voice interaction system communicates accurately with the user by improving the clarity or intelligibility of speech signals. However, most existing works focus only on processing the voice of the average human and ignore the communication needs of individuals suffering from voice disorders, including voice-related professionals, older people, and smokers. To meet this demand, it is essential to design a non-invasive repair system that processes pathological voices.

METHODS: In this paper, we propose a repair system for multiple polyp vowels, such as /a/, /i/ and /u/. We utilize a non-linear model based on an amplitude-modulation (AM) and frequency-modulation (FM) structure to extract the pitch and formant of the pathological voice. To solve the fracture and instability of pitch, we provide a pitch extraction algorithm, which ensures the pitch's stability and avoids the errors of double pitch caused by the instability of the low-frequency signal. Furthermore, we design a formant reconstruction mechanism, which can effectively determine the frequency and bandwidth to accomplish formant repair.

RESULTS: Finally, spectrum observation and objective indicators show that the system has better performance in improving the intelligibility of pathological speech.

RevDate: 2023-09-13

Roland V, Huet K, Harmegnies B, et al (2023)

Vowel production: a potential speech biomarker for early detection of dysarthria in Parkinson's disease.

Frontiers in psychology, 14:1129830.

OBJECTIVES: Our aim is to detect early, subclinical speech biomarkers of dysarthria in Parkinson's disease (PD), i.e., systematic atypicalities in speech that remain subtle and are not easily detectible by the clinician, so that the patient is labeled "non-dysarthric." Based on promising exploratory work, we examine here whether vowel articulation, as assessed by three acoustic metrics, can be used as an early indicator of speech difficulties associated with Parkinson's disease.

STUDY DESIGN: This is a prospective case-control study.

METHODS: Sixty-three individuals with PD and 35 without PD (healthy controls, HC) participated in this study. Out of 63 PD patients, 43 had been diagnosed with dysarthria (DPD) and 20 had not (NDPD). Sustained vowels were recorded for each speaker and formant frequencies were measured. The analyses focus on three acoustic metrics: individual vowel triangle areas (tVSA), vowel articulation index (VAI) and the Phi index.

RESULTS: tVSA were found to be significantly smaller for DPD speakers than for HC. The VAI showed significant differences between these two groups, indicating greater centralization and lower vowel contrasts in the DPD speakers with dysarthria. In addition, DPD and NDPD speakers had lower Phi values, indicating a lower organization of their vowel system compared to the HC. Results also showed that the VAI index was the most efficient to distinguish between DPD and NDPD whereas the Phi index was the best acoustic metric to discriminate NDPD and HC.

CONCLUSION: This acoustic study identified potential subclinical vowel-related speech biomarkers of dysarthria in speakers with Parkinson's disease who have not been diagnosed with dysarthria.
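
Two of the metrics named in this abstract can be sketched from corner-vowel formants. The abstract gives no formulas, so the definitions below are assumptions taken from common usage: the triangle area uses the shoelace formula in the F1-F2 plane, and the VAI is (F2/i/ + F1/a/) / (F1/i/ + F1/u/ + F2/u/ + F2/a/). The formant values are hypothetical, and the Phi index is omitted.

```python
def triangular_vsa(f_i, f_a, f_u):
    """Area (Hz^2) of the /i/-/a/-/u/ triangle in the F1-F2 plane (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = f_i, f_a, f_u
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2

def vowel_articulation_index(f_i, f_a, f_u):
    """VAI under the assumed definition; lower values suggest more centralized vowels."""
    (f1_i, f2_i), (f1_a, f2_a), (f1_u, f2_u) = f_i, f_a, f_u
    return (f2_i + f1_a) / (f1_i + f1_u + f2_u + f2_a)

# Hypothetical (F1, F2) pairs in Hz for a speaker with a well-spread vowel system...
spread = {"i": (300, 2300), "a": (750, 1300), "u": (350, 800)}
# ...and for a speaker whose corner vowels are drawn toward the center.
centralized = {"i": (400, 2000), "a": (650, 1350), "u": (420, 1000)}

tvsa_spread = triangular_vsa(spread["i"], spread["a"], spread["u"])
vai_spread = vowel_articulation_index(spread["i"], spread["a"], spread["u"])
vai_central = vowel_articulation_index(centralized["i"], centralized["a"], centralized["u"])
print(tvsa_spread, vai_spread, vai_central)
```

With these made-up values the centralized speaker receives the lower VAI, which is the direction of the group difference the abstract describes.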

RevDate: 2023-09-11

Perrine BL, RC Scherer (2023)

Using a vertical three-mass computational model of the vocal folds to match human phonation of three adult males.

The Journal of the Acoustical Society of America, 154(3):1505-1525.

Computer models of phonation are used to study various parameters that are difficult to control, measure, and observe in human subjects. Imitating human phonation by varying the prephonatory conditions of computer models offers insight into the variations that occur across human phonatory production. In the present study, a vertical three-mass computer model of phonation [Perrine, Scherer, Fulcher, and Zhai (2020). J. Acoust. Soc. Am. 147, 1727-1737], driven by empirical pressures from a physical model of the vocal folds (model M5), with a vocal tract following the design of Ishizaka and Flanagan [(1972). Bell Sys. Tech. J. 51, 1233-1268] was used to match prolonged vowels produced by three male subjects using various pitch and loudness levels. The prephonatory conditions of tissue mass and tension, subglottal pressure, glottal diameter and angle, posterior glottal gap, false vocal fold gap, and vocal tract cross-sectional areas were varied in the model to match the model output with the fundamental frequency, alternating current airflow, direct current airflow, skewing quotient, open quotient, maximum flow negative derivative, and the first three formant frequencies from the human production. Parameters were matched between the model and human subjects with an average overall percent mismatch of 4.40% (standard deviation = 6.75%), suggesting a reasonable ability of the simple low dimensional model to mimic these variables.

RevDate: 2023-08-31

Steffman J (2023)

Vowel-internal cues to vowel quality and prominence in speech perception.

Phonetica [Epub ahead of print].

This study examines how variation in F0 and intensity impacts the perception of American English vowels. Both properties vary intrinsically as a function of vowel features in the speech production literature, raising the question of the perceptual impact of each. In addition to considering listeners' interpretation of either cue as an intrinsic property of the vowel, the possible prominence-marking function of each is considered. Two patterns of prominence strengthening in vowels, sonority expansion and hyperarticulation, are tested in light of recent findings that contextual prominence impacts vowel perception in line with these effects (i.e. a prominent vowel is expected by listeners to be realized as if it had undergone prominence strengthening). Across four vowel contrasts with different height and frontness features, listeners categorized phonetic continua with variation in formants, F0 and intensity. Results show that variation in level F0 height is interpreted as an intrinsic cue by listeners. Higher F0 cues a higher vowel, following intrinsic F0 effects in the production literature. In comparison, intensity is interpreted as a prominence-lending cue, for which effect directionality is dependent on vowel height. Higher intensity high vowels undergo perceptual re-calibration in line with (acoustic) hyperarticulation, whereas higher intensity non-high vowels undergo perceptual re-calibration in line with sonority expansion.

RevDate: 2023-08-26

Yang J, Yue Y, Lv H, et al (2023)

Effect of Adding Intermediate Layers on the Interface Bonding Performance of WC-Co Diamond-Coated Cemented Carbide Tool Materials.

Molecules (Basel, Switzerland), 28(16): pii:molecules28165958.

The interface models of diamond-coated WC-Co cemented carbide (DCCC) were constructed without intermediate layers and with different interface terminals, such as intermediate layers of TiC, TiN, CrN, and SiC. The adhesion work of the interface model was calculated based on the first principle. The results show that the adhesion work of the interface was increased after adding four intermediate layers. Their effect on improving the interface adhesion performance of cemented carbide coated with diamond was ranked in descending order as follows: SiC > CrN > TiC > TiN. The charge density difference and the density of states were further analyzed. After adding the intermediate layer, the charge distribution at the interface junction was changed, and the electron cloud at the interface junction overlapped to form a more stable chemical bond. Additionally, after adding the intermediate layer, the density of states of the atoms at the interface increased in the energy overlapping area. The formant formed between the electronic orbitals enhances the bond strength. Thus, the interface bonding performance of DCCC was enhanced. Among them, the most obvious was the interatomic electron cloud overlapping at the diamond/SiCC-Si/WC-Co interface, its bond length was the shortest (1.62 Å), the energy region forming the resonance peak was the largest (-5-20 eV), and the bonding was the strongest. The interatomic bond length at the diamond/TiNTi/WC-Co interface was the longest (4.11 Å), the energy region forming the resonance peak was the smallest (-5-16 eV), and the bonding was the weakest. Comprehensively considering four kinds of intermediate layers, the best intermediate layer for improving the interface bonding performance of DCCC was SiC, and the worst was TiN.

RevDate: 2023-08-24

Bradshaw AR, Lametti DR, Shiller DM, et al (2023)

Speech motor adaptation during synchronous and metronome-timed speech.

Journal of experimental psychology. General pii:2024-01928-001 [Epub ahead of print].

Sensorimotor integration during speech has been investigated by altering the sound of a speaker's voice in real time; in response, the speaker learns to change their production of speech sounds in order to compensate (adaptation). This line of research has however been predominantly limited to very simple speaking contexts, typically involving (a) repetitive production of single words and (b) production of speech while alone, without the usual exposure to other voices. This study investigated adaptation to a real-time perturbation of the first and second formants during production of sentences either in synchrony with a prerecorded voice (synchronous speech group) or alone (solo speech group). Experiment 1 (n = 30) found no significant difference in the average magnitude of compensatory formant changes between the groups; however, synchronous speech resulted in increased between-individual variability in such formant changes. Participants also showed acoustic-phonetic convergence to the voice they were synchronizing with prior to introduction of the feedback alteration. Furthermore, the extent to which the changes required for convergence agreed with those required for adaptation was positively correlated with the magnitude of subsequent adaptation. Experiment 2 tested an additional group with a metronome-timed speech task (n = 15) and found a similar pattern of increased between-participant variability in formant changes. These findings demonstrate that speech motor adaptation can be measured robustly at the group level during performance of more complex speaking tasks; however, further work is needed to resolve whether self-voice adaptation and other-voice convergence reflect additive or interactive effects during sensorimotor control of speech. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

RevDate: 2023-08-17

Ancel EE, Smith ML, Rao VNV, et al (2023)

Relating Acoustic Measures to Listener Ratings of Children's Productions of Word-Initial /ɹ/ and /w/.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: The /ɹ/ productions of young children acquiring American English are highly variable and often inaccurate, with [w] as the most common substitution error. One acoustic indicator of the goodness of children's /ɹ/ productions is the difference between the frequency of the second formant (F2) and the third formant (F3), with a smaller F3-F2 difference being associated with a perceptually more adultlike /ɹ/. This study analyzed the effectiveness of automatically extracted F3-F2 differences in characterizing young children's productions of /ɹ/-/w/ in comparison with manually coded measurements.

METHOD: Automated F3-F2 differences were extracted from productions of a variety of different /ɹ/- and /w/-initial words spoken by 3- to 4-year-old monolingual preschoolers (N = 117; 2,278 tokens in total). These automated measures were compared to ratings of the phoneme goodness of children's productions as rated by untrained adult listeners (n = 132) on a visual analog scale, as well as to narrow transcriptions of the production into four categories: [ɹ], [w], and two intermediate categories.

RESULTS: Data visualizations show a weak relationship between automated F3-F2 differences with listener ratings and narrow transcriptions. Mixed-effects models suggest the automated F3-F2 difference only modestly predicts listener ratings (R² = .37) and narrow transcriptions (R² = .32).

CONCLUSION: The weak relationship between automated F3-F2 difference and both listener ratings and narrow transcriptions suggests that these automated acoustic measures are of questionable reliability and utility in assessing preschool children's mastery of the /ɹ/-/w/ contrast.

RevDate: 2023-08-09

Stilp C, E Chodroff (2023)

"Please say what this word is": Linguistic experience and acoustic context interact in vowel categorization.

JASA express letters, 3(8):.

Ladefoged and Broadbent [(1957). J. Acoust. Soc. Am. 29(1), 98-104] is a foundational study in speech perception research, demonstrating that acoustic properties of earlier sounds alter perception of subsequent sounds: a context sentence with a lowered first formant (F1) frequency promotes perception of a raised F1 in a target word, and vice versa. The present study replicated the original with U.K. and U.S. listeners. While the direction of the perceptual shift was consistent with the original study, neither sample replicated the large effect sizes. This invites consideration of how linguistic experience relates to the magnitudes of these context effects.

RevDate: 2023-08-09

Tanner J (2023)

Prosodic and durational influences on the formant dynamics of Japanese vowels.

JASA express letters, 3(8):.

The relationship between prosodic structure and segmental realisation is a central question within phonetics. For vowels, this has been typically examined in terms of duration, leaving largely unanswered how prosodic boundaries influence spectral realisation. This study examines the influence of prosodic boundary strength, as well as duration and pauses, on vowel dynamics in spontaneous Japanese. While boundary strength has a marginal effect on dynamics, increased duration and pauses result in greater vowel peripherality and spectral change. These findings highlight the complex relationship between prosodic and segmental structure, and illustrate the importance of multifactorial analysis in corpus research.

RevDate: 2023-08-07

Hilger A, Cole J, C Larson (2023)

Task-dependent pitch auditory feedback control in cerebellar ataxia.

Research square pii:rs.3.rs-3186155.

PURPOSE: The purpose of this study was to investigate how ataxia affects the task-dependent role of pitch auditory feedback control in speech. In previous research, individuals with ataxia produced over-corrected, hypermetric compensatory responses to unexpected pitch and formant frequency perturbations in auditory feedback in sustained vowels and single words (Houde et al., 2019; Li et al., 2019; Parrell et al., 2017). In this study, we investigated whether ataxia would also affect the task-dependent role of the auditory feedback control system, measuring whether pitch-shift responses would be mediated by speech task or semantic focus pattern as they are in neurologically healthy speakers.

METHODS: Twenty-two adults with ataxia and 29 age- and sex-matched control participants produced sustained vowels and sentences with and without corrective focus while their auditory feedback was briefly and unexpectedly perturbed in pitch by +/-200 cents. The magnitude and latency of the reflexive pitch-shift responses were measured as a reflection of auditory feedback control.

RESULTS: Individuals with ataxia produced larger reflexive pitch-shift responses in both the sustained-vowel and sentence-production tasks than the control participants. Additionally, a differential response magnitude was observed by task and sentence focus pattern for both groups.

CONCLUSION: These findings demonstrate that even though accuracy of auditory feedback control correction is affected by cerebellar damage, as evidenced by the hypermetric responses, the system still retains efficiency in utilizing the task-dependent role of auditory feedback.
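
For readers unfamiliar with the unit, the +/-200 cent perturbation mentioned in this abstract can be converted to hertz with the standard relation f' = f * 2^(cents/1200), since 1200 cents make one octave. The short sketch below applies it; the 200 Hz starting pitch is an arbitrary example, not a value from the study.

```python
def shift_by_cents(f0_hz, cents):
    """Return the frequency obtained by shifting f0_hz by the given number of cents."""
    return f0_hz * 2 ** (cents / 1200)

# Hypothetical speaker with a 200 Hz fundamental frequency.
baseline = 200.0
shifted_up = shift_by_cents(baseline, 200)     # two semitones up
shifted_down = shift_by_cents(baseline, -200)  # two semitones down
print(round(shifted_up, 2), round(shifted_down, 2))
```

Because the scale is logarithmic, the upward and downward shifts are equal as ratios but unequal in hertz.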

RevDate: 2023-08-04

Gao Y, Feng Y, Wu D, et al (2023)

Effect of Wearing Different Masks on Acoustic, Aerodynamic, and Formant Parameters.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00191-1 [Epub ahead of print].

OBJECTIVE: This study aimed to investigate the effects of different types of masks on acoustic, aerodynamic, and formant parameters in healthy people.

METHODS: Our study involved 30 healthy participants, 15 of each gender, aged 20-40 years. The tests were conducted under four conditions: without a mask, after wearing a surgical mask, after wearing a head-mounted N95 mask, and after wearing an ear-mounted N95 mask. Voice recording was done with the mask on. The acoustic parameters included mean fundamental frequency (F0), mean intensity, percentage of jitter (local), percentage of shimmer (local), and mean noise to harmonic ratio (NHR); the aerodynamic parameter was maximum phonation time (MPT); and the formant parameters were F1 and F2 of the three vowels /a/, /i/, and /u/.

RESULTS: The main effect of mask type was significant in MPT, mean F0, mean HNR, /a/F1, /a/F2, /i/F2. However, the effect sizes and power in /a/F2, /i/F2 were low. MPT, mean F0 and mean HNR significantly increased and /a/F1 significantly decreased after wearing the head-mounted N95 mask. The mean F0 and mean HNR increased significantly after wearing the ear-mounted N95 mask. No significant changes were observed in parameters after wearing the surgical mask in this study. When the statistics were performed separately for males and females, the results were similar to those obtained for the combined group.

CONCLUSION: After wearing the surgical mask, this study found insignificant changes in mean F0, jitter (local), shimmer (local), mean NHR, mean intensity, MPT, and the vowels F1 and F2. This may be due to the looser design of the surgical mask and the relatively small attenuation of sound. N95 masks have a greater effect on vocalization than surgical masks and may cause changes in F0 and HNR. In the present study, no significant changes in jitter and shimmer were observed after wearing the mask. In addition, the significant reduction in /a/F1 after wearing the N95 headgear mask may be owing to its high restriction of jaw mobility. Future studies could additionally investigate the change in jaw movement amplitude after wearing the mouthpiece.

RevDate: 2023-07-31

Rizzi R, GM Bidelman (2023)

Duplex perception reveals brainstem auditory representations are modulated by listeners' ongoing percept for speech.

Cerebral cortex (New York, N.Y. : 1991) pii:7233661 [Epub ahead of print].

So-called duplex speech stimuli with perceptually ambiguous spectral cues to one ear and isolated low- versus high-frequency third formant "chirp" to the opposite ear yield a coherent percept supporting their phonetic categorization. Critically, such dichotic sounds are only perceived categorically upon binaural integration. Here, we used frequency-following responses (FFRs), scalp-recorded potentials reflecting phase-locked subcortical activity, to investigate brainstem responses to fused speech percepts and to determine whether FFRs reflect binaurally integrated category-level representations. We recorded FFRs to diotic and dichotic stop-consonants (/da/, /ga/) that either did or did not require binaural fusion to properly label along with perceptually ambiguous sounds without clear phonetic identity. Behaviorally, listeners showed clear categorization of dichotic speech tokens confirming they were heard with a fused, phonetic percept. Neurally, we found FFRs were stronger for categorically perceived speech relative to category-ambiguous tokens but also differentiated phonetic categories for both diotically and dichotically presented speech sounds. Correlations between neural and behavioral data further showed FFR latency predicted the degree to which listeners labeled tokens as "da" versus "ga." The presence of binaurally integrated, category-level information in FFRs suggests human brainstem processing reflects a surprisingly abstract level of the speech code typically circumscribed to much later cortical processing.

RevDate: 2023-07-28

Kim KS, Gaines JL, Parrell B, et al (2023)

Mechanisms of sensorimotor adaptation in a hierarchical state feedback control model of speech.

PLoS computational biology, 19(7):e1011244 pii:PCOMPBIOL-D-22-01400 [Epub ahead of print].

Upon perceiving sensory errors during movements, the human sensorimotor system updates future movements to compensate for the errors, a phenomenon called sensorimotor adaptation. One component of this adaptation is thought to be driven by sensory prediction errors (discrepancies between predicted and actual sensory feedback). However, the mechanisms by which prediction errors drive adaptation remain unclear. Here, auditory prediction error-based mechanisms involved in speech auditory-motor adaptation were examined via the feedback aware control of tasks in speech (FACTS) model. Consistent with theoretical perspectives in both non-speech and speech motor control, the hierarchical architecture of FACTS relies on both the higher-level task (vocal tract constrictions) as well as lower-level articulatory state representations. Importantly, FACTS also computes sensory prediction errors as a part of its state feedback control mechanism, a well-established framework in the field of motor control. We explored potential adaptation mechanisms and found that adaptive behavior was present only when prediction errors updated the articulatory-to-task state transformation. In contrast, designs in which prediction errors updated forward sensory prediction models alone did not generate adaptation. Thus, FACTS demonstrated that 1) prediction errors can drive adaptation through task-level updates, and 2) adaptation is likely driven by updates to task-level control rather than (only) to forward predictive models. Additionally, simulating adaptation with FACTS generated a number of important hypotheses regarding previously reported phenomena such as identifying the source(s) of incomplete adaptation and driving factor(s) for changes in the second formant frequency during adaptation to the first formant perturbation. The proposed model design paves the way for a hierarchical state feedback control framework to be examined in the context of sensorimotor adaptation in both speech and non-speech effector systems.

RevDate: 2023-08-04
CmpDate: 2023-08-04

Illner V, Tykalova T, Skrabal D, et al (2023)

Automated Vowel Articulation Analysis in Connected Speech Among Progressive Neurological Diseases, Dysarthria Types, and Dysarthria Severities.

Journal of speech, language, and hearing research : JSLHR, 66(8):2600-2621.

PURPOSE: Although articulatory impairment represents distinct speech characteristics in most neurological diseases affecting movement, methods allowing automated assessments of articulation deficits from the connected speech are scarce. This study aimed to design a fully automated method for analyzing dysarthria-related vowel articulation impairment and estimate its sensitivity in a broad range of neurological diseases and various types and severities of dysarthria.

METHOD: Unconstrained monologue and reading passages were acquired from 459 speakers, including 306 healthy controls and 153 neurological patients. The algorithm utilized a formant tracker in combination with a phoneme recognizer and subsequent signal processing analysis.

RESULTS: Articulatory undershoot of vowels was presented in a broad spectrum of progressive neurodegenerative diseases, including Parkinson's disease, progressive supranuclear palsy, multiple-system atrophy, Huntington's disease, essential tremor, cerebellar ataxia, multiple sclerosis, and amyotrophic lateral sclerosis, as well as in related dysarthria subtypes including hypokinetic, hyperkinetic, ataxic, spastic, flaccid, and their mixed variants. Formant ratios showed a higher sensitivity to vowel deficits than vowel space area. First formants of corner vowels were significantly lower for multiple-system atrophy than cerebellar ataxia. Second formants of vowels /a/ and /i/ were lower in ataxic compared to spastic dysarthria. Discriminant analysis showed a classification score of up to 41.0% for disease type, 39.3% for dysarthria type, and 49.2% for dysarthria severity. Algorithm accuracy reached an F-score of 0.77.

CONCLUSIONS: Distinctive vowel articulation alterations reflect underlying pathophysiology in neurological diseases. Objective acoustic analysis of vowel articulation has the potential to provide a universal method to screen motor speech disorders.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.23681529.

RevDate: 2023-07-28

Mailhos A, Egea-Caparrós DA, Cabana Á, et al (2023)

Voice pitch is negatively associated with sociosexual behavior in males but not in females.

Frontiers in psychology, 14:1200065.

Acoustic cues play a major role in social interactions in many animal species. In addition to the semantic contents of human speech, voice attributes - e.g., voice pitch, formant position, formant dispersion, etc. - have been proposed to provide critical information for the assessment of potential rivals and mates. However, prior studies exploring the association of acoustic attributes with reproductive success, or some of its proxies, have produced mixed results. Here, we investigate whether the mean fundamental frequency (F0), formant position (Pf), and formant dispersion (Df) - dimorphic attributes of the human voice - are related to sociosexuality, as measured by the Revised Sociosexual Orientation Inventory (SOI-R) - a trait also known to exhibit sex differences - in a sample of native Spanish-speaking students (101 males, 147 females). Analyses showed a significant negative correlation between F0 and sociosexual behavior, and between Pf and sociosexual desire in males but not in females. These correlations remained significant after correcting for false discovery rate (FDR) and controlling for age, a potential confounding variable. Our results are consistent with a role of F0 and Pf serving as cues in the mating domain in males but not in females. Alternatively, the association of voice attributes and sociosexual orientation might stem from the parallel effect of male sex hormones both on the male brain and the anatomical structures involved in voice production.
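
The two formant-based attributes named in this abstract can be sketched as follows. The abstract does not define them, so both definitions are assumptions drawn from common usage: formant dispersion (Df) as the average spacing between adjacent formants, and formant position (Pf) as the mean of a speaker's formants after each formant has been z-scored across the sample. The three speakers and their formant values are hypothetical.

```python
from statistics import mean, stdev

def formant_dispersion(formants_hz):
    """Average spacing between adjacent formants: (Fn - F1) / (n - 1)."""
    return (formants_hz[-1] - formants_hz[0]) / (len(formants_hz) - 1)

def formant_position(sample):
    """Mean z-scored formant per speaker; z-scores are taken across the sample."""
    columns = list(zip(*sample))            # one column per formant (F1, F2, ...)
    mus = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]
    return [
        mean([(f - m) / s for f, m, s in zip(speaker, mus, sds)])
        for speaker in sample
    ]

# Hypothetical F1-F4 values (Hz) for three speakers.
sample = [
    [500, 1500, 2500, 3500],
    [450, 1400, 2400, 3300],
    [550, 1600, 2600, 3700],
]
df = [formant_dispersion(s) for s in sample]
pf = formant_position(sample)
print(df, pf)
```

Under these definitions, a speaker whose formants all sit below the sample mean receives a negative Pf, which is conventionally read as a cue to a longer vocal tract.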

RevDate: 2023-07-21

González-Alvarez J, R Sos-Peña (2023)

Body Perception From Connected Speech: Speaker Height Discrimination from Natural Sentences and Sine-Wave Replicas with and without Pitch.

Perceptual and motor skills, 130(4):1353-1365.

In addition to language, the human voice carries information about the physical characteristics of speakers, including their body size (height and weight). The fundamental speaking frequency, perceived as voice pitch, and the formant frequencies, or resonators of the vocal tract, are the acoustic speech parameters that have been most intensely studied for perceiving a speaker's body size. In this study, we created sine-wave (SW) replicas of connected speech (sentences) uttered by 20 male and 20 female speakers, consisting of three time-varying sinusoidal waves matching the frequency pattern of the first three formants of each sentence. These stimuli only provide information about the formant frequencies of a speech signal. We also created a new experimental condition by adding a sinusoidal replica of the voice pitch of each sentence. Results obtained from a binary discrimination task revealed that (a) our SW replicas provided sufficient useful information to accurately judge the speakers' body height at an above chance level; (b) adding the sinusoidal replica about the voice pitch did not significantly increase accuracy; and (c) stimuli from female speakers were more informative for body height detection and allowed higher perceptual accuracy, due to a stronger correlation between formant frequencies and actual body height than stimuli from male speakers.
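
The sine-wave replicas described in this abstract can be sketched as a sum of sinusoids whose frequencies follow the formant tracks. The Python below is a minimal sketch under stated assumptions, not the authors' procedure: the tracks are piecewise constant per frame, all sinusoids have equal amplitude, and the function name, frame duration, and track values are hypothetical.

```python
import math


def sine_wave_replica(tracks, frame_dur, sample_rate):
    """Sum of time-varying sinusoids, one per formant track.

    `tracks` holds one list of frequencies (Hz) per formant, with one
    value per analysis frame. Phase is accumulated sample by sample so
    that frequency changes do not introduce waveform discontinuities.
    """
    n_frames = len(tracks[0])
    samples_per_frame = int(round(frame_dur * sample_rate))
    phases = [0.0] * len(tracks)
    out = []
    for frame in range(n_frames):
        for _ in range(samples_per_frame):
            value = 0.0
            for k, track in enumerate(tracks):
                phases[k] += 2.0 * math.pi * track[frame] / sample_rate
                value += math.sin(phases[k])
            out.append(value / len(tracks))  # keep output within [-1, 1]
    return out


# Hypothetical F1, F2, F3 tracks over three 10 ms frames.
tracks = [
    [500.0, 520.0, 540.0],
    [1500.0, 1450.0, 1400.0],
    [2500.0, 2500.0, 2550.0],
]
signal = sine_wave_replica(tracks, frame_dur=0.01, sample_rate=16000)
```

Adding a fourth track that follows the fundamental frequency would correspond to the pitch condition mentioned in the abstract.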

RevDate: 2023-07-19

Vilanova ID, Almeida SB, de Araújo VS, et al (2023)

Impact of orthognathic surgery on voice and speech: a systematic review and meta-analysis.

European journal of orthodontics pii:7226525 [Epub ahead of print].

BACKGROUND: Orthognathic surgical procedures, whether in one or both jaws, can affect structures regarding the articulation and resonance of voice and speech.

OBJECTIVE: Evaluating the impact of orthognathic surgery on voice and speech performance in individuals with skeletal dentofacial disharmony.

SEARCH METHODS: Word combinations and truncations were adapted for the following electronic databases: EMBASE, PubMed/Medline, Scopus, Web of Science, Cochrane Library, and Latin American and Caribbean Literature in Health Sciences (LILACS), and grey literature.

SELECTION CRITERIA: The research included studies on nonsyndromic adults with skeletal dentofacial disharmony undergoing orthognathic surgery. These studies assessed patients before and after surgery or compared them with individuals with good facial harmony using voice and speech parameters through validated protocols.

DATA COLLECTION AND ANALYSIS: Two independent reviewers performed all stages of the review. The Joanna Briggs Institute tool was used to assess risk of bias in the cohort studies, and ROBINS-I was used for nonrandomized clinical trials. The authors also performed a meta-analysis of random effects.

RESULTS: A total of 1163 articles were retrieved after the last search, of which 23 were read in full. Of these, four were excluded, totalling 19 articles for quantitative synthesis. When comparing the pre- and postoperative periods for fundamental frequency, formants, and jitter and shimmer perturbation measures, orthognathic surgery did not affect vowel production. According to the articles, the main articulatory errors associated with skeletal dentofacial disharmonies prior to surgery were distortions of fricative sounds, mainly /s/ and /z/.

CONCLUSIONS: Orthognathic surgery may have little or no impact on vocal characteristics during vowel production. However, due to the confounding factors involved, estimates are inconclusive. The most prevalent articulatory disorders in the preoperative period were distortion of the fricative phonemes /s/ and /z/. However, further studies must be carried out to ensure greater robustness to these findings.

REGISTRATION: PROSPERO (CRD42022291113).

RevDate: 2023-07-18
CmpDate: 2023-07-14

Stoehr A, Souganidis C, Thomas TB, et al (2023)

Voice onset time and vowel formant measures in online testing and laboratory-based testing with(out) surgical face masks.

The Journal of the Acoustical Society of America, 154(1):152-166.

Since the COVID-19 pandemic started, conducting experiments online is increasingly common, and face masks are often used in everyday life. It remains unclear whether phonetic detail in speech production is captured adequately when speech is recorded in internet-based experiments or in experiments conducted with face masks. We tested 55 Spanish-Basque-English trilinguals in picture naming tasks in three conditions: online, laboratory-based with surgical face masks, and laboratory-based without face masks (control). We measured plosive voice onset time (VOT) in each language, the formants and duration of English vowels /iː/ and /ɪ/, and the Spanish/Basque vowel space. Across conditions, there were differences between English and Spanish/Basque VOT and in formants and duration between English /iː/-/ɪ/; between conditions, small differences emerged. Relative to the control condition, the Spanish/Basque vowel space was larger in online testing and smaller in the face mask condition. We conclude that testing online or with face masks is suitable for investigating phonetic detail in within-participant designs although the precise measurements may differ from those in traditional laboratory-based research.

RevDate: 2023-07-18
CmpDate: 2023-07-13

Kries J, De Clercq P, Lemmens R, et al (2023)

Acoustic and phonemic processing are impaired in individuals with aphasia.

Scientific reports, 13(1):11208.

Acoustic and phonemic processing are understudied in aphasia, a language disorder that can affect different levels and modalities of language processing. For successful speech comprehension, processing of the speech envelope is necessary, which relates to amplitude changes over time (e.g., the rise times). Moreover, to identify speech sounds (i.e., phonemes), efficient processing of spectro-temporal changes as reflected in formant transitions is essential. Given the underrepresentation of aphasia studies on these aspects, we tested rise time processing and phoneme identification in 29 individuals with post-stroke aphasia and 23 healthy age-matched controls. We found significantly lower performance in the aphasia group than in the control group on both tasks, even when controlling for individual differences in hearing levels and cognitive functioning. Further, by conducting an individual deviance analysis, we found a low-level acoustic or phonemic processing impairment in 76% of individuals with aphasia. Additionally, we investigated whether this impairment would propagate to higher-level language processing and found that rise time processing predicts phonological processing performance in individuals with aphasia. These findings show that it is important to develop diagnostic and treatment tools that target low-level language processing mechanisms.

RevDate: 2023-07-10

Maes P, Weyland M, M Kissine (2023)

Structure and acoustics of the speech of verbal autistic preschoolers.

Journal of child language pii:S0305000923000417 [Epub ahead of print].

In this study, we report an extensive investigation of the structural language and acoustical specificities of the spontaneous speech of ten three- to five-year-old verbal autistic children. The autistic children were compared to a group of ten typically developing children matched pairwise on chronological age, nonverbal IQ and socioeconomic status, and groupwise on verbal IQ and gender on various measures of structural language (phonetic inventory, lexical diversity and morpho-syntactic complexity) and a series of acoustical measures of speech (mean and range fundamental frequency, a formant dispersion index, syllable duration, jitter and shimmer). Results showed that, overall, the structure and acoustics of the verbal autistic children's speech were highly similar to those of the TD children. Few remaining atypicalities in the speech of autistic children lay in a restricted use of different vocabulary items, a somewhat diminished morpho-syntactic complexity, and a slightly exaggerated syllable duration.

RevDate: 2023-08-02
CmpDate: 2023-07-10

Park EJ, SD Yoo (2023)

Correlation between the parameters of quadrilateral vowel and dysphonia severity in patients with traumatic brain injury.

Medicine, 102(27):e33030.

Dysarthria and dysphonia are common in patients with traumatic brain injury (TBI). Multiple factors may contribute to TBI-induced dysarthria, including poor vocalization, articulation, respiration, and/or resonance. Many patients suffer from dysarthria that persists after the onset of TBI, with negative effects on their quality of life. This study aimed to investigate the relationship between vowel quadrilateral parameters and Dysphonia Severity Index (DSI), which objectively reflects vocal function. We retrospectively enrolled TBI patients diagnosed using computed tomography. Participants had dysarthria and dysphonia and underwent acoustic analysis. Praat software was used to measure vowel space area (VSA), formant centralization ratio (FCR), and the second formant (F2) ratio. For the 4 corner vowels (/a/, /u/, /i/, and /ae/), the resonance frequency of the vocal folds was measured and is shown as 2-dimensional coordinates for the formant parameters. Pearson correlation and multiple linear regression analyses were performed between the variables. VSA showed a significant positive correlation with DSI/a/ (R = 0.221) and DSI/i/ (R = 0.026). FCR showed a significant negative correlation with DSI/u/ and DSI/i/. The F2 ratio showed a significant positive correlation with DSI/u/ and DSI/ae/. In the multiple linear regression analysis, VSA was found to be a significant predictor of DSI/a/ (β = 0.221, P = .030, R² = 0.139). F2 ratio (β = 0.275, P = .015) and FCR (β = -0.218, P = .029) were significant predictors of DSI/u/ (R² = 0.203). FCR was a significant predictor of DSI/i/ (β = -0.260, P = .010, R² = 0.158). F2 ratio was a significant predictor of DSI/ae/ (β = 0.254, P = .013, R² = 0.154). Vowel quadrilateral parameters, such as VSA, FCR, and F2 ratio, may be associated with dysphonia severity in TBI patients.
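
The three vowel quadrilateral parameters named in this abstract can be computed from corner-vowel formants as in the Python sketch below. The formulas are the ones commonly used in the literature and are assumed here rather than taken from the abstract: VSA as the polygon area in the F1-F2 plane (shoelace formula), FCR as (F2u + F2a + F1i + F1u) / (F2i + F1a), and the F2 ratio as F2i / F2u. The function names and formant values are hypothetical.

```python
def vowel_space_area(corners):
    """Shoelace area (Hz^2) of a polygon whose vertices are (F1, F2)
    pairs listed in order around the perimeter."""
    total = 0.0
    n = len(corners)
    for k in range(n):
        x1, y1 = corners[k]
        x2, y2 = corners[(k + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def formant_centralization_ratio(v):
    """FCR from a dict mapping vowel label to (F1, F2); rises as vowels centralize."""
    return (v["u"][1] + v["a"][1] + v["i"][0] + v["u"][0]) / (v["i"][1] + v["a"][0])


def f2_ratio(v):
    """Ratio of F2 in /i/ to F2 in /u/."""
    return v["i"][1] / v["u"][1]


# Hypothetical (F1, F2) values in Hz for the four corner vowels.
vowels = {
    "i": (300.0, 2300.0),
    "ae": (700.0, 1800.0),
    "a": (750.0, 1200.0),
    "u": (350.0, 900.0),
}

# Vertices must follow the perimeter: /i/ -> /ae/ -> /a/ -> /u/.
vsa = vowel_space_area([vowels[k] for k in ("i", "ae", "a", "u")])
fcr = formant_centralization_ratio(vowels)
ratio = f2_ratio(vowels)
```

The vertex order matters for the shoelace formula: listing the vowels out of perimeter order yields a self-intersecting polygon and an incorrect area.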

RevDate: 2023-07-18

Persson A, TF Jaeger (2023)

Evaluating normalization accounts against the dense vowel space of Central Swedish.

Frontiers in psychology, 14:1165742.

Talkers vary in the phonetic realization of their vowels. One influential hypothesis holds that listeners overcome this inter-talker variability through pre-linguistic auditory mechanisms that normalize the acoustic or phonetic cues that form the input to speech recognition. Dozens of competing normalization accounts exist, including both accounts specific to vowel perception and general purpose accounts that can be applied to any type of cue. We add to the cross-linguistic literature on this matter by comparing normalization accounts against a new phonetically annotated vowel database of Swedish, a language with a particularly dense vowel inventory of 21 vowels differing in quality and quantity. We evaluate normalization accounts on how they differ in predicted consequences for perception. The results indicate that the best performing accounts either center or standardize formants by talker. The study also suggests that general purpose accounts perform as well as vowel-specific accounts, and that vowel normalization operates in both temporal and spectral domains.
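
Centering and standardizing formants by talker, the two operations the abstract singles out, can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation; the function name, the token layout, and the formant values are hypothetical, and the use of the population standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_talker(tokens, standardize=True):
    """Center (and optionally z-score) F1 and F2 within each talker.

    `tokens` is a list of (talker, f1, f2) tuples in Hz.
    """
    by_talker = defaultdict(list)
    for talker, f1, f2 in tokens:
        by_talker[talker].append((f1, f2))

    # Per-talker mean and spread for each formant.
    stats = {}
    for talker, values in by_talker.items():
        f1s, f2s = zip(*values)
        stats[talker] = (mean(f1s), mean(f2s), pstdev(f1s), pstdev(f2s))

    result = []
    for talker, f1, f2 in tokens:
        m1, m2, s1, s2 = stats[talker]
        if standardize:
            result.append((talker, (f1 - m1) / s1, (f2 - m2) / s2))
        else:
            result.append((talker, f1 - m1, f2 - m2))
    return result


# Hypothetical tokens from two talkers with different overall formant ranges.
tokens = [
    ("A", 300.0, 2300.0), ("A", 700.0, 1200.0), ("A", 350.0, 900.0),
    ("B", 360.0, 2700.0), ("B", 850.0, 1400.0), ("B", 420.0, 1000.0),
]
centered = normalize_by_talker(tokens, standardize=False)
standardized = normalize_by_talker(tokens, standardize=True)
```

Standardizing in this way corresponds to what is usually called Lobanov normalization; centering alone removes talker differences in overall formant height but not in range.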

RevDate: 2023-07-18
CmpDate: 2023-07-10

Steinschneider M (2023)

Toward an understanding of vowel encoding in the human auditory cortex.

Neuron, 111(13):1995-1997.

In this issue of Neuron, Oganian et al.[1] performed intracranial recordings in the auditory cortex of human subjects to clarify how vowels are encoded by the brain. Formant-based tuning curves demonstrated the organization of vowel encoding. The need for population codes and demonstration of speaker normalization were emphasized.

RevDate: 2023-07-18

Hong Y, Chen S, Zhou F, et al (2023)

Phonetic entrainment in L2 human-robot interaction: an investigation of children with and without autism spectrum disorder.

Frontiers in psychology, 14:1128976.

Phonetic entrainment is a phenomenon in which people adjust their phonetic features to approach those of their conversation partner. Individuals with Autism Spectrum Disorder (ASD) have been reported to show some deficits in entrainment during their interactions with human interlocutors, though deficits in terms of significant differences from typically developing (TD) controls were not always registered. One reason related to the inconsistencies of whether deficits are detected or not in autistic individuals is that the conversation partner's speech could hardly be controlled, and both the participants and the partners might be adjusting their phonetic features. The variabilities in the speech of conversation partners and various social traits exhibited might make the phonetic entrainment (if any) of the participants less detectable. In this study, we attempted to reduce the variability of the interlocutors by employing a social robot and having it do a goal-directed conversation task with children with and without ASD. Fourteen autistic children and 12 TD children participated in the current study in their second language, English. Results showed that autistic children showed vowel formant and mean fundamental frequency (f0) entrainment comparable to that of their TD peers, but they did not entrain their f0 range as the TD group did. These findings suggest that autistic children were capable of exhibiting phonetic entrainment behaviors similar to TD children in vowel formants and f0, particularly in a less complex situation where the speech features and social traits of the interlocutor were controlled. Furthermore, the utilization of a social robot may have increased the interest of these children in phonetic entrainment. On the other hand, entrainment of f0 range was more challenging for these autistic children even in a more controlled situation. This study demonstrates the viability and potential of using human-robot interactions as a novel method to evaluate abilities and deficits in phonetic entrainment in autistic children.

RevDate: 2023-07-04

Terranova F, Baciadonna L, Maccarone C, et al (2023)

Penguins perceive variations of source- and filter-related vocal parameters of species-specific vocalisations.

Animal cognition [Epub ahead of print].

Animal vocalisations encode a wide range of biological information about the age, sex, body size, and social status of the emitter. Moreover, vocalisations play a significant role in signalling the identity of the emitter to conspecifics. Recent studies have shown that, in the African penguin (Spheniscus demersus), acoustic cues to individual identity are encoded in the fundamental frequency (F0) and resonance frequencies (formants) of the vocal tract. However, although penguins are known to produce vocalisations where F0 and formants vary among individuals, it remains to be tested whether the receivers can perceive and use such information in the individual recognition process. In this study, using the Habituation-Dishabituation (HD) paradigm, we tested the hypothesis that penguins perceive and respond to a shift of ± 20% (corresponding to the natural inter-individual variation observed in ex-situ colonies) of F0 and formant dispersion (ΔF) of species-specific calls. We found that penguins were more likely to look rapidly and for longer at the source of the sound when F0 and formants of the calls were manipulated, indicating that they could perceive variations of these parameters in the vocal signals. Our findings provide the first experimental evidence that, in the African penguin, listeners can perceive changes in F0 and formants, which can be used by the receiver as potential cues for the individual discrimination of the emitter.

RevDate: 2023-06-30

Panneton R, Cristia A, Taylor C, et al (2023)

Positive Valence Contributes to Hyperarticulation in Maternal Speech to Infants and Puppies.

Journal of child language pii:S0305000923000296 [Epub ahead of print].

Infant-directed speech often has hyperarticulated features, such as point vowels whose formants are further apart than in adult-directed speech. This increased "vowel space" may reflect the caretaker's effort to speak more clearly to infants, thus benefiting language processing. However, hyperarticulation may also result from more positive valence (e.g., speaking with positive vocal emotion) often found in mothers' speech to infants. This study was designed to replicate others who have found hyperarticulation in maternal speech to their 6-month-olds, but also to examine their speech to a non-human infant (i.e., a puppy). We rated both kinds of maternal speech for their emotional valence and recorded mothers' speech to a human adult. We found that mothers produced more positively valenced utterances and some hyperarticulation in both their infant- and puppy-directed speech, compared to their adult-directed speech. This finding promotes looking at maternal speech from a multi-faceted perspective that includes emotional state.

RevDate: 2023-07-18
CmpDate: 2023-07-17

Vogt C, Floegel M, Kasper J, et al (2023)

Oxytocinergic modulation of speech production-a double-blind placebo-controlled fMRI study.

Social cognitive and affective neuroscience, 18(1):.

Many socio-affective behaviors, such as speech, are modulated by oxytocin. While oxytocin modulates speech perception, it is not known whether it also affects speech production. Here, we investigated effects of oxytocin administration and interactions with the functional rs53576 oxytocin receptor (OXTR) polymorphism on produced speech and its underlying brain activity. During functional magnetic resonance imaging, 52 healthy male participants read sentences out loud with either neutral or happy intonation, a covert reading condition served as a common baseline. Participants were studied once under the influence of intranasal oxytocin and in another session under placebo. Oxytocin administration increased the second formant of produced vowels. This acoustic feature has previously been associated with speech valence; however, the acoustic differences were not perceptually distinguishable in our experimental setting. When preparing to speak, oxytocin enhanced brain activity in sensorimotor cortices and regions of both dorsal and right ventral speech processing streams, as well as subcortical and cortical limbic and executive control regions. In some of these regions, the rs53576 OXTR polymorphism modulated oxytocin administration-related brain activity. Oxytocin also gated cortical-basal ganglia circuits involved in the generation of happy prosody. Our findings suggest that several neural processes underlying speech production are modulated by oxytocin, including control of not only affective intonation but also sensorimotor aspects during emotionally neutral speech.

RevDate: 2023-06-21

Vasquez-Serrano P, Reyes-Moreno J, Guido RC, et al (2023)

MFCC Parameters of the Speech Signal: An Alternative to Formant-Based Instantaneous Vocal Tract Length Estimation.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00163-7 [Epub ahead of print].

On the one hand, the relationship between formant frequencies and vocal tract length (VTL) has been intensively studied over the years. On the other hand, the connection involving mel-frequency cepstral coefficients (MFCCs), which concisely codify the overall shape of a speaker's spectral envelope with just a few cepstral coefficients, and VTL has only been modestly analyzed, being worthy of further investigation. Thus, based on different statistical models, this article explores the advantages and disadvantages of the latter approach, which is relatively novel, in contrast to the former which arises from more traditional studies. Additionally, VTL is assumed to be a static and inherent characteristic of speakers, that is, a single length parameter is frequently estimated per speaker. By contrast, in this paper we consider VTL estimation from a dynamic perspective using modern real-time Magnetic Resonance Imaging (rtMRI) to measure VTL in parallel with audio signals. To support the experiments, data obtained from USC-TIMIT magnetic resonance videos were used, allowing for the 2D real-time analysis of articulators in motion. As a result, we observed that the performance of MFCCs in the case of speaker-dependent modeling is higher; however, in the case of cross-speaker modeling, which uses different speakers' data for training and evaluating, its performance is not significantly different from that obtained with formants. In complement, we note that the estimation based on MFCCs is robust, with an acceptable computational time complexity, coherent with the traditional approach.
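
For context on the formant-based approach that this abstract contrasts with MFCCs, the Python sketch below uses the textbook uniform-tube model, in which a tube closed at one end has resonances F_n = (2n - 1)c / (4L). This is an assumption made for illustration and is not necessarily the estimator used in the article; the function name, the speed-of-sound value, and the formant frames are hypothetical.

```python
SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, humid air in the vocal tract


def vtl_from_formants(formants_hz, c=SPEED_OF_SOUND_M_S):
    """Vocal tract length (m) from F1, F2, ... under the uniform-tube model.

    Each formant F_n gives L = (2n - 1) * c / (4 * F_n); the per-formant
    estimates are averaged.
    """
    estimates = [
        (2 * n - 1) * c / (4.0 * f)
        for n, f in enumerate(formants_hz, start=1)
    ]
    return sum(estimates) / len(estimates)


# Hypothetical per-frame formants (Hz); one VTL estimate per frame gives
# a time-varying track rather than a single static length per speaker.
frames = [
    [500.0, 1500.0, 2500.0],
    [560.0, 1680.0, 2800.0],
]
vtl_track = [vtl_from_formants(f) for f in frames]
```

Real vowels deviate from the uniform-tube pattern, so the per-formant estimates usually disagree and the average is only a coarse approximation.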

RevDate: 2023-06-12

Cox C, Dideriksen C, Keren-Portnoy T, et al (2023)

Infant-directed speech does not always involve exaggerated vowel distinctions: Evidence from Danish.

Child development [Epub ahead of print].

This study compared the acoustic properties of 26 (100% female, 100% monolingual) Danish caregivers' spontaneous speech addressed to their 11- to 24-month-old infants (infant-directed speech, IDS) and an adult experimenter (adult-directed speech, ADS). The data were collected between 2016 and 2018 in Aarhus, Denmark. Prosodic properties of Danish IDS conformed to cross-linguistic patterns, with a higher pitch, greater pitch variability, and slower articulation rate than ADS. However, an acoustic analysis of vocalic properties revealed that Danish IDS had a reduced or similar vowel space, higher within-vowel variability, raised formants, and lower degree of vowel discriminability compared to ADS. None of the measures, except articulation rate, showed age-related differences. These results push for future research to conduct theory-driven comparisons across languages with distinct phonological systems.

RevDate: 2023-08-04

Philosophical Transactions B Editorial team (2023)

Editor's note: Chimpanzee vowel-like sounds and voice quality suggest formant space expansion through the hominoid lineage.

Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 378(1882):20230201.

RevDate: 2023-06-13

Baron A, Harwood V, Kleinman D, et al (2023)

Where on the face do we look during phonemic restoration: An eye-tracking study.

Frontiers in psychology, 14:1005186.

Face to face communication typically involves audio and visual components to the speech signal. To examine the effect of task demands on gaze patterns in response to a speaking face, adults participated in two eye-tracking experiments with an audiovisual (articulatory information from the mouth was visible) and a pixelated condition (articulatory information was not visible). Further, task demands were manipulated by having listeners respond in a passive (no response) or an active (button press response) context. The active experiment required participants to discriminate between speech stimuli and was designed to mimic environmental situations which require one to use visual information to disambiguate the speaker's message, simulating different listening conditions in real-world settings. Stimuli included a clear exemplar of the syllable /ba/ and a second exemplar in which the formant initial consonant was reduced creating an /a/-like consonant. Consistent with our hypothesis, results revealed that the greatest fixations to the mouth were present in the audiovisual active experiment and visual articulatory information led to a phonemic restoration effect for the /a/ speech token. In the pixelated condition, participants fixated on the eyes, and discrimination of the deviant token within the active experiment was significantly greater than the audiovisual condition. These results suggest that when required to disambiguate changes in speech, adults may look to the mouth for additional cues to support processing when it is available.

RevDate: 2023-06-11

Ikuma T, McWhorter AJ, Oral E, et al (2023)

Formant-Aware Spectral Analysis of Sustained Vowels of Pathological Breathy Voice.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00154-6 [Epub ahead of print].

OBJECTIVES: This paper reports the effectiveness of formant-aware spectral parameters to predict the perceptual breathiness rating. A breathy voice has a steeper spectral slope and higher turbulent noise than a normal voice. Measuring spectral parameters of acoustic signals over lower formant regions is a known approach to capture the properties related to breathiness. This study examines this approach by testing the contemporary spectral parameters and algorithms within the framework, alternate frequency band designs, and vowel effects.

METHODS: Sustained vowel recordings (/a/, /i/, and /u/) of speakers with voice disorders in the German Saarbruecken Voice Database were considered (n: 367). Recordings with signal irregularities, such as subharmonics or with roughness perception, were excluded from the study. Four speech language pathologists perceptually rated the recordings for breathiness on a 100-point scale, and their averages were used in the analysis. The acoustic spectra were segmented into four frequency bands according to the vowel formant structures. Five spectral parameters (intraband harmonics-to-noise ratio, HNR; interband harmonics ratio, HHR; interband noise ratio, NNR; and interband glottal-to-noise energy, GNE, ratio) were evaluated in each band to predict the perceptual breathiness rating. Four HNR algorithms were tested.

RESULTS: Multiple linear regression models of spectral parameters, led by the HNRs, were shown to explain up to 85% of the variance in perceptual breathiness ratings. This performance exceeded that of the acoustic breathiness index (82%). Individually, the HNR over the first two formants best explained the variances in the breathiness (78%), exceeding the smoothed cepstrum peak prominence (74%). The performance of HNR was highly algorithm dependent (10% spread). Some vowel effects were observed in the perceptual rating (higher for /u/), predictability (5% lower for /u/), and model parameter selections.

CONCLUSIONS: Strong per-vowel breathiness acoustic models were found by segmenting the spectrum to isolate the portion most affected by breathiness.

RevDate: 2023-06-05

Ashokumar M, Guichet C, Schwartz JL, et al (2023)

Correlation between the effect of orofacial somatosensory inputs in speech perception and speech production performance.

Auditory perception & cognition, 6(1-2):97-107.

INTRODUCTION: Orofacial somatosensory inputs modify the perception of speech sounds. Such auditory-somatosensory integration likely develops alongside speech production acquisition. We examined whether the somatosensory effect in speech perception varies depending on individual characteristics of speech production.

METHODS: The somatosensory effect in speech perception was assessed by changes in category boundary between /e/ and /ø/ in a vowel identification test resulting from somatosensory stimulation providing facial skin deformation in the rearward direction corresponding to articulatory movement for /e/ applied together with the auditory input. Speech production performance was quantified by the acoustic distances between the average first, second and third formants of /e/ and /ø/ utterances recorded in a separate test.

RESULTS: The category boundary between /e/ and /ø/ was significantly shifted towards /ø/ due to the somatosensory stimulation which is consistent with previous research. The amplitude of the category boundary shift was significantly correlated with the acoustic distance between the mean second - and marginally third - formants of /e/ and /ø/ productions, with no correlation with the first formant distance.

DISCUSSION: Greater acoustic distances can be related to larger contrasts between the articulatory targets of vowels in speech production. These results suggest that the somatosensory effect in speech perception can be linked to speech production performance.

RevDate: 2023-06-01
CmpDate: 2023-05-29

Saba JN, Ali H, JHL Hansen (2023)

The effects of estimation accuracy, estimation approach, and number of selected channels using formant-priority channel selection for an "n-of-m" sound processing strategy for cochlear implants.

The Journal of the Acoustical Society of America, 153(5):3100.

Previously, selection of l channels was prioritized according to formant frequency locations in an l-of-n-of-m-based signal processing strategy to provide important voicing information independent of listening environments for cochlear implant (CI) users. In this study, ideal, or ground truth, formants were incorporated into the selection stage to determine the effect of accuracy on (1) subjective speech intelligibility, (2) objective channel selection patterns, and (3) objective stimulation patterns (current). An average +11% improvement (p < 0.05) was observed across six CI users in quiet, but not for noise or reverberation conditions. Analogous increases in channel selection and current for the upper range of F1 and a decrease across mid-frequencies with higher corresponding current, were both observed at the expense of noise-dominant channels. Objective channel selection patterns were analyzed a second time to determine the effects of estimation approach and number of selected channels (n). A significant effect of estimation approach was only observed in the noise and reverberation condition with minor differences in channel selection and significantly decreased stimulated current. Results suggest that estimation method, accuracy, and number of channels in the proposed strategy using ideal formants may improve intelligibility when corresponding stimulated current of formant channels are not masked by noise-dominant channels.

RevDate: 2023-07-11
CmpDate: 2023-06-19

Carney LH, Cameron DA, Kinast KB, et al (2023)

Effects of sensorineural hearing loss on formant-frequency discrimination: Measurements and models.

Hearing research, 435:108788.

This study concerns the effect of hearing loss on discrimination of formant frequencies in vowels. In the response of the healthy ear to a harmonic sound, auditory-nerve (AN) rate functions fluctuate at the fundamental frequency, F0. Responses of inner-hair-cells (IHCs) tuned near spectral peaks are captured (or dominated) by a single harmonic, resulting in lower fluctuation depths than responses of IHCs tuned between spectral peaks. Therefore, the depth of neural fluctuations (NFs) varies along the tonotopic axis and encodes spectral peaks, including formant frequencies of vowels. This NF code is robust across a wide range of sound levels and in background noise. The NF profile is converted into a rate-place representation in the auditory midbrain, wherein neurons are sensitive to low-frequency fluctuations. The NF code is vulnerable to sensorineural hearing loss (SNHL) because capture depends upon saturation of IHCs, and thus the interaction of cochlear gain with IHC transduction. In this study, formant-frequency discrimination limens (DLFFs) were estimated for listeners with normal hearing or mild to moderate SNHL. The F0 was fixed at 100 Hz, and formant peaks were either aligned with harmonic frequencies or placed between harmonics. Formant peak frequencies were 600 and 2000 Hz, in the range of first and second formants of several vowels. The difficulty of the task was varied by changing formant bandwidth to modulate the contrast in the NF profile. Results were compared to predictions from model auditory-nerve and inferior colliculus (IC) neurons, with listeners' audiograms used to individualize the AN model. Correlations between DLFFs, audiometric thresholds near the formant frequencies, age, and scores on the Quick speech-in-noise test are reported. SNHL had a strong effect on DLFF for the second formant frequency (F2), but relatively small effect on DLFF for the first formant (F1). The IC model appropriately predicted substantial threshold elevations for changes in F2 as a function of SNHL and little effect of SNHL on thresholds for changes in F1.
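The stimulus design in this abstract (F0 fixed at 100 Hz, formant peaks either on a harmonic or between harmonics) can be illustrated with a small sketch. This is not the authors' code; the function names are hypothetical, and the 650 Hz and 2050 Hz "between harmonics" values are assumed for illustration only, since the abstract does not give them.

```python
def nearest_harmonic(freq_hz, f0_hz):
    """Return (harmonic_number, offset_hz) for the harmonic of f0 closest to freq_hz."""
    n = max(1, round(freq_hz / f0_hz))
    return n, freq_hz - n * f0_hz


def is_aligned(freq_hz, f0_hz, tol_hz=1e-6):
    """True when freq_hz coincides with a harmonic of f0_hz (within tol_hz)."""
    _, offset = nearest_harmonic(freq_hz, f0_hz)
    return abs(offset) <= tol_hz


F0_HZ = 100.0  # from the abstract
# 600 and 2000 Hz are from the abstract; 650 and 2050 Hz are assumed examples.
for peak_hz in (600.0, 2000.0, 650.0, 2050.0):
    n, offset = nearest_harmonic(peak_hz, F0_HZ)
    label = "aligned" if is_aligned(peak_hz, F0_HZ) else "between harmonics"
    print(f"{peak_hz:7.1f} Hz: nearest harmonic {n}, offset {offset:+.1f} Hz ({label})")
```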

RevDate: 2023-06-02

Rizzi R, GM Bidelman (2023)

Duplex perception reveals brainstem auditory representations are modulated by listeners' ongoing percept for speech.

bioRxiv : the preprint server for biology.

So-called duplex speech stimuli with perceptually ambiguous spectral cues to one ear and isolated low- vs. high-frequency third formant "chirp" to the opposite ear yield a coherent percept supporting their phonetic categorization. Critically, such dichotic sounds are only perceived categorically upon binaural integration. Here, we used frequency-following responses (FFRs), scalp-recorded potentials reflecting phase-locked subcortical activity, to investigate brainstem responses to fused speech percepts and to determine whether FFRs reflect binaurally integrated category-level representations. We recorded FFRs to diotic and dichotic stop-consonants (/da/, /ga/) that either did or did not require binaural fusion to properly label along with perceptually ambiguous sounds without clear phonetic identity. Behaviorally, listeners showed clear categorization of dichotic speech tokens confirming they were heard with a fused, phonetic percept. Neurally, we found FFRs were stronger for categorically perceived speech relative to category-ambiguous tokens but also differentiated phonetic categories for both diotically and dichotically presented speech sounds. Correlations between neural and behavioral data further showed FFR latency predicted the degree to which listeners labeled tokens as "da" vs. "ga". The presence of binaurally integrated, category-level information in FFRs suggests human brainstem processing reflects a surprisingly abstract level of the speech code typically circumscribed to much later cortical processing.

RevDate: 2023-06-07
CmpDate: 2023-05-23

Cox SR, Huang T, Chen WR, et al (2023)

An acoustic study of Cantonese alaryngeal speech in different speaking conditions.

The Journal of the Acoustical Society of America, 153(5):2973.

Esophageal (ES) speech, tracheoesophageal (TE) speech, and the electrolarynx (EL) are common methods of communication following the removal of the larynx. Our recent study demonstrated that intelligibility may increase for Cantonese alaryngeal speakers using clear speech (CS) compared to their everyday "habitual speech" (HS), but the reasoning is still unclear [Hui, Cox, Huang, Chen, and Ng (2022). Folia Phoniatr. Logop. 74, 103-111]. The purpose of this study was to assess the acoustic characteristics of vowels and tones produced by Cantonese alaryngeal speakers using HS and CS. Thirty-one alaryngeal speakers (9 EL, 10 ES, and 12 TE speakers) read The North Wind and the Sun passage in HS and CS. Vowel formants, vowel space area (VSA), speaking rate, pitch, and intensity were examined, and their relationships to intelligibility were evaluated. Statistical models suggest that larger VSAs significantly improved intelligibility, but slower speaking rate did not. Vowel and tonal contrasts did not differ between HS and CS for all three groups, but the amount of information encoded in fundamental frequency and intensity differences between high and low tones positively correlated with intelligibility for TE and ES groups, respectively. Continued research is needed to understand the effects of different speaking conditions toward improving acoustic and perceptual characteristics of Cantonese alaryngeal speech.

RevDate: 2023-06-19
CmpDate: 2023-06-19

Valls-Ontañón A, Ferreiro M, Moragues-Aguiló B, et al (2023)

Impact of 3-dimensional anatomical changes secondary to orthognathic surgery on voice resonance and articulatory function: a prospective study.

The British journal of oral & maxillofacial surgery, 61(5):373-379.

An evaluation was made of the impact of orthognathic surgery (OS) on speech, addressing in particular the effects of skeletal and airway changes on voice resonance characteristics and articulatory function. A prospective study was carried out involving 29 consecutive patients subjected to OS. Preoperative, and short and long-term postoperative evaluations were made of anatomical changes (skeletal and airway measurements), speech evolution (assessed objectively by acoustic analysis: fundamental frequency, local jitter, local shimmer of each vowel, and formants F1 and F2 of vowel /a/), and articulatory function (use of compensatory musculature, point of articulation, and speech intelligibility). These were also assessed subjectively by means of a visual analogue scale. Articulatory function after OS showed immediate improvement and had further progressed at one year of follow up. This improvement significantly correlated with the anatomical changes, and was also notably perceived by the patient. On the other hand, although a slight modification in vocal resonance was reported and seen to correlate with anatomical changes of the tongue, hyoid bone, and airway, it was not subjectively perceived by the patients. In conclusion, the results demonstrated that OS had beneficial effects on articulatory function and imperceptible subjective changes in a patient's voice. Patients subjected to OS, apart from benefitting from improved articulatory function, should not be afraid that they will not recognise their voice after treatment.

RevDate: 2023-05-21

Shellikeri S, Cho S, Ash S, et al (2023)

Digital markers of motor speech impairments in natural speech of patients with ALS-FTD spectrum disorders.

medRxiv : the preprint server for health sciences pii:2023.04.29.23289308.

BACKGROUND AND OBJECTIVES: Patients with ALS-FTD spectrum disorders (ALS-FTSD) have mixed motor and cognitive impairments and require valid and quantitative assessment tools to support diagnosis and tracking of bulbar motor disease. This study aimed to validate a novel automated digital speech tool that analyzes vowel acoustics from natural, connected speech as a marker for impaired articulation due to bulbar motor disease in ALS-FTSD.

METHODS: We used an automatic algorithm called Forced Alignment Vowel Extraction (FAVE) to detect spoken vowels and extract vowel acoustics from 1 minute audio-recorded picture descriptions. Using automated acoustic analysis scripts, we derived two articulatory-acoustic measures: vowel space area (VSA, in Bark²) which represents tongue range-of-motion (size), and average second formant slope of vowel trajectories (F2 slope) which represents tongue movement speed. We compared vowel measures between ALS with and without clinically-evident bulbar motor disease (ALS+bulbar vs. ALS-bulbar), behavioral variant frontotemporal dementia (bvFTD) without a motor syndrome, and healthy controls (HC). We correlated impaired vowel measures with bulbar disease severity, estimated by clinical bulbar scores and perceived listener effort, and with MRI cortical thickness of the orobuccal part of the primary motor cortex innervating the tongue (oralPMC). We also tested correlations with respiratory capacity and cognitive impairment.

RESULTS: Participants were 45 ALS+bulbar (30 males, mean age=61±11), 22 ALS-nonbulbar (11 males, age=62±10), 22 bvFTD (13 males, age=63±7), and 34 HC (14 males, age=69±8). ALS+bulbar had smaller VSA and shallower average F2 slopes than ALS-bulbar (VSA: |d|=0.86, p=0.0088; F2 slope: |d|=0.98, p=0.0054), bvFTD (VSA: |d|=0.67, p=0.043; F2 slope: |d|=1.4, p<0.001), and HC (VSA: |d|=0.73, p=0.024; F2 slope: |d|=1.0, p<0.001). Vowel measures declined with worsening bulbar clinical scores (VSA: R=0.33, p=0.033; F2 slope: R=0.25, p=0.048), and smaller VSA was associated with greater listener effort (R=-0.43, p=0.041). Shallower F2 slopes were related to cortical thinning in oralPMC (R=0.50, p=0.03). Neither vowel measure was associated with respiratory nor cognitive test scores.

CONCLUSIONS: Vowel measures extracted with automatic processing from natural speech are sensitive to bulbar motor disease in ALS-FTD and are robust to cognitive impairment.
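The two measures named in this abstract, vowel space area in Bark² and F2 slope, can be sketched roughly as follows. This is an illustrative sketch only, not the study's scripts: the Hz-to-Bark conversion uses Traunmüller's (1990) approximation, which is an assumption (the abstract does not say which conversion was used), the corner-vowel formant values are hypothetical, and the F2 slope is taken as a plain least-squares slope over time.

```python
def hz_to_bark(f_hz):
    """Convert Hz to Bark using Traunmueller's (1990) approximation (assumed here)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula.

    Points must be listed in order around the perimeter.
    """
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def vowel_space_area(corner_formants_hz):
    """VSA in Bark^2 from (F1, F2) pairs in Hz, ordered around the vowel polygon."""
    pts = [(hz_to_bark(f1), hz_to_bark(f2)) for f1, f2 in corner_formants_hz]
    return polygon_area(pts)


def least_squares_slope(times, values):
    """Slope of the best-fit line through (time, value) samples."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


# Hypothetical corner vowels (F1, F2 in Hz), ordered /i/, /ae/, /a/, /u/.
CORNERS = [(300.0, 2300.0), (700.0, 1800.0), (750.0, 1100.0), (320.0, 900.0)]
print("VSA (Bark^2):", round(vowel_space_area(CORNERS), 2))

# Hypothetical F2 trajectory of one vowel: time in seconds, F2 in Hz.
times = [0.00, 0.02, 0.04, 0.06]
f2_hz = [1200.0, 1350.0, 1500.0, 1600.0]
print("F2 slope (Hz/s):", round(least_squares_slope(times, f2_hz), 1))
```

A smaller polygon area corresponds to a reduced articulatory range, and a slope closer to zero corresponds to slower formant movement, which is the direction of the group differences reported above.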

RevDate: 2023-07-21
CmpDate: 2023-07-21

Easwar V, Peng ZE, Mak V, et al (2023)

Differences between children and adults in the neural encoding of voice fundamental frequency in the presence of noise and reverberation.

The European journal of neuroscience, 58(2):2547-2562.

Environmental noise and reverberation challenge speech understanding more significantly in children than in adults. However, the neural/sensory basis for the difference is poorly understood. We evaluated the impact of noise and reverberation on the neural processing of the fundamental frequency of voice (f0), an important cue to tag or recognize a speaker. In a group of 39 6- to 15-year-old children and 26 adults with normal hearing, envelope following responses (EFRs) were elicited by a male-spoken /i/ in quiet, noise, reverberation, and both noise and reverberation. Due to increased resolvability of harmonics at lower than higher vowel formants that may affect susceptibility to noise and/or reverberation, the /i/ was modified to elicit two EFRs: one initiated by the low frequency first formant (F1) and the other initiated by mid to high frequency second and higher formants (F2+) with predominantly resolved and unresolved harmonics, respectively. F1 EFRs were more susceptible to noise whereas F2+ EFRs were more susceptible to reverberation. Reverberation resulted in greater attenuation of F1 EFRs in adults than children, and greater attenuation of F2+ EFRs in older than younger children. Reduced modulation depth caused by reverberation and noise explained changes in F2+ EFRs but was not the primary determinant for F1 EFRs. Experimental data paralleled modelled EFRs, especially for F1. Together, data suggest that noise or reverberation influences the robustness of f0 encoding depending on the resolvability of vowel harmonics and that maturation of processing temporal/envelope information of voice is delayed in reverberation, particularly for low frequency stimuli.

RevDate: 2023-05-12

Wang Y, Hattori M, Masaki K, et al (2023)

Detailed speech evaluation including formant 3 analysis and voice visualization in maxillofacial rehabilitation: A clinical report.

The Journal of prosthetic dentistry pii:S0022-3913(23)00221-4 [Epub ahead of print].

Objective speech evaluations such as analysis of formants 1 and 2 and nasality measurement have been used in maxillofacial rehabilitation for outcome assessment. However, in some patients, those evaluations are insufficient to assess a specific or unique problem. This report describes the use of a new speech evaluation including formant 3 analysis and voice visualization in a patient with a maxillofacial defect. The patient was a 67-year-old man who had a maxillary defect that opened to the maxillary sinus and who had an unnatural voice even when wearing an obturator. Nasality was low and the frequencies of formants 1 and 2 were normal even without the obturator. However, a low frequency of formant 3 and a shifted center of voice were observed. These results indicated that the unnatural voice was related to increased resonant volume in the pharynx rather than hypernasality. This patient demonstrates that advanced speech analysis can be useful for detecting the cause of speech disorder and planning maxillofacial rehabilitation.

RevDate: 2023-05-05

Cavalcanti JC, Eriksson A, PA Barbosa (2023)

On the speaker discriminatory power asymmetry regarding acoustic-phonetic parameters and the impact of speaking style.

Frontiers in psychology, 14:1101187.

This study aimed to assess what we refer to as the speaker discriminatory power asymmetry and its forensic implications in comparisons performed in different speaking styles: spontaneous dialogues vs. interviews. We also addressed the impact of data sampling on the speaker's discriminatory performance concerning different acoustic-phonetic estimates. The participants were 20 male speakers, Brazilian Portuguese speakers from the same dialectal area. The speech material consisted of spontaneous telephone conversations between familiar individuals, and interviews conducted between each individual participant and the researcher. Nine acoustic-phonetic parameters were chosen for the comparisons, spanning from temporal and melodic to spectral acoustic-phonetic estimates. Ultimately, an analysis based on the combination of different parameters was also conducted. Two speaker discriminatory metrics were examined: Cost Log-likelihood-ratio (Cllr) and Equal Error Rate (EER) values. A general speaker discriminatory trend was suggested when assessing the parameters individually. Parameters pertaining to the temporal acoustic-phonetic class depicted the weakest performance in terms of speaker contrasting power as evidenced by the relatively higher Cllr and EER values. Moreover, from the set of acoustic parameters assessed, spectral parameters, mainly high formant frequencies, i.e., F3 and F4, were the best performing in terms of speaker discrimination, depicting the lowest EER and Cllr scores. The results appear to suggest a speaker discriminatory power asymmetry concerning parameters from different acoustic-phonetic classes, in which temporal parameters tended to present a lower discriminatory power. The speaking style mismatch also seemed to considerably impact the speaker comparison task, by undermining the overall discriminatory performance. A statistical model based on the combination of different acoustic-phonetic estimates was found to perform best in this case. Finally, data sampling has proven to be of crucial relevance for the reliability of discriminatory power assessment.
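For readers unfamiliar with the two metrics named in this abstract, the sketch below shows how Cllr and EER are commonly computed. It is not the authors' implementation. Cllr follows the standard log-likelihood-ratio cost definition; the EER routine is a simple threshold sweep that assumes higher scores mean "more likely the same speaker". All function and variable names, and the example numbers, are hypothetical.

```python
import math


def cllr(same_speaker_lrs, different_speaker_lrs):
    """Log-likelihood-ratio cost from likelihood ratios (not log-LRs).

    Lower is better; a system that always outputs LR = 1 scores exactly 1.0.
    """
    same_term = sum(math.log2(1.0 + 1.0 / lr) for lr in same_speaker_lrs)
    diff_term = sum(math.log2(1.0 + lr) for lr in different_speaker_lrs)
    return 0.5 * (same_term / len(same_speaker_lrs)
                  + diff_term / len(different_speaker_lrs))


def equal_error_rate(same_scores, different_scores):
    """Approximate EER by sweeping thresholds over the observed scores.

    A pair is accepted as "same speaker" when its score >= threshold.
    Returns the mean of the false-accept and false-reject rates at the
    threshold where the two are closest.
    """
    best_gap, best_rate = None, None
    for thr in sorted(set(same_scores) | set(different_scores)):
        frr = sum(s < thr for s in same_scores) / len(same_scores)
        far = sum(s >= thr for s in different_scores) / len(different_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (far + frr) / 2.0
    return best_rate


# Hypothetical likelihood ratios for same-speaker and different-speaker pairs.
same_lrs = [8.0, 3.0, 0.9, 12.0]
diff_lrs = [0.1, 0.4, 1.5, 0.05]
print("Cllr:", round(cllr(same_lrs, diff_lrs), 3))
print("EER:", equal_error_rate(same_lrs, diff_lrs))
```

Under both metrics lower values indicate better speaker discrimination, which is why the abstract describes the best-performing parameters as having the lowest EER and Cllr scores.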

RevDate: 2023-05-17
CmpDate: 2023-05-04

Zaltz Y (2023)

The effect of stimulus type and testing method on talker discrimination of school-age children.

The Journal of the Acoustical Society of America, 153(5):2611.

Efficient talker discrimination (TD) improves speech understanding under multi-talker conditions. So far, TD of children has been assessed using various testing parameters, making it difficult to draw comparative conclusions. This study explored the effects of the stimulus type and variability on children's TD. Thirty-two children (7-10 years old) underwent eight TD assessments with fundamental frequency + formant changes using an adaptive procedure. Stimuli included consonant-vowel-consonant words or three-word sentences and were either fixed by run or by trial (changing throughout the run). Cognitive skills were also assessed. Thirty-one adults (18-35 years old) served as controls. The results showed (1) poorer TD for the fixed-by-trial than the fixed-by-run method, with both stimulus types for the adults but only with the words for the children; (2) poorer TD for the words than the sentences with the fixed-by-trial method only for the children; and (3) significant correlations between the children's age and TD. These results support a developmental trajectory in the use of perceptual anchoring for TD and in its reliance on comprehensive acoustic and linguistic information. The finding that the testing parameters may influence the top-down and bottom-up processing for TD should be considered when comparing data across studies or when planning new TD experiments.


RJR Experience and Expertise

Researcher

Robbins holds BS, MS, and PhD degrees in the life sciences. He served as a tenured faculty member in the Zoology and Biological Science departments at Michigan State University. He is currently exploring the intersection between genomics, microbial ecology, and biodiversity — an area that promises to transform our understanding of the biosphere.

Educator

Robbins has extensive experience in college-level education: At MSU he taught introductory biology, genetics, and population genetics. At JHU, he was an instructor for a special course on biological database design. At FHCRC, he team-taught a graduate-level course on the history of genetics. At Bellevue College he taught medical informatics.

Administrator

Robbins has been involved in science administration at both the federal and the institutional levels. At NSF he was a program officer for database activities in the life sciences; at DOE he was a program officer for information infrastructure in the human genome project. At the Fred Hutchinson Cancer Research Center, he served as a vice president for fifteen years.

Technologist

Robbins has been involved with information technology since writing his first Fortran program as a college student. At NSF he was the first program officer for database activities in the life sciences. At JHU he held an appointment in the CS department and served as director of the informatics core for the Genome Data Base. At the FHCRC he was VP for Information Technology.

Publisher

While still at Michigan State, Robbins started his first publishing venture, founding a small company that addressed the short-run publishing needs of instructors in very large undergraduate classes. For more than 20 years, Robbins has been operating The Electronic Scholarly Publishing Project, a web site dedicated to the digital publishing of critical works in science, especially classical genetics.

Speaker

Robbins is well known for his speaking abilities and is often called upon to provide keynote or plenary addresses at international meetings. For example, in July 2012, he gave a well-received keynote address at the Global Biodiversity Informatics Congress, sponsored by GBIF and held in Copenhagen.

Facilitator

Robbins is a skilled meeting facilitator. He prefers a participatory approach, with part of the meeting involving dynamic breakout groups, created by the participants in real time: (1) individuals propose breakout groups; (2) everyone signs up for one (or more) groups; (3) the groups with the most interested parties then meet, with reports from each group presented and discussed in a subsequent plenary session.

Designer

Robbins has been engaged with photography and design since the 1960s, when he worked for a professional photography laboratory. He now prefers digital photography and tools for their precision and reproducibility. He designed his first web site more than 20 years ago and he personally designed and implemented this web site. He engages in graphic design as a hobby.


