
About | BLOGS | Portfolio | Misc | Recommended | What's New | What's Hot



Bibliography Options Menu

QUERY RUN:
22 Oct 2024 at 01:33
HITS:
3825
NOTE:
Long bibliographies are displayed in blocks of 100 citations at a time. At the end of each block there is an option to load the next block.

Bibliography on: Pangenome


Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology. More About:  RJR | OUR TEAM | OUR SERVICES | THIS WEBSITE

RJR: Recommended Bibliography. Created: 22 Oct 2024 at 01:33

Pangenome

Although the enforced stability of genomic content is ubiquitous among MCEs (multicellular eukaryotes), the opposite is proving to be the case among prokaryotes, which exhibit remarkable and adaptive plasticity of genomic content. Early bacterial whole-genome sequencing efforts discovered that whenever a particular "species" was re-sequenced, new genes were found that had not been detected earlier: entirely new genes, not merely new alleles. This led to the concepts of the bacterial core-genome, the set of genes found in all members of a particular "species", and the flex-genome, the set of genes found in some, but not all, members of the "species". Together these make up the species' pan-genome.
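The core/flex distinction can be made concrete with ordinary set operations. The sketch below is illustrative only: the strain and gene names are hypothetical placeholders, and real pan-genome pipelines first cluster gene sequences into homologous families before any comparison like this is made.

```python
# Hypothetical gene content for three isolates of one "species";
# the gene names are placeholders, not real annotations.
genomes = {
    "strain_A": {"dnaA", "gyrB", "recA", "blaX"},
    "strain_B": {"dnaA", "gyrB", "recA", "phageY"},
    "strain_C": {"dnaA", "gyrB", "recA", "blaX", "islandZ"},
}

def partition_pangenome(gene_sets):
    """Split per-genome gene sets into the pan-, core- and flex-genome."""
    sets = list(gene_sets)
    pan = set().union(*sets)          # genes seen in at least one genome
    core = set.intersection(*sets)    # genes seen in every genome
    flex = pan - core                 # genes seen in some, but not all, genomes
    return pan, core, flex

pan, core, flex = partition_pangenome(genomes.values())
print(sorted(core))  # ['dnaA', 'gyrB', 'recA']
print(sorted(flex))  # ['blaX', 'islandZ', 'phageY']
```

Adding another re-sequenced strain can only grow `pan` and can only shrink `core`, which is exactly the pattern the early sequencing efforts observed.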

Created with PubMed® Query: ( pangenome OR "pan-genome" OR "pan genome" ) NOT pmcbook NOT ispreviousversion

Citations: The Papers (from PubMed®)


RevDate: 2024-10-21

Garrison E, Guarracino A, Heumos S, et al (2024)

Building pangenome graphs.

Nature methods [Epub ahead of print].

Pangenome graphs can represent all variation between multiple reference genomes, but current approaches to build them exclude complex sequences or are based upon a single reference. In response, we developed the PanGenome Graph Builder, a pipeline for constructing pangenome graphs without bias or exclusion. The PanGenome Graph Builder uses all-to-all alignments to build a variation graph in which we can identify variation, measure conservation, detect recombination events and infer phylogenetic relationships.

RevDate: 2024-10-21

Chikhi R, Dufresne Y, P Medvedev (2024)

Constructing and personalizing population pangenome graphs.

Nature methods [Epub ahead of print].

RevDate: 2024-10-21

Ford MKB, Hari A, Zhou Q, et al (2024)

Biologically-informed Killer cell immunoglobulin-like receptor (KIR) gene annotation tool.

Bioinformatics (Oxford, England) pii:7829146 [Epub ahead of print].

SUMMARY: Natural killer (NK) cells are essential components of the innate immune system, with their activity significantly regulated by Killer cell Immunoglobulin-like Receptors (KIRs). The diversity and structural complexity of KIR genes present significant challenges for accurate genotyping, essential for understanding NK cell functions and their implications in health and disease. Traditional genotyping methods struggle with the variable nature of KIR genes, leading to inaccuracies that can impede immunogenetic research. These challenges extend to high-quality phased assemblies, which have been recently popularized by the Human Pangenome Reference Consortium (HPRC). This paper introduces BAKIR (Biologically-informed Annotator for KIR locus), a tailored computational tool designed to overcome the challenges of KIR genotyping and annotation on high-quality, phased genome assemblies. BAKIR aims to enhance the accuracy of KIR gene annotations by structuring its annotation pipeline around identifying key functional mutations, thereby improving the identification and subsequent relevance of gene and allele calls. It uses a multi-stage mapping, alignment, and variant calling process to ensure high-precision gene and allele identification, while also maintaining high recall for sequences that are significantly mutated or truncated relative to the known allele database. BAKIR has been evaluated on a subset of the HPRC assemblies, where BAKIR was able to improve many of the associated annotations and call novel variants. BAKIR is freely available on GitHub, offering ease of access and use through multiple installation methods, including pip, conda, and singularity container, and is equipped with a user-friendly command-line interface, thereby promoting its adoption in the scientific community.

BAKIR is available at github.com/algo-cancer/bakir.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

RevDate: 2024-10-21

van Dam L, Cruz-Morales P, Rodriguez Valerón N, et al (2024)

GastronOmics: Edibility and safety of mycelium of the oyster mushroom Pleurotus ostreatus.

Current research in food science, 9:100866.

Food production is one of the most environmentally damaging human activities. In the face of climate change, it is essential to rethink our dietary habits and explore potential alternative foods catering both towards human and planetary needs. Fungal mycelium might be an attractive alternative protein source due to its rapid growth on sustainable substrates as well as promising nutritional and organoleptic properties. The natural biodiversity of filamentous fungi is vast and represents an untapped reservoir for food innovation. However, fungi are known to produce bioactive compounds that may affect human health, both positively and negatively. To narrow the search for safe and culinarily attractive fungal species, mycelia of edible fruiting-body-forming fungi provide a promising starting point. Here, we explore whether the culinary attractiveness and safety of the commonly eaten mushroom, Pleurotus ostreatus, can also be translated to its mycelium. Whole-genome sequencing and pan-genome analysis revealed a high degree of genetic variability within the genus Pleurotus, suggesting that gastronomic traits as well as food safety may differ between strains. A representative strain, P. ostreatus M2191, was further analyzed for the food safety, nutritional properties and culinary applicability of its mycelium. No regulated mycotoxins were detected in either the fruiting body or the mycelium. Yet, P. ostreatus is known to produce four peptide toxins: Ostreatin, Ostreolysin and Pleurotolysin A/B. These were found at lower levels in the mycelium than in fruiting bodies, which are already considered safe for consumption. Instead, a number of secondary metabolites with potential health benefits were detected in the fungal mycelium. In silico analysis of the proteome suggested low allergenicity. In addition, the fruiting body and the mycelium showed similar nutritional value, which was dependent on the growth substrate.
To highlight the culinary potential of mycelium, we created a dish served at the two-star restaurant the Alchemist in Copenhagen, Denmark. Sensory analysis of the mycelium dish by an untrained consumer panel indicated consumer liking and openness to fungal mycelia. Based on sustainability, safety, culinary potential, and consumer acceptance, our findings suggest that P. ostreatus mycelium has great potential for use as a novel food source.

RevDate: 2024-10-20
CmpDate: 2024-10-20

Soares R, Fonseca BM, Nash BW, et al (2024)

A survey of the Desulfuromonadia "cytochromome" provides a glimpse of the unexplored diversity of multiheme cytochromes in nature.

BMC genomics, 25(1):982.

BACKGROUND: Multiheme cytochromes c (MHC) provide prokaryotes with a broad metabolic versatility that contributes to their role in the biogeochemical cycling of the elements and in energy production in bioelectrochemical systems. However, MHC have only been isolated and studied in detail from a limited number of species. Among these, Desulfuromonadia spp. are particularly MHC-rich. To obtain a broad view of the diversity of MHC, we employed bioinformatic tools to study the cytochromome encoded in the genomes of the Desulfuromonadia class.

RESULTS: We found that the distribution of the MHC families follows a different pattern between the two orders of the Desulfuromonadia class and that there is great diversity in the number of heme-binding motifs in MHC. However, the vast majority of MHC have up to 12 heme-binding motifs. MHC predicted to be extracellular are the least conserved and show high diversity, whereas inner membrane MHC are well conserved and show lower diversity. Although the most prevalent MHC have homologues already characterized, nearly half of the MHC families in the Desulfuromonadia class have no known characterized homologues. AlphaFold2 was employed to predict their 3D structures. This provides an atlas of novel MHC, including examples with high beta-sheet content and nanowire MHC with unprecedented high numbers of putative heme cofactors per polypeptide.

CONCLUSIONS: This work illuminates for the first time the universe of experimentally uncharacterized cytochromes that are likely to contribute to the metabolic versatility and to the fitness of Desulfuromonadia in diverse environmental conditions and to drive biotechnological applications of these organisms.

RevDate: 2024-10-21

Gluck-Thaler E, Forsythe A, Puerner C, et al (2024)

Giant transposons promote strain heterogeneity in a major fungal pathogen.

bioRxiv : the preprint server for biology pii:2024.06.28.601215.

UNLABELLED: Fungal infections are difficult to prevent and treat in large part due to strain heterogeneity. However, the genetic mechanisms driving pathogen variation remain poorly understood. Here, we determined the extent to which Starships (giant transposons capable of mobilizing numerous fungal genes) generate genetic and phenotypic variability in the human pathogen Aspergillus fumigatus. We analyzed 519 diverse strains, including 12 newly sequenced with long-read technology, to reveal 20 distinct Starships that are generating genomic heterogeneity over timescales potentially relevant for experimental reproducibility. Starship-mobilized genes encode diverse functions, including biofilm-related virulence factors and biosynthetic gene clusters, and many are differentially expressed during infection and antifungal exposure in a strain-specific manner. These findings support a new model of fungal evolution wherein Starships help generate variation in gene content and expression among fungal strains. Together, our results demonstrate that Starships are a previously hidden mechanism generating genotypic and, in turn, phenotypic heterogeneity in a major human fungal pathogen.

IMPORTANCE: No "one size fits all" option exists for treating fungal infections in large part due to genetic and phenotypic variation among strains. Accounting for strain heterogeneity is thus fundamental for developing efficacious treatments and strategies for safeguarding human health. Here, we report significant progress towards achieving this goal by uncovering a previously hidden mechanism generating heterogeneity in the major human fungal pathogen Aspergillus fumigatus: giant transposons called Starships that span dozens of kilobases and mobilize fungal genes as cargo. By conducting the first systematic investigation of these unusual transposons in a single fungal species, we demonstrate their contributions to population-level variation at the genome, pangenome and transcriptome levels. The Starship atlas we developed will not only help account for variation introduced by these elements in laboratory experiments but will serve as a foundational resource for determining how Starships shape clinically-relevant phenotypes, such as antifungal resistance and pathogenicity.

RevDate: 2024-10-18

Munim MA, Tanni AA, Hossain MM, et al (2024)

Whole genome sequencing of multidrug-resistant Klebsiella pneumoniae from poultry in Noakhali, Bangladesh: Assessing risk of transmission to humans in a pilot study.

Comparative immunology, microbiology and infectious diseases, 114:102246 pii:S0147-9571(24)00123-1 [Epub ahead of print].

BACKGROUND: Multi-drug resistant (MDR) Klebsiella pneumoniae is a public health concern due to its presence in Bangladeshi poultry products and its ability to spread resistance genes. This study genetically characterizes a distinct MDR K. pneumoniae isolate from the gut of poultry in Noakhali, Bangladesh, offering insights into its resistance mechanisms and public health impact.

METHODS: Klebsiella pneumoniae isolates from broiler and layer poultry were identified using biochemical and molecular analyses. Eleven isolates were tested for antibiotic sensitivity and categorized by their Multiple Antibiotic Resistance Index (MARI) profiles. The isolate with the highest MARI was selected for whole-genome sequencing using Illumina technology. The sequencing data were analyzed for genome annotation, pan-genome analysis, genome similarities, sequence type identification, and the identification of genetic determinants of resistance and virulence genes.

RESULT: We identified 10 MARI profiles among 11 K. pneumoniae isolates, with values ranging from 0.64 to 0.94. The highest MARI of 0.94 was found in an isolate from a layer poultry. This isolate's genome, 5,401,789 base pairs long with 89.6% coverage, showed potential inter-species dissemination, as indicated by core genome phylogenetic analysis. It possessed genes conferring resistance to fluoroquinolones, aminoglycosides, β-lactams, folate pathway antagonists, fosfomycin, macrolides, quinolones, rifamycin, tetracyclines, and polymyxins, including colistin.

CONCLUSION: Poultry serve as reservoirs for MDR K. pneumoniae, which can spread to other species and pose significant health risks. Rigorous monitoring of antibiotic use and genetic characterization of MDR bacterial isolates are essential to mitigate this threat.
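The Multiple Antibiotic Resistance Index (MARI) in the entry above is commonly computed as a/b, where a is the number of antibiotics to which an isolate is resistant and b is the number of antibiotics tested; a MARI of 0.94 therefore means the isolate resisted about 94% of the drugs in its panel. A minimal sketch follows. The antibiotic panel and resistance calls are hypothetical, since the abstract does not list the panel used.

```python
def mar_index(resistant_to, tested):
    """MAR index = a / b (a: antibiotics the isolate resists; b: antibiotics tested)."""
    tested = set(tested)
    resistant = set(resistant_to)
    if not tested:
        raise ValueError("no antibiotics were tested")
    if not resistant <= tested:
        raise ValueError("resistance reported for an antibiotic that was not tested")
    return len(resistant) / len(tested)

# Hypothetical four-drug panel and resistance calls for a single isolate.
panel = ["ampicillin", "ciprofloxacin", "gentamicin", "tetracycline"]
print(mar_index(["ampicillin", "ciprofloxacin", "tetracycline"], panel))  # 0.75
```

Using sets guards against a drug being counted twice in either list.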

RevDate: 2024-10-18

Gao S, Zhang Y, Bush SJ, et al (2024)

Centromere Landscapes Resolved from Hundreds of Human Genomes.

Genomics, proteomics & bioinformatics pii:7826621 [Epub ahead of print].

High-fidelity (HiFi) sequencing has facilitated the assembly and analysis of the most repetitive region of the genome, the centromere. Nevertheless, our current understanding of human centromeres is based on a relatively small number of telomere-to-telomere assemblies, which has not yet captured its full diversity. In this study, we investigated the genomic diversity of human centromere higher order repeats (HORs) via both HiFi reads and haplotype-resolved assemblies from hundreds of samples drawn from ongoing pangenome-sequencing projects and reprocessed them via a novel HOR annotation pipeline, HiCAT-human. We used this wealth of data to provide a global survey of the centromeric HOR landscape; in particular, we found that 23 HORs presented significant copy number variability between populations. We detected three centromere genotypes with unbalanced population frequencies on chromosomes 5, 8, and 17. An inter-assembly comparison of HOR loci further revealed that while HOR array structures are diverse, they nevertheless tend to form a number of specific landscapes, each exhibiting different levels of HOR subunit expansion and possibly reflecting a cyclical evolutionary transition from homogeneous to nested structures and back.

RevDate: 2024-10-18

Mueller KD, Panzetta ME, Davey L, et al (2024)

Pangenomic analysis identifies correlations between Akkermansia species and subspecies and human health outcomes.

Microbiome research reports, 3(3):33.

Aim: Akkermansia are common members of the human gastrointestinal microbiota. The prevalence of these mucophilic bacteria, especially Akkermansia muciniphila (A. muciniphila), correlates with immunological and metabolic health. The genus Akkermansia in humans includes species with significantly larger genomes than A. muciniphila, leading us to postulate that this added genetic content may influence how they impact human metabolic and immunological health. Methods: We conducted a pangenomic analysis of 234 Akkermansia complete or near-complete genomes. We also used high-resolution species and subspecies assignments to reanalyze publicly available metagenomic datasets to determine if there are relationships between Akkermansia species and A. muciniphila clades with various disease outcomes. Results: Analysis of genome-wide average nucleotide identity, 16S rRNA gene identity, conservation of core Akkermansia genes, and analysis of the fatty acid composition of representative isolates support the partitioning of the genus Akkermansia into several species. In addition, A. muciniphila sensu stricto, the most prevalent Akkermansia species in humans, should be subdivided into two subspecies. For a pediatric cohort, we observed species-specific correlations between Akkermansia abundance and obesity, at baseline or after various interventions. For inflammatory bowel disease cohorts, we identified a decreased abundance of Akkermansia in patients with ulcerative colitis or Crohn's disease, which was species and subspecies-dependent. In patients undergoing immune checkpoint inhibitor therapies for non-small cell lung carcinoma, we observed a significant association between one A. muciniphila subspecies and survival outcomes.
Conclusion: Our findings suggest that the prevalence of specific Akkermansia species and/or subspecies can be crucial in evaluating their association with human health, particularly in different disease contexts, and is an important consideration for their use as probiotics.
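Species boundaries based on genome-wide average nucleotide identity (ANI) are commonly drawn at roughly 95% ANI. The sketch below groups genomes by single-linkage clustering at that cutoff. The genome names and ANI values are hypothetical, and the threshold and linkage rule are assumptions for illustration, not the procedure used in this study, which also weighed 16S rRNA identity, core-gene conservation, and fatty acid composition.

```python
def ani_clusters(genomes, pairwise_ani, threshold=95.0):
    """Single-linkage groups of genomes joined by any pair with ANI >= threshold (percent)."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the group representative, halving the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ani in pairwise_ani.items():
        if ani >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), set()).add(g)
    return sorted(groups.values(), key=sorted)

# Hypothetical genomes and pairwise ANI values (percent identity).
genomes = ["g1", "g2", "g3", "g4"]
pairwise_ani = {("g1", "g2"): 98.7, ("g1", "g3"): 80.1, ("g3", "g4"): 96.2, ("g2", "g4"): 79.5}
print(ani_clusters(genomes, pairwise_ani))
```

Single linkage is the simplest choice but can chain distinct groups together through intermediate genomes, so real taxonomic work usually checks such clusters against other evidence.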

RevDate: 2024-10-17

Chaudhari DN, Ahire JJ, Devkatte AN, et al (2024)

Complete Genome Sequence and Probiotic Characterization of Lactobacillus delbrueckii subsp. indicus DC-3 Isolated from Traditional Indigenous Fermented Milk.

Probiotics and antimicrobial proteins [Epub ahead of print].

In this study, Lactobacillus delbrueckii subsp. indicus DC-3 was isolated from Dahi, a traditional indigenous Indian fermented milk, and identified using whole genome sequencing. The safety of the strain was evaluated using genetic and phenotypic analyses, such as screening for virulence factors, mobile and insertion elements, plasmids, and antibiotic resistance. In addition, the strain was comprehensively investigated for in vitro probiotic traits, biofilm formation, antibacterial activity, and exopolysaccharide (EPS) production. The strain has a single circular chromosome (3,145,837 bp) with a GC content of 56.73%, a higher number of accessory and unique genes, and an open pan-genome, and it lacks mobile and insertion elements, plasmids, virulence genes, and transmissible antibiotic resistance genes. The strain was capable of surviving in gastric juice (83% viability at 3 h) and intestinal juice (71% viability at 6 h) and showed 42.5% autoaggregation, adhesion to mucin, 8.7% adhesion to xylene, and 8.3% adhesion to Caco-2 cells. The γ-hemolytic nature, usual antibiotic susceptibility profile, and negative results for mucin and gelatin degradation ensure the safety of the strain. The strain produced 10.5 g/L of D-lactic acid and hydrogen peroxide, capable of inhibiting and co-aggregating Escherichia coli MTCC 1687, Proteus mirabilis MTCC 425, and Candida albicans ATCC 14053. In addition, the strain produced 90 mg/L EPS (48 h) and formed biofilms. In conclusion, this study demonstrates that L. delbrueckii subsp. indicus DC-3 is unique and different from previously reported L. delbrueckii subsp. indicus strains and is a safe potential probiotic candidate.

RevDate: 2024-10-17

Wang L, Chen S, Xing M, et al (2024)

Genome characterization of Shewanella algae in Hainan Province, China.

Frontiers in microbiology, 15:1474871.

Shewanella algae is an emerging marine zoonotic pathogen. In this study, we report for the first time Shewanella algae infections in patients and animals in Hainan Province, China. Relatively little is currently known about the whole-genome characteristics of Shewanella algae in most tropical regions, including southern China. Here, we sequenced 62 Shewanella algae strains isolated in Hainan Province and combined them with 144 Shewanella algae genomes from public databases to analyze genomic features. Phylogenetic analysis revealed that Shewanella algae is widely distributed in the marine environments of both temperate and tropical countries, exhibiting close phylogenetic relationships with genomes isolated from patients, animals, and plants, thereby confirming that exposure to marine environments is a risk factor for Shewanella algae infections. Average nucleotide identity analysis indicated that clonally identical genomes could be isolated from different sample types collected from patients at different times. Pan-genome analysis identified a total of 21,909 genes, including 1,563 core genes, 8,292 strain-specific genes, and 12,054 accessory genes. Multiple putative virulence-associated genes were identified, encompassing 14 categories and 16 subcategories, with 171 distinct virulence factors. Three different plasmid replicon types were detected in 33 genomes. Eleven classes of antibiotic resistance genes and 352 integrons were identified. Antimicrobial susceptibility testing revealed a high resistance rate to imipenem and colistin among the strains studied, with 5 strains exhibiting multidrug resistance. However, they were all sensitive to amikacin, minocycline, and tigecycline. Our findings clarify the genomic characteristics and population structure of Shewanella algae in Hainan Province.
The results offer insights into the genetic basis of pathogenicity in Shewanella algae and enhance our understanding of its global phylogeography.
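Pan-genome partitions like the one reported above are usually defined by how many genomes carry each gene family. The minimal sketch below assumes a simple rule: a gene present in every genome is core, one present in exactly one genome is strain-specific, and everything else is accessory. Some pipelines relax the core cutoff (for example, to 99% of genomes), which the hypothetical `core_percent` parameter models. The final assertion checks that the three reported categories add up to the reported total.

```python
def classify_gene(genomes_with_gene, total_genomes, core_percent=100):
    """Assign a gene family to a pan-genome partition from its prevalence."""
    # Integer arithmetic avoids floating-point edge cases at the core cutoff.
    if genomes_with_gene * 100 >= core_percent * total_genomes:
        return "core"
    if genomes_with_gene == 1:
        return "strain-specific"
    return "accessory"

# Hypothetical prevalences in a 50-genome data set.
print(classify_gene(50, 50), classify_gene(1, 50), classify_gene(20, 50))
# prints: core strain-specific accessory

# Counts reported in the abstract above; the partitions should sum to the pan-genome.
reported = {"core": 1_563, "accessory": 12_054, "strain-specific": 8_292}
assert sum(reported.values()) == 21_909
```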

RevDate: 2024-10-17

Majernik SN, Beaver L, PH Bradley (2024)

Small amounts of misassembly can have disproportionate effects on pangenome-based metagenomic analyses.

bioRxiv : the preprint server for biology pii:2024.10.11.617902.

Individual genes from microbiomes can drive host-level phenotypes. To help identify such candidate genes, several recent tools estimate microbial gene copy numbers directly from metagenomes. These tools rely on alignments to pangenomes, which in turn are derived from the set of all individual genomes from one species. While large-scale metagenomic assembly efforts have made pangenome estimates more complete, mixed communities can also introduce contamination into assemblies, and it is unknown how robust pangenome-based metagenomic analyses are to these errors. To gain insight into this problem, we re-analyzed a case-control study of the gut microbiome in cirrhosis, focusing on commensal Clostridia previously implicated in this disease. We tested for differentially prevalent genes in the Lachnospiraceae, then investigated which were likely to be contaminants using sequence similarity searches. Out of 86 differentially prevalent genes, we found that 33 (38%) were probably contaminants originating in taxa such as Veillonella and Haemophilus, unrelated genera that were independently correlated with disease status. Our results demonstrate that even small amounts of contamination in metagenome assemblies, below typical quality thresholds, can threaten to overwhelm gene-level metagenomic analyses. However, we also show that such contaminants can be accurately identified using a method based on gene-to-species correlation. After removing these contaminants, we observe that several flagellar motility gene clusters in the Lachnospira eligens pangenome are associated with cirrhosis status. We have integrated our analyses into an analysis and visualization pipeline, PanSweep, that can automatically identify cases where pangenome contamination may bias the results of gene-resolved analyses.

RevDate: 2024-10-17

Rubin J, van Waaij J, Kraft L, et al (2024)

SAFARI: Pangenome Alignment of Ancient DNA Using Purine/Pyrimidine Encodings.

bioRxiv : the preprint server for biology pii:2024.08.12.607489.

Aligning DNA sequences retrieved from fossils or other paleontological artifacts, referred to as ancient DNA, is particularly challenging because of short sequence lengths, chemical damage that creates a specific substitution pattern (C→T and G→A), and heightened divergence between the sample and the reference genome, which exacerbates reference bias. This bias can be mitigated by aligning to pangenome graphs to incorporate documented organismic variation, but this approach still suffers from substitution patterns due to chemical damage. We introduce the RYmer index, a variant of the commonly-used minimizer index which represents purines (A,G) and pyrimidines (C,T) as R and Y respectively. This creates an indexing scheme robust to the aforementioned chemical damage. We implemented SAFARI, an ancient DNA damage-aware version of the pangenome aligner vg giraffe which uses RYmers to rescue alignments containing deaminated seeds. We show that our approach produces more correct alignments from ancient DNA sequences than current approaches while maintaining a tolerable rate of spurious alignments. In addition, we demonstrate that our algorithm improves the estimate of the rate of ancient DNA damage, especially for highly damaged samples. Crucially, we show that this improved alignment can directly translate into better insights gained from the data by showcasing its integration with a number of extant pangenome tools.
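The RYmer idea can be illustrated by collapsing each base to R (purine: A, G) or Y (pyrimidine: C, T) before selecting minimizers. Because deamination damage converts C to T and G to A, it never changes a base's R/Y class, so a damaged read yields the same RY seeds as the undamaged reference. The sketch below is a toy under stated assumptions: it picks the lexicographically smallest k-mer in each window rather than the hashed ordering and positional bookkeeping a real minimizer index such as vg giraffe's would use, and the sequences are hypothetical.

```python
# R = purine (A, G); Y = pyrimidine (C, T).
RY_CLASS = {"A": "R", "G": "R", "C": "Y", "T": "Y"}

def ry_encode(seq):
    """Collapse a DNA string to the two-letter purine/pyrimidine alphabet."""
    return "".join(RY_CLASS[base] for base in seq.upper())

def ry_minimizers(seq, k, w):
    """Toy minimizers: the lexicographically smallest RY k-mer in each window of w k-mers."""
    encoded = ry_encode(seq)
    kmers = [encoded[i:i + k] for i in range(len(encoded) - k + 1)]
    return {min(kmers[i:i + w]) for i in range(len(kmers) - w + 1)}

reference = "ACGTCCGA"
damaged = "ATGTTCAA"  # hypothetical read: C->T at positions 1 and 4, G->A at position 6

print(reference == damaged)                                            # False
print(ry_encode(reference) == ry_encode(damaged))                      # True
print(ry_minimizers(reference, 3, 2) == ry_minimizers(damaged, 3, 2))  # True
```

The trade-off is a two-letter alphabet carrying one bit per base instead of two, so RY seeds must be longer than nucleotide seeds to remain equally specific.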

RevDate: 2024-10-17
CmpDate: 2024-10-17

Ghatak S, Milton AAP, Das S, et al (2024)

Campylobacter coli of porcine origin exhibits an open pan-genome within a single clonal complex: insights from comparative genomic analysis.

Frontiers in cellular and infection microbiology, 14:1449856.

INTRODUCTION: Although Campylobacter spp., including Campylobacter coli, have emerged as important zoonotic foodborne pathogens globally, the understanding of the genomic epidemiology of C. coli of porcine origin is limited.

METHODS: As pigs are an important reservoir of C. coli, we isolated and sequenced C. coli from pigs (n = 3; this study) and analyzed these genomes along with all other C. coli genomes in the NCBI database, up to January 6, 2023, whose recorded source was pig intestines, pig feces, or pigs. In this paper, we report the pan-genomic features, the multi-locus sequence types, the resistome, virulome, and mobilome, and the phylogenomic analysis of these organisms that were obtained from pigs.

RESULTS AND DISCUSSION: Our analysis revealed that, in addition to having an open pan-genome, the majority (63%) of the typeable isolates of C. coli of pig origin belonged to a single clonal complex, ST-828. The resistome of these C. coli isolates was predominated by the genes tetO (53%), blaOXA-193 (49%), and APH(3')-IIIa (21%); however, the virulome analysis revealed a core set of 37 virulence genes. Analysis of the mobile genetic elements in the genomes revealed a wide diversity of plasmids and bacteriophages, while 30 transposons were common to all genomes of C. coli of porcine origin. Phylogenomic analysis showed two discernible clusters: one comprising isolates originating from Japan and another comprising mostly copies of a type strain stored in three different culture collections.
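Whether a pan-genome is "open" is often judged with Heaps' law, n = κN^(−α), where n is the number of new genes found when the Nth genome is added; α < 1 is usually read as an open pan-genome and α > 1 as a closed one. The minimal sketch below fits κ and α by least squares on log-transformed values. The input counts are invented so that the fit is exact, and this is not necessarily the method used in the studies listed here; real analyses typically average over many random orderings of the genomes.

```python
import math

def fit_heaps_law(genome_counts, new_gene_counts):
    """Fit n = kappa * N**(-alpha) by ordinary least squares on log-log data."""
    xs = [math.log(n) for n in genome_counts]
    ys = [math.log(g) for g in new_gene_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    alpha = -slope                              # log n = log kappa - alpha * log N
    kappa = math.exp(mean_y - slope * mean_x)   # intercept back-transformed
    return kappa, alpha

# Hypothetical new-gene counts when the Nth genome is added (generated with kappa=1200, alpha=0.5).
genomes_added = [4, 9, 16, 25, 36]
new_genes = [600, 400, 300, 240, 200]

kappa, alpha = fit_heaps_law(genomes_added, new_genes)
status = "open" if alpha < 1 else "closed"  # alpha == 1 is borderline
print(f"kappa = {kappa:.0f}, alpha = {alpha:.2f}, pan-genome looks {status}")
# prints: kappa = 1200, alpha = 0.50, pan-genome looks open
```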

RevDate: 2024-10-16
CmpDate: 2024-10-17

Tong X, Luo D, Leung MHY, et al (2024)

Diverse and specialized metabolic capabilities of microbes in oligotrophic built environments.

Microbiome, 12(1):198.

BACKGROUND: Built environments (BEs) are typically considered to be oligotrophic and harsh environments for microbial communities under normal, non-damp conditions. However, the metabolic functions of microbial inhabitants in BEs remain poorly understood. This study aimed to shed light on the functional capabilities of microbes in BEs by analyzing 860 representative metagenome-assembled genomes (rMAGs) reconstructed from 738 samples collected from BEs across the city of Hong Kong and from the skin surfaces of human occupants. The study specifically focused on the metabolic functions of rMAGs that are either phylogenetically novel or prevalent in BEs.

RESULTS: The diversity and composition of BE microbiomes were primarily shaped by the sample type, with Micrococcus luteus and Cutibacterium acnes being prevalent. The metabolic functions of rMAGs varied significantly based on taxonomy, even at the strain level. A novel strain affiliated with the Candidatus class Xenobia in the Candidatus phylum Eremiobacterota and two novel strains affiliated with the superphylum Patescibacteria exhibited unique functions compared with their close relatives, potentially aiding their survival in BEs and on human skin. The novel strain in the class Xenobia possessed genes for transporting nitrate and nitrite as nitrogen sources and for mitigating nitrosative stress induced by nitric oxide during denitrification. The two novel Patescibacteria strains both possessed a broad array of genes for amino acid and trace element transport, while one of them carried genes for carotenoid and ubiquinone biosynthesis. The globally prevalent M. luteus in BEs displayed a large and open pangenome, with high infraspecific genomic diversity contributed by 11 conspecific strains recovered from BEs in a single geographic region. The versatile metabolic functions encoded in the large accessory genomes of M. luteus may contribute to its global ubiquity and specialization in BEs.

CONCLUSIONS: This study illustrates that the microbial inhabitants of BEs possess metabolic potentials that enable them to tolerate and counter different biotic and abiotic conditions. Additionally, these microbes can efficiently utilize various limited residual resources from occupant activities, potentially enhancing their survival and persistence within BEs. A better understanding of the metabolic functions of BE microbes will ultimately facilitate the development of strategies to create a healthy indoor microbiome. Video Abstract.

RevDate: 2024-10-16
CmpDate: 2024-10-17

Zdąbłasz K, Lisiecka A, N Dojer (2024)

Sequence Flow: interactive web application for visualizing partial order alignments.

BMC genomics, 25(1):973.

BACKGROUND: Multiple sequence alignment (MSA) has proven extremely useful in computational biology, especially in inferring evolutionary relationships via phylogenetic analysis and providing insight into protein structure and function. An alternative to the standard MSA model is partial order alignment (POA), in which aligned sequences are represented as paths in a graph rather than rows in a matrix. While the POA model has proven useful in several applications (e.g. sequencing reads assembly and pangenome structure exploration), we lack efficient visualization tools that could highlight its advantages.

RESULTS: We propose Sequence Flow - a web application designed to address the above problem. Sequence Flow presents the POA as a Sankey diagram, a kind of graph visualisation typically used for graphs representing flowcharts. Sequence Flow enables interactive alignment exploration, including fragment selection, highlighting a selected group of sequences, modification of the position of graph nodes, structure simplification etc. After adjustment, the visualization can be saved as a high-quality graphic file. Thanks to the use of SanKEY.js - a JavaScript library for creating Sankey diagrams, designed specifically to visualize POAs, Sequence Flow provides satisfactory performance even with large alignments.

CONCLUSIONS: We provide Sankey diagram-based POA visualization tools for both end users (Sequence Flow) and bioinformatic software developers (SanKEY.js). Sequence Flow webservice is available at https://sequenceflow.mimuw.edu.pl/ . The source code for SanKEY.js is available at https://github.com/Krzysiekzd/SanKEY.js and for Sequence Flow at https://github.com/Krzysiekzd/SequenceFlow .
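In a partial order alignment, each sequence is a path through a graph of aligned fragments, and a Sankey-style view needs, for each edge, the number of sequences that traverse it, which sets the link width. The sketch below computes those counts from hypothetical paths; it illustrates the data a Sankey rendering consumes and is not how Sequence Flow or SanKEY.js work internally.

```python
from collections import Counter

def sankey_link_widths(paths):
    """Count how many aligned sequences traverse each directed edge of a POA graph."""
    widths = Counter()
    for node_path in paths.values():
        for source, target in zip(node_path, node_path[1:]):
            widths[(source, target)] += 1
    return widths

# Hypothetical POA graph: each node stands for an aligned fragment, each sequence is a path.
paths = {
    "seq1": ["n1", "n2", "n4"],
    "seq2": ["n1", "n3", "n4"],
    "seq3": ["n1", "n2", "n4"],
}
links = sankey_link_widths(paths)
for (source, target), width in sorted(links.items()):
    print(source, "->", target, width)
```

The branch at `n1` splits into a width-2 link through `n2` and a width-1 link through `n3`, which is the divergence a Sankey diagram makes visible at a glance.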

RevDate: 2024-10-16
CmpDate: 2024-10-16

Groza C, Chen X, Wheeler TJ, et al (2024)

A unified framework to analyze transposable element insertion polymorphisms using graph genomes.

Nature communications, 15(1):8915.

Transposable elements are ubiquitous mobile DNA sequences generating insertion polymorphisms, contributing to genomic diversity. We present GraffiTE, a flexible pipeline to analyze polymorphic mobile element insertions. By integrating state-of-the-art structural variant detection algorithms and graph genomes, GraffiTE identifies polymorphic mobile elements from genomic assemblies or long-read sequencing data, and genotypes these variants using short- or long-read sets. Benchmarking on simulated and real datasets reports high precision and recall rates. GraffiTE is designed to allow non-expert users to perform comprehensive analyses, including in models with limited transposable element knowledge, and is compatible with various sequencing technologies. Here, we demonstrate the versatility of GraffiTE by analyzing human, Drosophila melanogaster, maize, and Cannabis sativa pangenome data. These analyses reveal the landscapes of polymorphic mobile elements and their frequency variations across individuals, strains, and cultivars.

RevDate: 2024-10-16
CmpDate: 2024-10-16

Urrutia C, Leyton-Carcaman B, M Abanto Marin (2024)

Contribution of the Mobilome to the Configuration of the Resistome of Corynebacterium striatum.

International journal of molecular sciences, 25(19): pii:ijms251910499.

Corynebacterium striatum, present in the microbiota of human skin and nasal mucosa, has recently emerged as a causative agent of hospital-acquired infections, notable for its resistance to multiple antimicrobials. Its mobilome comprises several mobile genetic elements, such as plasmids, transposons, insertion sequences and integrons, which contribute to the acquisition of antimicrobial resistance genes. This study analyzes the contribution of the C. striatum mobilome to the transfer and dissemination of resistance genes. In addition, it examines integrative and conjugative elements (ICEs), which are essential in the dissemination of resistance genes between bacterial populations but whose role in C. striatum has not yet been studied. This study examined 365 C. striatum genomes obtained from the NCBI Pathogen Detection database. Phylogenetic and pangenome analyses were performed, the resistance profile of the bacterium was characterized, and mobile elements, including putative ICEs, were detected. Bioinformatic analyses identified 20 antimicrobial resistance genes in this species, with the ermX gene being the most prevalent. Resistance genes were mainly associated with plasmid sequence regions and class 1 integrons. Although an ICE was detected, no resistance genes linked to this element were found. Through phylogenetic and pangenome analyses, this study provided valuable information on the geographic spread and prevalence of outbreaks, and it identified antimicrobial resistance genes and the mobile genetic elements carrying many of them, which may be the subject of future research and therapeutic approaches.

RevDate: 2024-10-14

Ndiaye M, Prieto-Baños S, Fitzgerald LM, et al (2024)

When less is more: sketching with minimizers in genomics.

Genome biology, 25(1):270.

The exponential increase in sequencing data calls for conceptual and computational advances to extract useful biological insights. One such advance, minimizers, allows for reducing the quantity of data handled while maintaining some of its key properties. We provide a basic introduction to minimizers, cover recent methodological developments, and review the diverse applications of minimizers to analyze genomic data, including de novo genome assembly, metagenomics, read alignment, read correction, and pangenomes. We also touch on alternative data sketching techniques including universal hitting sets, syncmers, or strobemers. Minimizers and their alternatives have rapidly become indispensable tools for handling vast amounts of data.
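To make the minimizer idea concrete, here is a minimal Python sketch of the classic (w, k) scheme: slide a window over w consecutive k-mers and keep the smallest k-mer in each window, reporting each selected position once. It assumes plain lexicographic ordering with ties broken by leftmost position; practical tools usually order k-mers by a hash (and often use canonical k-mers) to avoid the bias of lexicographic order, so treat this as an illustration rather than any specific tool's implementation.

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) minimizers for every window of w consecutive k-mers.

    Ordering is lexicographic (an assumption; real tools typically hash),
    with ties broken by the leftmost position. Because the selected position
    never moves backwards as the window slides, comparing with the last
    selection is enough to drop duplicates.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        best = min(range(start, start + w), key=lambda i: (kmers[i], i))
        if not selected or selected[-1][0] != best:
            selected.append((best, kmers[best]))
    return selected


# Six 3-mers are reduced to three sampled positions.
print(minimizers("ACGTACGT", k=3, w=3))  # [(0, 'ACG'), (1, 'CGT'), (4, 'ACG')]
```

The guarantee that makes minimizers useful for sketching is that every window of w k-mers contributes at least one sampled k-mer, so two sequences sharing a long enough exact match are guaranteed to share a minimizer.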

RevDate: 2024-10-14

Ma A, Sun J, Feng L, et al (2024)

Functional diversity of oxidosqualene cyclases in genus Oryza.

The New phytologist [Epub ahead of print].

Triterpene skeletons, formed by 2,3-oxidosqualene cyclases (OSCs), are essential for the synthesis of steroids and triterpenoids. In the japonica rice cultivar Zhonghua11, a total of 12 OsOSCs have been found. While the catalytic functions of OsOSC1, 3, 4, 9, and 10 remain unclear, the functions of the other OsOSCs have been well studied. In this study, we conducted a comprehensive analysis of 12 OSC genes within the genus Oryza with the aid of 63 genomes from cultivated and wild rice. We found that OSC genes are relatively conserved within the genus Oryza, with a few exceptions. Collinearity analysis further suggested that, throughout the evolutionary history of the genus Oryza, the OSC genes have not undergone significant rearrangements or losses. Further functional analysis of 5 uncharacterized OSCs revealed that OsOSC10 is a friedelin synthase, which affects the development of rice grains. Additionally, the reconstructed ancestral sequences of Oryza OSC3 and Oryza OSC9 had lupeol synthase and poaceatapetol synthase activity, respectively. The discovery of friedelin synthase in rice unlocks a new catalytic path and biological function of OsOSC10. The pan-genome analysis of OSCs within the genus Oryza gives insights into the evolutionary trajectory and product diversity of Oryza OSCs.

RevDate: 2024-10-14

Heumos S, Heuer ML, Hanssen F, et al (2024)

Cluster-efficient pangenome graph construction with nf-core/pangenome.

Bioinformatics (Oxford, England) pii:7821182 [Epub ahead of print].

MOTIVATION: Pangenome graphs offer a comprehensive way of capturing genomic variability across multiple genomes. However, current construction methods often introduce biases, excluding complex sequences or relying on references. The PanGenome Graph Builder (PGGB) addresses these issues. To date, though, there is no state-of-the-art pipeline allowing for easy deployment, efficient and dynamic use of available resources, and scalable usage at the same time.

RESULTS: To overcome these limitations, we present nf-core/pangenome, a reference-unbiased approach implemented in Nextflow following nf-core's best practices. Leveraging biocontainers ensures portability and seamless deployment in HPC environments. Unlike PGGB, nf-core/pangenome distributes alignments across cluster nodes, enabling scalability. Demonstrating its efficiency, we constructed pangenome graphs for 1000 human chromosome 19 haplotypes and 2146 E. coli sequences, achieving a two- to threefold speedup compared to PGGB without increasing greenhouse gas emissions.

AVAILABILITY: nf-core/pangenome is released under the MIT open-source license, available on GitHub and Zenodo, with documentation accessible at https://nf-co.re/pangenome/1.1.2/docs/usage.

SUPPLEMENTARY: Supplementary data are available at Bioinformatics online.

RevDate: 2024-10-14

Roberts MD, Davis O, Josephs EB, et al (2024)

k-mer-based approaches to bridging pangenomics and population genetics.

ArXiv pii:2409.11683.

Many commonly studied species now have more than one chromosome-scale genome assembly, revealing a large amount of genetic diversity previously missed by approaches that map short reads to a single reference. However, many species still lack multiple reference genomes, and correctly aligning references to build pangenomes is challenging, limiting our ability to study this missing genomic variation in population genetics. Here, we argue that $k$-mers are a crucial stepping stone to bridging the reference-focused paradigms of population genetics with the reference-free paradigms of pangenomics. We review current literature on the uses of $k$-mers for performing three core components of most population genetics analyses: identifying, measuring, and explaining patterns of genetic variation. We also demonstrate how different $k$-mer-based measures of genetic variation behave in population genetic simulations according to the choice of $k$, depth of sequencing coverage, and degree of data compression. Overall, we find that $k$-mer-based measures of genetic diversity scale consistently with pairwise nucleotide diversity ($\pi$) up to values of about $\pi = 0.025$ ($R^2 = 0.97$) for neutrally evolving populations. For populations with even more variation, using shorter $k$-mers will maintain the scalability up to at least $\pi = 0.1$. Furthermore, in our simulated populations, $k$-mer dissimilarity values can be reliably approximated from counting Bloom filters, highlighting a potential avenue for decreasing the memory burden of $k$-mer-based genomic dissimilarity analyses. For future studies, there is a great opportunity to further develop methods for identifying selected loci using $k$-mers.
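The comparison in this abstract is between a reference-based statistic, pairwise nucleotide diversity ($\pi$), and reference-free $k$-mer dissimilarities. A small Python sketch of both follows. It assumes equal-length, gap-free aligned sequences for $\pi$ and uses Jaccard dissimilarity over $k$-mer sets as the $k$-mer measure; the measures actually evaluated in the review may differ, and all function names here are hypothetical.

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_jaccard_dissimilarity(a: str, b: str, k: int) -> float:
    """1 - |A & B| / |A | B| over the k-mer sets of two sequences."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    return 0.0 if not union else 1.0 - len(sa & sb) / len(union)


def nucleotide_diversity(seqs: list[str]) -> float:
    """Pairwise nucleotide diversity (pi): mean proportion of differing sites
    over all pairs. Assumes equal-length, gap-free aligned sequences."""
    pairs = list(combinations(seqs, 2))
    length = len(seqs[0])
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / (len(pairs) * length)


def mean_kmer_dissimilarity(seqs: list[str], k: int) -> float:
    """Mean pairwise k-mer Jaccard dissimilarity; needs no alignment."""
    pairs = list(combinations(seqs, 2))
    return sum(kmer_jaccard_dissimilarity(a, b, k) for a, b in pairs) / len(pairs)


sample = ["ACGTACGTAC", "ACGTTCGTAC"]  # one substitution in 10 sites
print(nucleotide_diversity(sample))        # 0.1
print(mean_kmer_dissimilarity(sample, 3))  # about 0.43
```

A single SNP can change up to $k$ overlapping $k$-mers, which is why the $k$-mer measure rises faster than $\pi$ and why shorter $k$ keeps the relationship approximately linear at higher diversity, as the abstract reports.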

RevDate: 2024-10-14

Wiersma AT, Hamilton JP, Vaillancourt B, et al (2024)

k-mer genome-wide association study for anthracnose and BCMV resistance in a Phaseolus vulgaris Andean Diversity Panel.

The plant genome [Epub ahead of print].

Access to broad genomic resources and closely linked marker-trait associations for common beans (Phaseolus vulgaris L.) can facilitate development of improved varieties with increased yield, improved market quality traits, and enhanced disease resistance. The emergence of virulent races of anthracnose (caused by Colletotrichum lindemuthianum) and bean common mosaic virus (BCMV) highlights the need for improved methods to identify and incorporate pan-genomic variation in breeding for disease resistance. We sequenced the P. vulgaris Andean Diversity Panel (ADP) and performed a genome-wide association study (GWAS) to identify associations for resistance to BCMV and eight races of anthracnose. Historical single nucleotide polymorphism (SNP)-chip and phenotypic data enabled a three-way comparison between SNP-chip, reference-based whole genome shotgun sequence (WGS)-SNP, and reference-free k-mer (short nucleotide subsequence) GWAS. Across all traits, there was excellent concordance between SNP-chip, WGS-SNP, and k-mer GWAS results, albeit at a much higher marker resolution for the WGS data sets. Significant k-mer haplotype variation revealed selection of the linked I-gene and Co-u traits in North American breeding lines and cultivars. Due to structural variation, only 9.1 to 47.3% of the significantly associated k-mers could be mapped to the reference genome. Thus, to determine the genetic context of cis-associated k-mers, we generated draft whole genome assemblies of four ADP accessions and identified an expanded local repertoire of disease resistance genes associated with resistance to anthracnose and BCMV. With access to variant data in the context of a pan-genome, high resolution mapping of agronomic traits for common bean is now feasible.

RevDate: 2024-10-10
CmpDate: 2024-10-10

Liang J, Liu B, Christensen MJ, et al (2024)

The effects of Pseudomonas strains isolated from Achnatherum inebrians on plant growth: A genomic perspective.

Environmental microbiology reports, 16(5):e70011.

Achnatherum inebrians is a perennial grass widely distributed in northwest China. Nearly all wild A. inebrians plants are infected by Epichloë endophytes. In this study, bacteria from the phyllosphere were isolated from leaves of both endophyte-free and endophyte-infected A. inebrians and sequenced for identification. Pseudomonas, comprising 48.12% of the culturable bacterial communities, was the most dominant bacterial genus. Thirty-four strains from 12 Pseudomonas species were used to inoculate A. inebrians seeds and plants. Results indicated that Epichloë significantly increased the diversity and richness index of the phyllosphere. Pseudomonas Sp1, Sp3, Sp5 and Sp7 had a significantly positive effect on plant growth and photosynthesis, whereas Sp10, Sp11 and Sp12 had a significantly negative effect. Whole-genome and pan-genome analysis suggested that the variability in the effects of Pseudomonas on A. inebrians was related to differences in genome composition and genomic islands.

RevDate: 2024-10-10

Zhang M, Yin Z, Chen B, et al (2024)

Investigation of Citrobacter freundii clinical isolates in a Chinese hospital during 2020-2022 revealed genomic characterization of an extremely drug-resistant C. freundii ST257 clinical strain GMU8049 co-carrying blaNDM-1 and a novel blaCMY variant.

Microbiology spectrum [Epub ahead of print].

The emergence of multidrug-resistant Citrobacter freundii poses a significant threat to public health. C. freundii isolates were collected from clinical patients in a Chinese hospital during 2020-2022. An unusual strain, GMU8049, was not susceptible to any of the antibiotics tested, including the novel β-lactam/β-lactamase inhibitor combination ceftazidime-avibactam. Whole-genome sequencing (WGS) revealed that GMU8049 harbors a circular chromosome belonging to the rare ST257 and an IncX3 resistance plasmid. Genomic analysis revealed the coexistence of two β-lactamase genes, plasmid-mediated blaNDM-1 and a chromosomal blaCMY encoding a novel CMY variant, combined with an outer membrane porin deficiency, which may account for the extreme resistance to β-lactams. A conjugation experiment confirmed that the blaNDM-1 resistance gene located on pGMU8049 could be successfully transferred to Escherichia coli EC600. The novel CMY variant had an amino acid substitution at position 106 (N106S) compared to the closely related CMY-51. Additionally, GMU8049 carries a strain-specific truncation in an OmpK37 variant caused by a premature stop codon. Moreover, a variety of chromosomally located efflux pump genes and virulence-related genes were also identified. Analysis of strain GMU8049 in the context of other C. freundii strains reveals an open pan-genome and the presence of mobile genetic elements that can mediate horizontal transfer of antimicrobial resistance and virulence genes. Our work provides comprehensive insights into the genetic mechanisms of highly resistant C. freundii, highlighting the importance of genomic surveillance of this opportunistic pathogen as a high-risk population for emerging resistance and pathogenicity.

IMPORTANCE: Emerging pathogens exhibiting multidrug, extreme drug, and pan-drug resistance are a major concern for hospitalized patients and the healthcare community due to limited antimicrobial treatment options and the potential for spread. Genomic technologies have enabled clinical surveillance of emerging pathogens and modeling of the evolution and transmission of antimicrobial resistance and virulence. Here, we report the genomic characterization of an extremely drug-resistant ST257 Citrobacter freundii clinical isolate. Genomic analysis of GMU8049, which has a rare ST and unusual phenotypes, can provide information on how this extremely resistant clinical isolate evolved, including the acquisition of blaNDM-1 via the IncX3 plasmid and the accumulation of chromosomal mutations leading to a novel CMY variant and deficiency of the outer membrane porin OmpK37. Our work highlights that the emergence of extremely resistant C. freundii poses a significant challenge to the treatment of clinical infections. Therefore, great efforts must be made to specifically monitor this opportunistic pathogen.

RevDate: 2024-10-11

Naser-Khdour S, Scheuber F, Fields PD, et al (2024)

The Evolution of Extreme Genetic Variability in a Parasite-Resistance Complex.

Genome biology and evolution pii:7818197 [Epub ahead of print].

Genomic regions that play a role in parasite defense are often found to be highly variable, with the MHC serving as an iconic example. Single nucleotide polymorphisms may represent only a small portion of this variability, with indel polymorphisms and copy number variation contributing further. In extreme cases, haplotypes may no longer be recognizable as orthologous. Understanding the evolution of such highly divergent regions is challenging because the most extreme variation is not visible using reference-assisted genomic approaches. Here we analyze the case of the Pasteuria Resistance Complex (PRC) in the crustacean Daphnia magna, a defense complex in the host against the common and virulent bacterium Pasteuria ramosa. Two haplotypes of this region have been previously described, parts of which are non-homologous, and the region has been shown to be under balancing selection. Using pan-genome analysis and tree reconciliation methods to explore the evolution of the PRC and its characteristics within and between species of Daphnia and other cladoceran species, our analysis revealed remarkable diversity in this region even among host species, with many non-homologous, hyper-divergent haplotypes. The PRC is characterized by extensive duplication and losses of Fucosyltransferase (FuT) and Galactosyltransferase (GalT) genes that are believed to play a role in parasite defense. The PRC region can be traced back to common ancestors that lived more than 250 million years ago. The unique combination of an ancient resistance complex and a dynamic, hyper-divergent genomic environment presents a fascinating opportunity to investigate the role of such regions in the evolution and long-term maintenance of resistance polymorphisms. Our findings offer valuable insights into the evolutionary forces shaping disease resistance and adaptation, not only in the genus Daphnia, but potentially across Cladocera as a whole.

RevDate: 2024-10-11

Ferro E, Oliva M, Gagie T, et al (2024)

Building a pangenome alignment index via recursive prefix-free parsing.

iScience, 27(10):110933.

Pangenome alignment offers a solution to reduce bias in biomedical research. Traditionally, short-read aligners like Bowtie and BWA indexed a single reference genome to find approximate alignments. These methods, limited by linear-memory requirements, can only index a few genomes. Emerging pangenome aligners, such as VG, Giraffe, and Moni, address this by indexing more genomes. VG and Giraffe use a variation graph, while Moni indexes sequences accounting for repetition, using prefix-free parsing to build a dictionary and a parse. The main challenge is the size of the parse, which becomes significantly larger than the dictionary. To scale Moni, we propose removing the parse from the construction of the run-length encoded BWT (RLBWT), suffix array, and Longest Common Prefix (LCP) array by applying prefix-free parsing recursively. This approach improves construction time and memory requirements, enabling efficient construction of the RLBWT, suffix array, and LCP array for large pangenomes, such as those from the Human Pangenome Reference Consortium.
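Why a run-length encoded BWT suits pangenomes can be seen on a toy input. The Python sketch below computes a naive Burrows-Wheeler transform by sorting all rotations (quadratic memory, toy inputs only) and then run-length encodes it; on a highly repetitive text the BWT collapses into a handful of runs. It does not implement prefix-free parsing, recursive or otherwise, which is the paper's actual contribution for building the RLBWT at scale; it assumes the text ends with a unique "$" sentinel that sorts before A, C, G, and T, and the function names are hypothetical.

```python
from itertools import groupby


def bwt(text: str) -> str:
    """Naive Burrows-Wheeler transform via sorted rotations.

    Assumes text ends with a unique '$' sentinel that sorts first
    (true in ASCII for DNA letters). Quadratic memory; illustration only.
    """
    assert text.endswith("$") and text.count("$") == 1
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into (char, run length)."""
    return [(char, len(list(group))) for char, group in groupby(s)]


# Three copies of the same "genome" concatenated: 13 characters, 5 runs.
text = "ACGT" * 3 + "$"
print(bwt(text))                     # TTT$AAACCCGGG
print(run_length_encode(bwt(text)))  # [('T', 3), ('$', 1), ('A', 3), ('C', 3), ('G', 3)]
```

The number of runs grows with the amount of genuinely new sequence rather than with the total length of the collection, which is what lets RLBWT-based indexes such as the one behind Moni hold many similar haplotypes.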

RevDate: 2024-10-11

Gabory E, Mwaniki MN, Pisanti N, et al (2024)

Pangenome comparison via ED strings.

Frontiers in bioinformatics, 4:1397036.

INTRODUCTION: An elastic-degenerate (ED) string is a sequence of sets of strings. It can also be seen as a directed acyclic graph whose edges are labeled by strings. The notion of ED strings was introduced as a simple alternative to variation and sequence graphs for representing a pangenome, that is, a collection of genomic sequences to be analyzed jointly or to be used as a reference.

METHODS: In this study, we define notions of matching statistics of two ED strings as similarity measures between pangenomes and consequently infer a corresponding distance measure. We then show that both measures can be computed efficiently, in both theory and practice, by employing the intersection graph of two ED strings.

RESULTS: We also implemented our methods as a software tool for pangenome comparison and evaluated their efficiency and effectiveness using both synthetic and real datasets.

DISCUSSION: As for efficiency, we compare the runtime of the intersection graph method against the classic product automaton construction showing that the intersection graph is faster by up to one order of magnitude. For showing effectiveness, we used real SARS-CoV-2 datasets and our matching statistics similarity measure to reproduce a well-established clade classification of SARS-CoV-2, thus demonstrating that the classification obtained by our method is in accordance with the existing one.
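The ED-string definition in the INTRODUCTION above (a sequence of sets of strings, equivalently a DAG with string-labelled edges) can be written down directly. The Python sketch below shows the data structure, its DAG view, and a brute-force check of whether two ED strings spell a common string. It assumes the empty string in a set encodes a deletion, and every name is hypothetical. The brute-force enumeration is exponential in the number of variant segments and is not the paper's intersection-graph method or its matching statistics; it only shows what the intersection of two ED strings means.

```python
from itertools import product

# An ED string: a list of sets of strings; "" (epsilon) encodes a deletion.
EDString = list[set[str]]


def language(ed: EDString) -> set[str]:
    """All strings spelled by choosing one string from each set.
    Exponential in the number of variant segments: toy inputs only."""
    return {"".join(parts) for parts in product(*ed)}


def to_dag(ed: EDString) -> list[tuple[int, int, str]]:
    """Edges (u, v, label): every string in segment i labels an edge i -> i + 1."""
    return [(i, i + 1, s) for i, segment in enumerate(ed) for s in sorted(segment)]


def share_a_string(a: EDString, b: EDString) -> bool:
    """Naive check that two ED strings spell at least one common string."""
    return bool(language(a) & language(b))


ed1 = [{"ACG"}, {"T", "", "GG"}, {"CA"}]  # SNP/indel site between ACG and CA
ed2 = [{"AC"}, {"GT", "G"}, {"CA"}]
print(sorted(language(ed1)))     # ['ACGCA', 'ACGGGCA', 'ACGTCA']
print(share_a_string(ed1, ed2))  # True: both spell ACGTCA and ACGCA
```

Note that segment boundaries need not line up between the two ED strings (ed1 splits after "ACG", ed2 after "AC"), which is part of why comparing them efficiently requires a dedicated construction rather than a segment-by-segment comparison.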

RevDate: 2024-10-11

Udaondo Z, Ramos JL, K Abram (2024)

Unraveling the Genomic Diversity of the Pseudomonas putida Group: Exploring Taxonomy, Core Pangenome, and Antibiotic Resistance Mechanisms.

FEMS microbiology reviews pii:7818139 [Epub ahead of print].

The genus Pseudomonas is characterized by its rich genetic diversity, with over 300 species validly recognized. This reflects significant progress made through sequencing and computational methods. The Pseudomonas putida group comprises highly adaptable species that thrive in diverse environments and play various ecological roles, from promoting plant growth to causing disease in immunocompromised individuals. By leveraging the GRUMPS computational pipeline, we scrutinized 26363 genomes labeled as Pseudomonas in NCBI GenBank, categorizing all Pseudomonas spp. genomes into 435 distinct species-level clusters or cliques. We identified 224 strains deposited under the taxonomic identifier "Pseudomonas putida" distributed within 31 of these species-level clusters, challenging prior classifications. Nine of these 31 cliques contained at least six genomes labeled as "Pseudomonas putida" and were analyzed in depth, particularly clique_1 (P. alloputida) and clique_2 (P. putida). Pangenomic analysis of a set of 413 P. putida group strains revealed over 2.2 million proteins and more than 77000 distinct protein families. The core genome of these 413 strains includes 2226 protein families involved in essential biological processes. Intraspecific genetic homogeneity was observed within each clique, each possessing a distinct genomic identity. These cliques exhibit distinct core genes and diverse subgroups, reflecting adaptation to specific environments. Contrary to traditional views, nosocomial infections by P. alloputida, P. putida, and P. monteilii have been reported, with strains showing varied antibiotic resistance profiles due to diverse mechanisms. This review enhances the taxonomic understanding of key P. putida group species using advanced population genomics approaches and provides a comprehensive understanding of their genetic diversity, ecological roles, interactions, and potential applications.
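Statements such as "the core genome of these 413 strains includes 2226 protein families" rest on a simple partition of gene (or protein) families by how many genomes carry them. A minimal Python sketch of that partition follows, assuming each genome has already been reduced to a set of family identifiers (the clustering step that produces those families, done by tools in the paper's pipeline, is not shown). The `core_fraction` parameter is a hypothetical knob: 1.0 gives a strict core (present in every genome), while lower values give the "soft core" some studies use; the genome and family names in the example are made up.

```python
from collections import Counter


def partition_pangenome(genomes: dict[str, set[str]], core_fraction: float = 1.0):
    """Split gene families into pan, core, and accessory sets.

    A family is core if it occurs in at least core_fraction of the genomes
    (1.0 = strict core); every other observed family is accessory.
    """
    counts = Counter(family for families in genomes.values() for family in families)
    n_genomes = len(genomes)
    pan = set(counts)
    core = {family for family, c in counts.items() if c >= core_fraction * n_genomes}
    accessory = pan - core
    return pan, core, accessory


# Hypothetical presence/absence data for three strains.
genomes = {
    "strain_1": {"famA", "famB", "famC"},
    "strain_2": {"famA", "famB", "famC"},
    "strain_3": {"famA", "famB", "famD"},
}
pan, core, accessory = partition_pangenome(genomes)
print(sorted(core), sorted(accessory))   # ['famA', 'famB'] ['famC', 'famD']
_, soft_core, _ = partition_pangenome(genomes, core_fraction=0.6)
print(sorted(soft_core))                 # ['famA', 'famB', 'famC']
```

Because the strict core can only shrink as genomes are added, core sizes from different studies are only comparable when the number and diversity of genomes, and the core threshold, are taken into account.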

RevDate: 2024-10-10

Vaduva P, J Bertherat (2024)

The molecular genetics of adrenal cushing.

Hormones (Athens, Greece) [Epub ahead of print].

Adrenal Cushing represents 20% of cases of endogenous hypercorticism. Unilateral cortisol-producing adenoma (CPA), a benign tumor, and adrenocortical carcinoma (ACC), a malignant tumor, are more frequent than bilateral adrenal nodular diseases (primary bilateral macronodular adrenal hyperplasia (PBMAH) and primary pigmented nodular adrenal disease (PPNAD)).

In cortisol-producing adrenal tumors, the signaling pathways mainly altered are the protein kinase A and Wnt/β-catenin pathways. Studying components of these pathways and exploring syndromic and familial cases of these tumors has historically enabled the identification of many of the predisposing genes. More recently, pangenomic sequencing has revealed alterations in sporadic tumors.

In ACC, germline predisposition is frequent in children, mainly due to TP53 alterations causing Li-Fraumeni syndrome, while it is rare in adults. Pathogenic variants in the DNA mismatch repair genes MLH1, MSH2, MSH6, and PMS2, which cause Lynch syndrome, and alterations of IGF2 and CDKN1C (11p15 locus) in Beckwith-Wiedemann syndrome can also cause ACC. Rarely, ACC is described in other hereditary tumor syndromes due to germline pathogenic variants in MEN1 or APC and, in very rare cases, NF1, SDH, PRKAR1A, or BRCA2. Concerning somatic alterations in ACC, TP53 and genetic or epigenetic alterations at the 11p15 locus are also frequently described, as are CTNNB1 and ZNRF3 pathogenic variants.

CPAs mainly harbor somatic pathogenic variants in PRKACA and CTNNB1 and, less frequently, in PRKAR1A, PRKACB, or GNAS1. Isolated PBMAH is due to inactivating ARMC5 pathogenic variants in 20 to 25% of cases and to KDM1A pathogenic variants in food-dependent Cushing. Syndromic PBMAH may be due to germline pathogenic variants in MEN1, APC, or FH, causing multiple endocrine neoplasia type 1, familial adenomatous polyposis, or hereditary leiomyomatosis and renal cell cancer syndrome, respectively. Germline PRKAR1A pathogenic variants are the main alteration causing PPNAD (isolated or as part of Carney complex).

RevDate: 2024-10-10

Li W (2024)

Personalizing pangenome graphs with k-mers.

Nature genetics pii:10.1038/s41588-024-01954-w [Epub ahead of print].

RevDate: 2024-10-10

Huang P, Charton F, Schmelzle JM, et al (2024)

Pangenome-Informed Language Models for Privacy-Preserving Synthetic Genome Sequence Generation.

bioRxiv : the preprint server for biology pii:2024.09.18.612131.

The public availability of genome datasets, such as The Human Genome Project (HGP), The 1000 Genomes Project, The Cancer Genome Atlas, and the International HapMap Project, has significantly advanced scientific research and medical understanding. Here our goal is to share such genomic information for downstream analysis while protecting the privacy of individuals through Differential Privacy (DP). We introduce synthetic DNA data generation based on pangenomes in combination with pretrained language models (PTLMs). We introduce two novel tokenization schemes based on pangenome graphs to enhance the modeling of DNA. We evaluated these tokenization methods and compared them with classical single-nucleotide and k-mer tokenizations. We find k-mer tokenization schemes, indicating that our tokenization schemes boost the model's performance consistency with long effective context length (covering longer sequences with the same number of tokens). Additionally, we propose a method to utilize the pangenome graph and make it comply with DP privacy standards. We assess the effect of DP training on the quality of generated sequences and discuss the trade-offs between privacy and model accuracy. The source code for our work will be published under a free and open source license soon.

RevDate: 2024-10-09
CmpDate: 2024-10-09

Ali R, Ali K, Aurongzeb M, et al (2024)

Characterization of meningitis-causing bacteria, with focus on genomic and pangenomic study of multi-drug resistant Streptococcus pneumoniae from cerebrospinal fluid.

Antonie van Leeuwenhoek, 118(1):16.

Streptococcus pneumoniae is a major cause of meningitis in underdeveloped countries with low vaccination rates and high antibiotic resistance. This study aimed to analyze samples from 83 suspected meningitis patients in Karachi for the detection of S. pneumoniae, followed by whole-genome sequencing and pan-genome analysis. Of the 83 samples collected, 33 with altered physical (turbidity), cytological (white blood cell count), and biochemical (total protein and total glucose concentrations) parameters indicated potential meningitis cases, while these parameters were within normal healthy ranges in the remaining 50 samples. Latex particle agglutination (LPA) was performed on the 33 samples, revealing 20 positive cases of bacterial meningitis. PCR and culturing methods yielded 5 S. pneumoniae isolates. Antibiotic susceptibility tests showed that one S. pneumoniae strain was resistant to erythromycin, levofloxacin, and tetracycline. Whole-genome sequencing of this resistant strain was performed, and its identity as S. pneumoniae was confirmed by MLST analysis; it had a genome of > 2.3 Mb and a single repUS43 plasmid. In CARD analysis, the strain carried the ARGs tet(M), ermB, RlmA(II), patB, pmrA, and patA, which could provide resistance against tetracycline, macrolide, fluoroquinolone, and glycopeptide antibiotics. Phylogenetic analysis revealed that the isolate was closely related to strains from Hungary and the USA. Pan-genome analysis with 144 genome assemblies from the NCBI database showed that 1101 non-redundant core genes were shared among all strains. This study provides valuable insight into the prevalence and characterization of meningitis-causing bacteria in Karachi, Pakistan, with a primary focus on multidrug-resistant S. pneumoniae.

RevDate: 2024-10-08

Chu N, Liu TT, Zhang HL, et al (2024)

Complete genome sequences of two Pantoea stewartii strains ATCC 8199 from maize and PSCN1 from sugarcane.

BMC genomic data, 25(1):86.

OBJECTIVES: Pantoea stewartii (Ps) is the causal agent of bacterial disease in corn and various graminaceous plants. Ps has two subspecies, Pantoea stewartii subsp. stewartii (Pss) and Pantoea stewartii subsp. indologenes (Psi). This study presents the complete genomes of two Ps strains: ATCC 8199, isolated from maize, and PSCN1, which causes bacterial wilt in sugarcane. These two genomes will be helpful for taxonomic analysis of the genus Pantoea at the whole-genome level and for accurately discriminating the two subspecies, Pss and Psi.

DATA DESCRIPTION: The reference strain ATCC 8199, isolated from maize, was purchased from Beijing Biobw Biotechnology Co., Ltd. (China), and strain PSCN1 was isolated from sugarcane cultivar YZ08-1095 in Zhanjiang, Guangdong province, China. Both complete genomes were sequenced using the Illumina HiSeq (second-generation) and Oxford Nanopore (third-generation) platforms. The genome of strain ATCC 8199 comprises 4.78 Mb with an average GC content of 54.03%, along with five plasmids, encoding a total of 4,846 genes with an average gene length of 827 bp. The genome of PSCN1 comprises 5.03 Mb with an average GC content of 53.78%, along with two plasmids, encoding a total of 4,725 genes with an average gene length of 913 bp. Bacterial pan-genome analysis showed that strain ATCC 8199 clustered into a subgroup with Pss strain CCUG 26359 from the USA, while strain PSCN1 clustered into another subgroup with Ps strain NRRLB-133 from the USA. These findings will serve as a useful resource for further analyses of the evolution of Ps strains and corresponding disease epidemiology worldwide.

RevDate: 2024-10-08

Cortinovis G, Vincenzi L, Anderson R, et al (2024)

Author Correction: Adaptive gene loss in the common bean pan-genome during range expansion and domestication.

Nature communications, 15(1):8715 pii:10.1038/s41467-024-52864-8.

RevDate: 2024-10-08

Liu D, Luo C, Dai R, et al (2024)

AMIR: a multi-omics data platform for Asteraceae plants genetics and breeding research.

Nucleic acids research pii:7815640 [Epub ahead of print].

As the largest family of dicotyledons, the Asteraceae family comprises a variety of economically important crops, ornamental plants and numerous medicinal herbs. Advancements in genomics and transcriptomics have revolutionized research in Asteraceae species, generating extensive omics data that necessitate an efficient platform for data integration and analysis. However, existing databases face challenges in mining genes with specific functions and supporting cross-species studies. To address these gaps, we introduce the Asteraceae Multi-omics Information Resource (AMIR; https://yanglab.hzau.edu.cn/AMIR/), a multi-omics hub for the Asteraceae plant community. AMIR integrates diverse omics data from 74 species, encompassing 132 genomes, 4 408 432 genes annotated across seven different perspectives, 3897 transcriptome sequencing samples spanning 131 organs, tissues and stimuli, 42 765 290 unique variants and 15 662 metabolite genes. Leveraging these data, AMIR establishes the first pan-genome, comparative genomics and transcriptome system for the Asteraceae family. Furthermore, AMIR offers user-friendly tools designed to facilitate extensive customized bioinformatics analyses. Two case studies demonstrate AMIR's capability to provide rapid, reproducible and reliable analysis results. In summary, by integrating multi-omics data of Asteraceae species and developing powerful analytical tools, AMIR significantly advances functional genomics research and contributes to breeding practices of Asteraceae.

RevDate: 2024-10-08

Zhang X, Zhou Y, Fu L, et al (2024)

WGS Analysis of Staphylococcus warneri Outbreak in a Neonatal Intensive Care Unit.

Infection and drug resistance, 17:4279-4289.

PURPOSE: Staphylococcus warneri is an opportunistic pathogen responsible for hospital-acquired infections (HAIs). The aim of this study was to describe an outbreak caused by S. warneri infection in a neonatal intensive care unit (NICU) and provide investigation, prevention and control strategies for this outbreak.

METHODS: We conducted an epidemiological investigation of the NICU S. warneri outbreak, involving seven neonates, staff, and environmental screening, to identify the source of infection. WGS analyses were performed on S. warneri isolates, including species identification, core genome single-nucleotide polymorphism (cgSNP) analysis, pan-genome analysis, and genetic characterization to assess the prevalence of specific antibiotic resistance and virulence genes.

RESULTS: Eight S. warneri strains were isolated during this outbreak, seven from neonates and one from the environment. Six clinical cases occurring within three days in 2021 were linked to a single strain isolated from environmental samples; the isolates differed by 0-69 SNPs and were confirmed by WGS to belong to an outbreak. Multiple infection prevention measures were implemented, including comprehensive environmental disinfection and stringent protocols, and all affected neonates were transferred to isolation wards. Following these interventions, no further cases of S. warneri infection were observed. Furthermore, pan-genome analysis suggested that S. warneri may exhibit host specificity in humans.

CONCLUSION: The WGS-based investigation revealed that the outbreak was linked to the milk preparation workbench. A stronger focus on environmental disinfection management is recommended in order to raise awareness of, and improve the identification and prevention of, healthcare-associated infections linked to the hospital environment.

RevDate: 2024-10-08

Du Y, Qian C, Li X, et al (2024)

Unveiling intraspecific diversity and evolutionary dynamics of the foodborne pathogen Bacillus paranthracis through high-quality pan-genome analysis.

Current research in food science, 9:100867.

Understanding the evolutionary dynamics of foodborne pathogens across host-associated habitats is of utmost importance. Bacterial pan-genomes, as dynamic entities, are strongly influenced by ecological lifestyles. As a phenotypically diverse species in the Bacillus cereus group, Bacillus paranthracis is recognized simultaneously as an emerging foodborne pathogen and as a probiotic. This poorly understood species is a suitable study model for adaptive pan-genome evolution. In this study, we determined the biogeographic distribution, abundance, genetic diversity, and genotypic profiles of key genetic elements of B. paranthracis. Metagenomic read recruitment analyses demonstrated that B. paranthracis members are globally distributed and abundant in host-associated habitats. A high-quality pan-genome of B. paranthracis was subsequently constructed to comprehensively analyze the evolutionary dynamics involved in ecological adaptation. The open pan-genome indicated a flexible gene repertoire with extensive genetic diversity. Significant divergences in the phylogenetic relationships, functional enrichment, and degree of selective pressure between the different components demonstrated different evolutionary dynamics between the core and accessory genomes driven by ecological forces. Purifying selection and gene loss are the main signatures of evolutionary dynamics in the B. paranthracis pan-genome. The plasticity of the accessory genome is characterized by horizontal gene transfer (HGT), massive gene losses, and weak purifying or positive selection, which might contribute to niche-specific adaptation. In contrast, although the core genome predominantly undergoes purifying selection, its association with HGT and positively selected mutations indicates a potential role in ecological diversification. Furthermore, host fitness-related dynamics are characterized by the loss of secondary metabolite biosynthesis gene clusters (BGCs) and CAZyme-encoding genes and the acquisition of antimicrobial resistance (AMR) and virulence genes via HGT. This study provides a case study of pan-genome evolution to investigate the ecological adaptations reflected by biogeographical characteristics, thereby advancing the understanding of intraspecific diversity and evolutionary dynamics of foodborne pathogens.

RevDate: 2024-10-07

Moens C, Bogaerts B, Lorente-Leal V, et al (2024)

Genomic comparison between Mycobacterium bovis and Mycobacterium microti and in silico analysis of peptide-based biomarkers for serodiagnosis.

Frontiers in veterinary science, 11:1446930.

In recent years, there has been an increase in the number of reported cases of Mycobacterium microti infection in various animals, which can interfere with the ante-mortem diagnosis of animal tuberculosis caused by Mycobacterium bovis. In this study, whole genome sequencing (WGS) was used to search for protein-coding genes that distinguish M. microti from M. bovis. In addition, the population structure of the available M. microti WGS datasets is described, including three novel Belgian isolates from infections in alpacas. Candidate genes were identified by examining the presence of the regions of difference and by a pan-genome analysis of the available WGS data. A total of 80 genes showed presence-absence variation between the two species, including genes encoding Proline-Glutamate (PE), Proline-Proline-Glutamate (PPE), and Polymorphic GC-Rich Sequence (PE-PGRS) proteins involved in virulence and host interaction. Filtering based on predicted subcellular localization, sequence homology and predicted antigenicity identified 28 of the 80 proteins as potential antigens. As synthetic peptides are less costly and less variable than recombinant proteins, an in silico approach was used to identify linear and discontinuous B-cell epitopes in the selected proteins. From the 28 proteins, 157 B-cell epitope-based peptides were identified that discriminated between M. bovis and M. microti. Although confirmation by in vitro testing is still required, these candidate synthetic peptides containing B-cell epitopes could potentially be used in serological tests to differentiate M. bovis from M. microti infection, thus reducing misdiagnosis in animal tuberculosis surveillance.

RevDate: 2024-10-07

Ford MKB, Hari A, Zhou Q, et al (2024)

Biologically-informed Killer cell immunoglobulin-like receptor (KIR) gene annotation tool.

bioRxiv : the preprint server for biology pii:2024.08.13.607835.

Natural killer (NK) cells are essential components of the innate immune system, with their activity significantly regulated by Killer cell Immunoglobulin-like Receptors (KIRs). The diversity and structural complexity of KIR genes present significant challenges for accurate genotyping, which is essential for understanding NK cell functions and their implications in health and disease. Traditional genotyping methods struggle with the variable nature of KIR genes, leading to inaccuracies that can impede immunogenetic research. These challenges extend to high-quality phased assemblies, which have recently been popularized by the Human Pangenome Reference Consortium (HPRC). This paper introduces BAKIR (Biologically-informed Annotator for KIR locus), a tailored computational tool designed to overcome the challenges of KIR genotyping and annotation on high-quality, phased genome assemblies. BAKIR aims to enhance the accuracy of KIR gene annotations by structuring its annotation pipeline around identifying key functional mutations, thereby improving the identification and subsequent relevance of gene and allele calls. It uses a multi-stage mapping, alignment, and variant calling process to ensure high-precision gene and allele identification, while also maintaining high recall for sequences that are significantly mutated or truncated relative to the known allele database. BAKIR has been evaluated on a subset of the HPRC assemblies, where it improved many of the associated annotations and called novel variants. BAKIR is freely available on GitHub, offering ease of access and use through multiple installation methods, including pip, conda, and a Singularity container, and is equipped with a user-friendly command-line interface, thereby promoting its adoption in the scientific community.

RevDate: 2024-10-07

Logsdon GA, Ebert P, Audano PA, et al (2024)

Complex genetic variation in nearly complete human genomes.

bioRxiv : the preprint server for biology pii:2024.09.24.614721.

Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here, we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (130 Mbp median continuity), closing 92% of all previous assembly gaps and reaching telomere-to-telomere (T2T) status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8, and AMY1/AMY2, and fully resolve 1,852 complex structural variants (SVs). In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite high-order repeat (HOR) array length and characterize the pattern of mobile element insertions into α-satellite HOR arrays. While most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference to a median quality value (QV) of 45. Using this approach, 26,115 SVs per sample are detected, substantially increasing the number of SVs now amenable to downstream disease association studies.

RevDate: 2024-10-07

Karthik K, Anbazhagan S, Priyadharshini MLM, et al (2024)

Comparative genomics of zoonotic pathogen Clostridioides difficile of animal origin to understand its diversity.

3 Biotech, 14(11):257.

UNLABELLED: Clostridioides difficile is a zoonotic pathogen that causes enteric diseases in various animals and in humans. Comprehensive genome-based studies of the toxin genes and antimicrobial resistance genes of C. difficile in animals are scarce. In the present study, a total of 15 C. difficile isolates were recovered from dogs; isolates carrying toxin genes (D1, CD15 and CD26), along with two non-toxigenic strains (CD28, CD32), were used for whole genome sequencing and comparative genomics. The whole genome phylogeny showed sequence type-based clustering into four known multi-locus sequence typing (MLST) clades (I, II, IV, and V) and a cryptic clade. ST11 and ST54 were reported in dogs for the second time worldwide. Of the 109 genomes used in the study, 29 were predicted to carry all four toxin genes (toxA, toxB, cdtA, cdtB), while 22 carried none of the toxin genes. ST11 of MLST clade V had the largest number of genomes (46) predicted to carry at least one toxin gene. Among the genomes sequenced in this study, CD26 carried the most AMR genes (5: aac(6')-aph(2″), ant(6)-Ia, catP, erm(B)_18, and tet(M)_11), and CD15 was predicted to carry 2 AMR genes (aac(6')-aph(2″), erm(B)_18). Tetracycline resistance genes were predicted most often in ST11 genomes. Of the 22 non-toxigenic strains, 9 genomes (ST48 = 5, ST3 = 2, ST109 = 1, ST15 = 1) were predicted to carry at least one AMR gene. Pangenome analysis gave a Bpan value of 0.12, indicating that C. difficile has an open pangenome, i.e., the organism can evolve by acquiring new genes. This study reports the circulation of clinically important ST11 and multidrug-resistant non-toxigenic strains among animals.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s13205-024-04102-7.
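The Bpan value quoted in the entry above is the exponent of a power-law fit to pan-genome size. Assuming the common convention used by pan-genome tools such as BPGA, pan-genome size grows with the number of genomes N as P(N) = κ·N^b, and an exponent 0 < b < 1 is read as an open pan-genome. The sketch below is a minimal, hypothetical illustration of estimating b by ordinary least squares on log-transformed values; the function name and the synthetic numbers are assumptions, not the authors' pipeline or data.

```python
import math


def fit_pan_exponent(genome_counts, pan_sizes):
    """Fit pan_size = kappa * N**b by least squares on log-log values.

    Returns (kappa, b). Under the usual convention, 0 < b < 1 is read as an
    open pan-genome and b <= 0 as a closed one.
    """
    if len(genome_counts) != len(pan_sizes) or len(genome_counts) < 2:
        raise ValueError("need at least two paired (N, pan size) points")
    # Linearize: log P = log kappa + b * log N
    xs = [math.log(n) for n in genome_counts]
    ys = [math.log(p) for p in pan_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                              # slope = exponent b
    kappa = math.exp(mean_y - b * mean_x)      # back-transformed intercept
    return kappa, b


if __name__ == "__main__":
    # Hypothetical synthetic curve (not data from the study).
    counts = list(range(1, 110))
    sizes = [3500 * n ** 0.12 for n in counts]
    kappa, b = fit_pan_exponent(counts, sizes)
    print(f"kappa = {kappa:.1f}, b = {b:.2f}")
```

A log-log linear fit weights the data differently from a direct nonlinear fit, so on real, noisy rarefaction curves the two approaches can give slightly different exponents.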

RevDate: 2024-10-04

Liu J, Shi Y, Mo D, et al (2024)

The goat pan-genome reveals patterns of gene loss during domestication.

Journal of animal science and biotechnology, 15(1):132.

BACKGROUND: Unveiling genetic diversity features and understanding the genetic mechanisms of diverse goat phenotypes are pivotal in facilitating the preservation and utilization of these genetic resources. However, the total genetic diversity within a species cannot be captured by the reference genome of a single individual. The pan-genome is a collection of all the DNA sequences that occur in a species, and it is expected to capture the total genomic diversity of that species.

RESULTS: We constructed a goat pan-genome using a map-to-pan assembly approach based on 813 individuals, including 723 domestic goats and 90 samples from their wild relatives, providing broad regional and global representation. In total, 146 Mb of sequence and 974 genes were identified as absent from the reference genome (ARS1.2; GCF_001704415.2). We identified 3,190 novel single nucleotide polymorphisms (SNPs) using the pan-genome analysis. These novel SNPs could properly reveal the population structure of domestic goats and their wild relatives. Presence/absence variation (PAV) analysis revealed gene loss and intense negative selection during domestication and improvement.

CONCLUSIONS: Our research highlights the importance of the goat pan-genome in capturing the missing genetic variations. It reveals the changes in genomic architecture during goat domestication and improvement, such as gene loss. This improves our understanding of the evolutionary and breeding history of goats.

RevDate: 2024-10-04
CmpDate: 2024-10-04

Mejía-Limones I, Andrade-Molina D, Morey-León G, et al (2024)

Whole-genome sequencing of Klebsiella pneumoniae MDR circulating in a pediatric hospital setting: a comprehensive genome analysis of isolates from Guayaquil, Ecuador.

BMC genomics, 25(1):928.

BACKGROUND: Klebsiella pneumoniae is the major cause of nosocomial infections worldwide and is associated with a worsening increase in multidrug-resistant (MDR) bacteria and virulence genes that seriously affect immunosuppressed patients, long-stay intensive care patients, elderly individuals, and children. Whole-genome sequencing (WGS) has become a useful strategy for characterizing the genomic components of clinically important bacteria such as K. pneumoniae, enabling researchers to monitor genetic changes, understand transmission, and assess the risk of dissemination of resistance- and virulence-associated genes in hospitals. In this study, we report the WGS of 14 clinical isolates of K. pneumoniae from a pediatric hospital biobank in Guayaquil, Ecuador.

RESULTS: The main findings revealed pronounced genetic heterogeneity among the isolates. Multilocus sequence type ST45 was the predominant lineage among non-KPC isolates, whereas ST629 was found more frequently among KPC isolates. Phylogenetic analysis suggested local transmission dynamics. Comparative genomic analysis revealed a core set of 3511 conserved genes and an open pangenome in neonatal isolates. The diversity of MLSTs and capsular types, and the high genetic diversity among these isolates, indicate high intraspecific variability. In terms of virulence factors, we identified genes associated with adherence, biofilm formation, immune evasion, secretion systems, multidrug efflux pump transporters, and a notably high number of genes related to iron uptake. Many of these genes were detected in the ST45 isolate, whereas the yersiniabactin iron uptake genes were found exclusively in the non-KPC isolates. We observed high resistance to commonly used antibiotics and determined that these isolates exhibited multidrug resistance, including to β-lactams, aminoglycosides, fluoroquinolones, quinolones, trimethoprim, fosfomycin and macrolides; additionally, resistance-associated point mutations and cross-resistance genes were identified in all the isolates. We also report the first KPC-3-producing K. pneumoniae isolates in Ecuador.

CONCLUSIONS: Our WGS results for clinical isolates highlight the importance of MDR in neonatal K. pneumoniae infections and the genetic diversity of these isolates. WGS will be an essential strategy for the surveillance of K. pneumoniae in Ecuador and will contribute to identifying effective treatment strategies for K. pneumoniae infections in risk-stratified patients in critical care units.

RevDate: 2024-10-04
CmpDate: 2024-10-04

Nagy N, P Hodor (2024)

Chromosomal gene order defines several structural classes of Staphylococcus epidermidis genomes.

PloS one, 19(10):e0311520 pii:PONE-D-23-36569.

The original methodology for describing the pangenome of a prokaryotic species is based on modeling genomes as unordered sets of genes. More recent findings have underlined the importance of also considering the order of genes along the genetic material when comparing genomes. To further investigate the benefits of gene order when describing genomes of a given species, we applied two distance metrics to a dataset of 84 genomes of Staphylococcus epidermidis. The first metric, GeLev, depends on the order of genes and is a derivative of the Levenshtein distance. The second, the Jaccard distance, depends on gene sets only. Applying these distances reveals information about the global structure of the genomes and allows the genomes to be clustered into classes. The main biological result is that, while genomes within the same class are structurally similar, genomes of different classes have an additional characteristic: between genomes in different classes we can find instances where a large segment of the first genome appears in reverse order in the second. This feature suggests that genome rearrangements in S. epidermidis happen on a large scale, while micro-rearrangements of single genes or small numbers of genes are rare. Thus, this paper describes a straightforward method to classify genomes into structural classes with the same gene order and makes it possible to visualize reversed segments in pairs of genomes. The method can be readily applied to other species.
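The contrast between a set-based and an order-based distance can be illustrated with a minimal Python sketch. It assumes each genome has already been reduced to an ordered list of gene-family labels (homologous genes share a label) and uses a plain Levenshtein edit distance as a stand-in for GeLev, whose exact definition is not given in the abstract; the gene labels are hypothetical. Reversing a segment leaves the Jaccard distance at zero but yields a nonzero order-aware distance.

```python
def jaccard_distance(genes_a, genes_b):
    """1 - |A & B| / |A | B| over gene-family sets; ignores gene order."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def levenshtein(seq_a, seq_b):
    """Edit distance between two gene-order lists (insert/delete/substitute cost 1).

    Plain Levenshtein, used here only as a stand-in for an order-aware metric;
    it is not the GeLev metric itself.
    """
    prev = list(range(len(seq_b) + 1))          # distances from an empty prefix of seq_a
    for i, ga in enumerate(seq_a, start=1):
        curr = [i]
        for j, gb in enumerate(seq_b, start=1):
            cost = 0 if ga == gb else 1
            curr.append(min(prev[j] + 1,         # delete ga
                            curr[j - 1] + 1,     # insert gb
                            prev[j - 1] + cost)) # match or substitute
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical genomes: genome_b has the segment g3..g5 in reverse order.
    genome_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
    genome_b = ["g1", "g2", "g5", "g4", "g3", "g6"]
    print("Jaccard:", jaccard_distance(genome_a, genome_b))
    print("Levenshtein:", levenshtein(genome_a, genome_b))
```

Note that comparing labels alone ignores strand orientation, so an inversion is seen only as a change in order; capturing orientation would require signed gene labels.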

RevDate: 2024-10-04

Neal M, Brakewood W, Betenbaugh M, et al (2024)

Pan-genome-scale metabolic modeling of Bacillus subtilis reveals functionally distinct groups.

mSystems [Epub ahead of print].

UNLABELLED: Bacillus subtilis is an important industrial and environmental microorganism known to occupy many niches and produce many compounds of interest. Although it is one of the best-studied organisms, much of this focus including the reconstruction of genome-scale metabolic models has been placed on a few key laboratory strains. Here, we substantially expand these prior models to pan-genome-scale, representing 481 genomes of B. subtilis with 2,315 orthologous gene clusters, 1,874 metabolites, and 2,239 reactions. Furthermore, we incorporate data from carbon utilization experiments for eight strains to refine and validate its metabolic predictions. This comprehensive pan-genome model enables the assessment of strain-to-strain differences related to nutrient utilization, fermentation outputs, robustness, and other metabolic aspects. Using the model and phenotypic predictions, we divide B. subtilis strains into five groups with distinct patterns of behavior that correlate across these features. The pan-genome model offers deep insights into B. subtilis' metabolism as it varies across environments and provides an understanding as to how different strains have adapted to dynamic habitats.

IMPORTANCE: As the volume of genomic data and computational power have increased, so has the number of genome-scale metabolic models. These models encapsulate the totality of metabolic functions for a given organism. Bacillus subtilis strain 168 is one of the first bacteria for which a metabolic network was reconstructed. Since then, several updated reconstructions have been generated for this model microorganism. Here, we expand the metabolic model for a single strain into a pan-genome-scale model, which consists of individual models for 481 B. subtilis strains. By evaluating differences between these strains, we identified five distinct groups of strains, allowing for the rapid classification of any particular strain. Furthermore, this classification into five groups aids the rapid identification of suitable strains for any application.

RevDate: 2024-10-04

Ajesh BR, Sariga R, Nakkeeran S, et al (2024)

Insights on mining the pangenome of Sphingobacterium thalpophilum NMS02 S296 from the resistant banana cultivar Pisang lilin confirms the antifungal action against Fusarium oxysporum f. sp. cubense.

Frontiers in microbiology, 15:1443195.

INTRODUCTION: Fusarium wilt, caused by Fusarium oxysporum f. sp. cubense (Foc), poses a significant global threat to banana cultivation. Conventional methods of disease management are increasingly challenged, thus making it necessary to explore alternative strategies. Bacterial endophytes, particularly from resistant genotypes, are gaining attention as potential biocontrol agents. Sphingobacterium thalpophilum, isolated from the resistant banana cultivar Pisang lilin (JALHSB010000001-JALHSB010000029), presents an intriguing prospect for combating Fusarium wilt. However, its underlying biocontrol mechanisms remain poorly understood. This study aimed to elucidate the antifungal efficacy of S. thalpophilum NMS02 S296 against Foc and explore its biocontrol mechanisms at the genomic level.

METHODS: Whole genome sequencing of S. thalpophilum NMS02 S296 was conducted using next-generation sequencing technologies and bioinformatics analyses were performed to identify genes associated with antifungal properties. In vitro assays were used to assess the inhibitory effects of the bacterial isolate on the mycelial growth of Foc. To explore the biomolecules responsible for the observed antagonistic activity, metabolites diffused into the agar at the zone of inhibition between Foc S16 and S. thalpophilum NMS02 S296 were extracted and identified.

RESULTS: Whole genome sequencing revealed an array of genes encoding antifungal enzymes and secondary metabolites in S. thalpophilum NMS02 S296. In vitro experiments demonstrated significant inhibition of Foc mycelial growth by the bacterial endophyte. Comparative genomic analysis highlighted unique genomic features in S. thalpophilum linked to its biocontrol potential, setting it apart from other bacterial species.

DISCUSSION: The study underscores the remarkable antifungal efficacy of S. thalpophilum NMS02 S296 against Fusarium wilt. The genetic basis of its biocontrol potential was elucidated through whole genome sequencing, shedding light on the mechanisms behind its antifungal activity. This study advances our understanding of bacterial endophytes as biocontrol agents and offers a promising avenue for plant growth promotion as part of sustainable strategies to mitigate Fusarium wilt in banana cultivation.

RevDate: 2024-10-03

Vogel NA, Rubin JD, Pedersen AG, et al (2024)

soibean: High-resolution Taxonomic Identification of Ancient Environmental DNA Using Mitochondrial Pangenome Graphs.

Molecular biology and evolution pii:7809583 [Epub ahead of print].

Ancient environmental DNA (aeDNA) is becoming a powerful tool to gain insights about past ecosystems, overcoming the limitations of conventional fossil records. However, several methodological challenges remain, particularly for classifying the DNA to species level and conducting phylogenetic analysis. Current methods, primarily tailored for modern datasets, fail to capture several idiosyncrasies of aeDNA, including species mixtures from closely related species and ancestral divergence. We introduce soibean, a novel tool that utilises mitochondrial pangenomic graphs for identifying species from aeDNA reads. It outperforms existing methods in accurately identifying species from multiple closely related sources within a sample, enhancing phylogenetic analysis for aeDNA. soibean employs a damage-aware likelihood model for precise identification at low coverage with a high damage rate. Additionally, we reconstructed ancestral sequences for soibean's database to handle aeDNA that is highly diverged from modern references. soibean demonstrates effectiveness through simulated data tests and empirical validation. Notably, our method uncovered new empirical results in published datasets, including using porpoise whales as food in a Mesolithic community in Sweden, demonstrating its potential to reveal previously unrecognised findings in aeDNA studies.

RevDate: 2024-10-01

Shoer S, Reicher L, Zhao C, et al (2024)

Pangenomes of human gut microbiota uncover links between genetic diversity and stress response.

Cell host & microbe pii:S1931-3128(24)00324-X [Epub ahead of print].

The genetic diversity of the gut microbiota has a central role in host health. Here, we created pangenomes for 728 human gut prokaryotic species, quadrupling the number of genes relative to strain-specific genomes. Each of these species has a core set of a thousand genes, differing even between closely related species, and an accessory set of genes unique to the different strains. Functional analysis shows that high strain variability is associated with sporulation, whereas low variability is linked with antibiotic resistance. We further map the antibiotic resistome across the human gut population and find 237 cases of extreme resistance even to last-resort antibiotics, predominantly among Enterobacteriaceae. Lastly, the presence of specific genes in the microbiota is related to host age and sex. Our study underscores the genetic complexity of the human gut microbiota, emphasizing its significant implications for host health. The pangenomes and antibiotic resistance map constitute a valuable resource for further research.

RevDate: 2024-10-01

Li Q, Yang J, Wang M, et al (2024)

Global distribution and genomic characteristics analysis of avian-derived mcr-1-positive Escherichia coli.

Ecotoxicology and environmental safety, 285:117109 pii:S0147-6513(24)01185-0 [Epub ahead of print].

The prevalence of avian-derived Escherichia coli (E. coli) carrying mcr-1 poses a significant threat to the development of the poultry industry and to public health. Despite ongoing in-depth epidemiological research worldwide, a comprehensive genomics-based macroscopic study is still lacking. In response, this study collected 1104 genomic sequences of avian-derived mcr-1-positive E. coli (MCRPEC) from the NCBI public database, covering 31 countries. The majority of sequences originated from China (48.82 %), followed by the Netherlands (10.41 %). In terms of avian hosts, chicken accounted for the largest proportion (44.11 %), followed by gallus (24.09 %). Avian-derived MCRPEC also serves as a reservoir for other antibiotic resistance genes (ARGs), with 179 ARGs identified as coexisting with mcr-1. A total of 206 virulence-associated genes were also identified, revealing the pathogenic risks of MCRPEC. Pan-genome analysis revealed that avian-derived MCRPEC from different hosts, countries of origin, and serotypes exhibit minor SNP differences, indicating a high risk of cross-regional and cross-host transmission. The sequence types (STs) of MCRPEC are diverse, with ST10 being the most prevalent (n=70). Spearman analysis showed a significant correlation in ST10 strains between the number of ARGs and the numbers of insertion sequences (ISs) and plasmid replicons. Furthermore, ST10 strains share a similar genetic basis with human-derived MCRPEC, suggesting the possibility of clonal dissemination. Pan-genome-wide association studies (pan-GWAS) indicated that the differential genes of MCRPEC differ significantly among countries and host sources and are mainly related to genes encoding type IV secretion systems and mobile genetic elements (MGEs). Plasmid mapping showed that the prevalent plasmid types vary by country and host, with IncI2 and IncX4 being the main mcr-1-positive plasmids. Among the 12 identified mcr-1 genetic contexts with ISs, the Tn6330 transposon was the predominant carrier of mcr-1. In summary, the potential threat of avian-derived MCRPEC cannot be ignored, and long-term, comprehensive monitoring is essential.

RevDate: 2024-10-01

Ling X, Gu X, Shen Y, et al (2024)

Comparative genomic analysis of Acanthamoeba from different sources and horizontal transfer events of antimicrobial resistance genes.

mSphere [Epub ahead of print].

UNLABELLED: Acanthamoeba species are among the most common free-living amoebae and ubiquitous protozoa, mainly distributed in water and soil, and cause Acanthamoeba keratitis (AK) and severe visual impairment in patients. Although several studies have reported genomic characteristics of Acanthamoeba, limited sample sizes and sources have resulted in an incomplete understanding of the genetic diversity of Acanthamoeba from different sources. While endosymbionts exert a significant influence on the phenotypes of Acanthamoeba, including pathogenicity, virulence, and drug resistance, their species diversity and functional characteristics remain largely unexplored. Herein, our study sequenced and analyzed the whole genomes of 19 Acanthamoeba pathogenic strains that cause AK, and by integrating publicly available genomes, we sampled 29 Acanthamoeba strains from ocular, environmental, and other sources. Combined pan-genomic and comparative functional analyses revealed genetic differences and evolutionary relationships among Acanthamoeba from different sources, as well as classification into multiple functional groups, with ocular isolates in particular showing significant differences that may account for differences in pathogenicity. Phylogenetic and rhizome gene mosaic analyses of ocular Acanthamoeba strains suggested that genomic exchanges between Acanthamoeba and endosymbionts, particularly the potential trafficking of antimicrobial resistance genes such as adeF, amrA, and amrB, may contribute to Acanthamoeba drug resistance. In conclusion, this study elucidated the adaptation of Acanthamoeba to different ecological niches and the influence of gene exchange on the evolution of the ocular Acanthamoeba genome, guiding the clinical diagnosis and treatment of AK and laying a theoretical groundwork for developing novel therapeutic approaches.

IMPORTANCE: Acanthamoeba causes a serious blinding keratopathy, Acanthamoeba keratitis, which is currently under-recognized by clinicians. In this study, we analyzed 48 strains of Acanthamoeba using a whole-genome approach, revealing differences in pathogenicity and function between strains of different origins. Horizontal transfer events of antimicrobial resistance genes can help provide guidance as potential biomarkers for the treatment of specific Acanthamoeba keratitis cases.

RevDate: 2024-10-02

Che J, Lai C, Lai G, et al (2024)

Complete genome sequence analysis and Pks genes identification of Brevibacillus brevis FJAT-0809-GLX with a broad inhibitory spectrum against phytopathogens.

World journal of microbiology & biotechnology, 40(11):332.

Brevibacillus brevis FJAT-0809-GLX has a broad spectrum of antimicrobial activity. Understanding the molecular basis of the biocontrol ability of B. brevis will allow us to develop effective microbial agents for sustainable agriculture. In this study, we present the complete and annotated genome sequence of FJAT-0809-GLX. The complete genome of B. brevis FJAT-0809-GLX was 6,137,019 bp in size, with 5688 predicted coding sequences (CDS). The average GC content was 47.38%, and there were 44 copies of the rRNA operon (16S, 23S and 5S rRNA) and 127 tRNA genes. A total of 11,162 genes were functionally annotated with the COG, GO, and KEGG databases, and 123 genes belonged to CAZymes. Genomic secondary metabolite analysis identified 13 clusters encoding potential new antimicrobials. FJAT-0809-GLX was designated as B. brevis according to average nucleotide identity (ANI) and phylogenetic analysis. The pangenome consisted of 7141 homologous genes, of which 4469 were shared by B. brevis FJAT-0809-GLX, B. brevis NBRC100599, B. brevis DSM30, and B. brevis NCTC2611. The numbers of unique homologous genes in B. brevis FJAT-0809-GLX (419 genes) and B. brevis NBRC100599 (480 genes) far exceeded those in B. brevis DSM30 (13 genes) and B. brevis NCTC2611 (6 genes). Nine gene clusters encoding secondary metabolite biosynthesis in the genome of B. brevis FJAT-0809-GLX were compared with those of B. brevis NBRC100599, B. brevis DSM30 and B. brevis NCTC2611, and the gene clusters encoding lantipeptide and transatpks-otherks existed only in the genome of B. brevis FJAT-0809-GLX. The B. brevis FJAT-0809-GLX genome includes 11 BbPks genes, which contained the conserved PS-DH domain. The relative expression of BbPksL, BbPksM2, BbPksM3, BbPksN3, BbPksN4 and BbPksN5 reached a maximum at 120 h and then decreased at 144 h. Our results provide detailed genomic and Pks gene information for the FJAT-0809-GLX strain and lay a foundation for studying its biocontrol mechanisms.

RevDate: 2024-10-02

Tong Z, Huang Y, Zhu QH, et al (2024)

Retrospect and prospect of Nicotiana tabacum genome sequencing.

Frontiers in plant science, 15:1474658.

Investigating plant genomes offers crucial foundational resources for exploring various aspects of plant biology and applications, such as functional genomics and breeding practices. With developments in sequencing and assembly technology, several Nicotiana tabacum genomes have been published. In this paper, we reviewed progress on N. tabacum genome assembly and quality, from the initial draft genomes to the recent high-quality chromosome-level assemblies. The application of long-read sequencing, optical mapping, and Hi-C technologies has significantly improved the contiguity and completeness of N. tabacum genome assemblies, with the latest assemblies having a contig N50 over 50 Mb. Despite these advancements, further improvements are still required and possible, particularly in the development of pan-genome and telomere-to-telomere (T2T) genomes. These new genomes will capture the genomic diversity and variation among different N. tabacum cultivars and species and provide a comprehensive view of N. tabacum genome structure and gene content, deepening our understanding of the N. tabacum genome and facilitating precise breeding and functional genomics.

RevDate: 2024-09-30

Kalbfleisch TS, Smith ML, Ciosek JL, et al (2024)

Three decades of rat genomics: approaching the finish(ed) line.

Physiological genomics [Epub ahead of print].

The rat, Rattus norvegicus, has provided an important model for investigation of a range of characteristics of biomedical importance. Here we survey the origins of this species, its introduction into laboratory research and the emergence of genetic and genomic methods that utilize this model organism. Genomic studies have yielded important progress and provided new insight into several biologically important traits. However, some studies have been impeded by the lack of a complete and accurate reference genome for this species. New sequencing and genome assembly methods applied to the rat have resulted in a new reference genome assembly, GRCr8, which is a near telomere-to-telomere assembly of high base level accuracy that incorporates several elements not captured in prior assemblies. As genome assembly methods continue to advance and production costs become a less significant obstacle, genome assemblies for multiple inbred rat strains are emerging. These assemblies will allow a rat pangenome assembly to be constructed which captures all the genetic variation in strains selected for their utility in research and will overcome reference bias, a limitation associated with reliance on a single reference assembly. By this means, the full utility of this model organism to genomic studies will begin to be revealed.

RevDate: 2024-09-30

Mastoras M, Asri M, Brambrink L, et al (2024)

Highly accurate assembly polishing with DeepPolisher.

bioRxiv : the preprint server for biology pii:2024.09.17.613505.

Accurate genome assemblies are essential for biological research, but even the highest quality assemblies retain errors caused by the technologies used to construct them. Base-level errors are typically fixed with an additional polishing step that uses reads aligned to the draft assembly to identify necessary edits. However, current methods struggle to find a balance between over- and under-polishing. Here, we present an encoder-only transformer model for assembly polishing called DeepPolisher, which predicts corrections to the underlying sequence using PacBio HiFi read alignments to a diploid assembly. Our pipeline introduces a method, PHARAOH (Phasing Reads in Areas Of Homozygosity), which uses ultra-long ONT data to ensure alignments are accurately phased and to correctly introduce heterozygous edits in falsely homozygous regions. We demonstrate that the DeepPolisher pipeline can reduce assembly errors by half, with a greater than 70% reduction in indel errors. We have applied our DeepPolisher-based pipeline to 180 assemblies from the next Human Pangenome Reference Consortium (HPRC) data release, producing an average predicted Quality Value (QV) improvement of 3.4 (54% error reduction) for the majority of the genome.
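The QV figures above follow the Phred convention, QV = -10 log10(error rate), so a QV gain maps directly to a fractional error reduction. A minimal Python check of that arithmetic, offered as an illustration only (the function names are hypothetical and not part of DeepPolisher):

```python
import math


def qv_from_error_rate(p: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(p)."""
    return -10.0 * math.log10(p)


def error_reduction_from_qv_gain(delta_qv: float) -> float:
    """Fraction of errors removed by a QV improvement of delta_qv."""
    # Each +10 QV divides the error rate by 10, so
    # new_error / old_error = 10 ** (-delta_qv / 10).
    return 1.0 - 10.0 ** (-delta_qv / 10.0)


print(qv_from_error_rate(0.001))                    # QV 30 = 1 error per 1,000 bases
print(round(error_reduction_from_qv_gain(3.4), 3))  # about 0.543
```

A gain of 3.4 QV leaves roughly 46% of the original errors, which is consistent with the reported 54% error reduction.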

RevDate: 2024-09-30

Xu A, Lu L, Zhang W, et al (2024)

Microevolution of Bartonella grahamii driven by geographic and host factors.

mSystems [Epub ahead of print].

UNLABELLED: Bartonella grahamii is one of the most prevalent Bartonella species in wild rodents and has been associated with human cases of neuroretinitis. The structure and distribution of genomic diversity in natural B. grahamii populations are largely unexplored. Here, we applied a comprehensive population genomic and phylogenomic analysis to 172 strains of B. grahamii to unravel the genetic differences and influencing factors that shape its populations. The findings reveal remarkable genomic diversity within the species, primarily in the form of single-nucleotide polymorphisms. The open pangenome of B. grahamii indicates dynamic genomic evolution influenced by its ecological niche. Whole-genome data allowed us to decompose B. grahamii diversity into six phylogroups, each characterized by a unique "mosaic pattern" of hosts and biogeographic regions. This suggests a complex interplay between host specificity and biogeography. In addition, our study suggests a possible origin of European strains from Asian lineages and indicates that host factors have a more significant impact on the genetic differentiation of B. grahamii than geographical factors. These insights contribute to understanding the evolutionary history of this pathogen and provide a foundation for future epidemiological research and public health strategies.

IMPORTANCE: Bartonella grahamii has been reported worldwide and shown to infect humans. Up to now, an effective transmission route of B. grahamii to humans has not been confirmed. The genetic evolution of B. grahamii and the relationship between B. grahamii and its host need to be further studied. The factors driving the genetic diversity of B. grahamii are still controversial. The results showed that the European isolates shared a common ancestor with the Chinese isolates. Host factors were shown to play an important role in driving the genetic diversity of B. grahamii. When host factors were fixed, geographic barriers drove B. grahamii microevolution. Our study emphasizes the importance of characterizing isolate genomes derived from hosts and geographical locations and provides a new reference for the origin of B. grahamii.

RevDate: 2024-09-29
CmpDate: 2024-09-29

Zhao Z, Zhu Z, Jiao Y, et al (2024)

Pan-genome analysis of GT64 gene family and expression response to Verticillium wilt in cotton.

BMC plant biology, 24(1):893.

BACKGROUND: The GT64 subfamily, belonging to the glycosyltransferase family, plays a critical function in plant adaptation to stress conditions and the modulation of plant growth, development, and organogenesis processes. However, a comprehensive identification and systematic analysis of GT64 in cotton are still lacking.

RESULTS: This study used bioinformatics techniques to conduct, for the first time, a detailed investigation of the GT64 gene family members of eight cotton species. A total of 39 GT64 genes were detected, which could be classified into five subfamilies according to the phylogenetic tree. Among them, six genes were found in upland cotton. Furthermore, we investigated the precise chromosomal positions of these genes and visually represented their gene structures. Moreover, we predicted cis-regulatory elements in GhGT64s and ascertained the duplication type of the GT64 genes in the eight cotton species. Evaluation of the Ka/Ks ratio for homologous gene pairs among the eight cotton species provided insights into the selective pressures acting on these genes. Additionally, we analyzed the expression profiles of the GT64 gene family. Overexpressing GhGT64_4 in tobacco improved its disease resistance. Subsequently, VIGS experiments in cotton demonstrated reduced disease resistance upon silencing of GhGT64_4, which may indicate that the gene affects the lignin and jasmonic acid biosynthesis pathways and thereby cotton resistance. Weighted Gene Co-expression Network Analysis (WGCNA) revealed an earlier immune response against Verticillium dahliae in G. barbadense than in G. hirsutum. Quantitative Reverse Transcription Polymerase Chain Reaction (qRT-PCR) analysis indicated that some GT64 genes might play a role under various biotic and abiotic stress conditions.

CONCLUSIONS: These discoveries enhance our knowledge of GT64 family members and lay the groundwork for future investigations into the disease resistance mechanisms of this gene in cotton.

RevDate: 2024-09-28

Naqvi M, Utheim TP, C Charnock (2024)

Whole genome sequencing and characterization of Corynebacterium isolated from the healthy and dry eye ocular surface.

BMC microbiology, 24(1):368.

BACKGROUND: The purpose of this study was to characterize Corynebacterium isolated from the ocular surface of dry eye disease patients and healthy controls. We aimed to investigate the pathogenic potential of these isolates in relation to ocular surface health. To this end, we performed whole genome sequencing in combination with biochemical, enzymatic, and antibiotic susceptibility tests. In addition, we employed deferred growth inhibition assays to examine how Corynebacterium isolates may impact the growth of potentially competing microorganisms including the ocular pathogens Pseudomonas aeruginosa and Staphylococcus aureus, as well as other Corynebacterium present on the eye.

RESULTS: The 23 isolates were found to belong to 8 different species of Corynebacterium, with genomes ranging from 2.12 megabase pairs (Mb) in a novel Corynebacterium sp. to 2.65 Mb in C. bovis. Whole genome sequencing revealed a range of antimicrobial targets present in all isolates. Pangenome analysis showed the presence of 516 core genes and that the pangenome is open. Phenotypic characterization showed urease, lipase, mucinase, protease or DNase activity in some isolates. Attention was particularly drawn to a potentially novel Corynebacterium species, which had the smallest genome and produced a range of hydrolytic enzymes. Strikingly, this isolate inhibited the in vitro growth of a range of possible pathogenic bacteria as well as other Corynebacterium isolates. The majority of Corynebacterium species included in this study did not seem to possess canonical pathogenic activity.

CONCLUSIONS: This study is the first reported genomic and biochemical characterization of ocular Corynebacterium. A number of potential virulence factors were identified which may have direct relevance for ocular health and may help explain the finding of our previous report on the ocular microbiome, where DNA libraries were often dominated by members of this genus. Particularly interesting in this regard was the observation that some Corynebacterium, particularly the potentially novel Corynebacterium sp., can inhibit the growth of other ocular Corynebacterium as well as known pathogens of the eye.

RevDate: 2024-09-28
CmpDate: 2024-09-28

Heuberger M, Bernasconi Z, Said M, et al (2024)

Analysis of a global wheat panel reveals a highly diverse introgression landscape and provides evidence for inter-homoeologue chromosomal recombination.

TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik, 137(10):236.

This study highlights the agronomic potential of rare introgressions, as demonstrated by a major QTL for powdery mildew resistance on chromosome 7D. It further shows evidence for inter-homoeologue recombination in wheat. Agriculturally important genes are often introgressed into crops from closely related donor species or landraces. The gene pool of hexaploid bread wheat (Triticum aestivum) is known to contain numerous such "alien" introgressions. Recently established high-quality reference genome sequences allow prediction of the size, frequency and identity of introgressed chromosome regions. Here, we characterise chromosomal introgressions in bread wheat using exome capture data from the WHEALBI collection. We identified 24,981 putative introgression segments of at least 2 Mb across 434 wheat accessions. Detailed study of the most frequent introgressions identified T. timopheevii or its close relatives as a frequent donor species. Importantly, 118 introgressions of at least 10 Mb were exclusive to single wheat accessions, revealing that large populations need to be studied to assess the total diversity of the wheat pangenome. In one case, a 14 Mb introgression in chromosome 7D, exclusive to cultivar Pamukale, was shown by QTL mapping to harbour a recessive powdery mildew resistance gene. We identified multiple events where distal chromosomal segments of one subgenome were duplicated in the genome and replaced the homoeologous segment in another subgenome. We propose that these examples are the results of inter-homoeologue recombination. Our study produced an extensive catalogue of the wheat introgression landscape, providing a resource for wheat breeding. Of note, the finding that the wheat gene pool contains numerous rare, but potentially important introgressions and chromosomal rearrangements has implications for future breeding.

RevDate: 2024-09-28

da Silva MERJ, Breyer GM, da Costa MM, et al (2024)

Genomic Analyses of Methicillin-Susceptible and Methicillin-Resistant Staphylococcus pseudintermedius Strains Involved in Canine Infections: A Comprehensive Genotypic Characterization.

Pathogens (Basel, Switzerland), 13(9): pii:pathogens13090760.

Staphylococcus pseudintermedius is frequently associated with several bacterial infections in dogs, highlighting a One Health concern due to its zoonotic potential. Given the clinical significance of this pathogen, we performed comprehensive genomic analyses of 28 S. pseudintermedius strains isolated from canine infections through whole-genome sequencing using Illumina HiSeq, and compared the genetic features of methicillin-resistant (MRSP) and methicillin-susceptible (MSSP) S. pseudintermedius strains. Our analyses determined that MRSP genomes are larger than those of MSSP strains, with significant changes in antimicrobial resistance genes and virulence markers, suggesting differences in the pathogenicity of MRSP and MSSP strains. In addition, pangenome analysis of S. pseudintermedius from canine and human origins identified core and accessory genomes of 1847 and 3037 genes, respectively, indicating that most of the S. pseudintermedius genome is highly variable. Furthermore, phylogenomic analysis clearly separated MRSP from MSSP strains regardless of infection site, showing phylogenetic differences according to methicillin susceptibility. Altogether, our findings underscore the importance of studying the evolutionary dynamics of S. pseudintermedius, which is crucial for the development of effective prevention and control strategies for resistant S. pseudintermedius infections.

RevDate: 2024-09-28

García-Rivera C, Molina-Pardines C, Haro-Moreno JM, et al (2024)

Genomic Analysis of Antimicrobial Resistance in Pseudomonas aeruginosa from a "One Health" Perspective.

Microorganisms, 12(9): pii:microorganisms12091770.

The "One Health" approach provides a comprehensive framework for understanding antimicrobial resistance. This perspective is of particular importance in the study of Pseudomonas aeruginosa, as it is not only a pathogen that affects humans but also persists in environmental reservoirs. To assess evolutionary selection for niche-specific traits, a genomic comparison of 749 P. aeruginosa strains from three environments (clinical, aquatic, and soil) was performed. The results showed that the environment does indeed exert selective pressure on specific traits. The high percentage of persistent genome, the lack of correlation between phylogeny and origin of the isolate, and the high intrinsic resistance indicate that the species has a high potential for pathogenicity and resistance, regardless of the reservoir. The flexible genome showed an enrichment of metal resistance genes, which could act as a co-selection of antibiotic resistance genes. In the plasmids, resistance genes were found in multigenic clusters, with the presence of a mobile integron being prominent. This integron was identified in several pathogenic strains belonging to distantly related taxa with a worldwide distribution, showing the risk of rapid evolution of resistance. These results provide a more complete understanding of the evolution of P. aeruginosa, which could help develop new prevention strategies.

RevDate: 2024-09-28

Hua L, Ye P, Li X, et al (2024)

Anti-Aflatoxigenic Burkholderia contaminans BC11-1 Exhibits Mycotoxin Detoxification, Phosphate Solubilization, and Cytokinin Production.

Microorganisms, 12(9): pii:microorganisms12091754.

The productivity and quality of agricultural crops worldwide are adversely affected by disease outbreaks and inadequate nutrient availability. Of particular concern is the potential increase in mycotoxin prevalence due to crop diseases, which poses a threat to food security. Microorganisms with multiple functions have been favored in sustainable agriculture to address such challenges. Aspergillus flavus is a prevalent aflatoxin B1 (AFB1)-producing fungus in China. Therefore, we sought an anti-aflatoxigenic bacterium with potent mycotoxin detoxification ability and other beneficial properties. In the present study, we isolated an anti-aflatoxigenic strain, BC11-1, of Burkholderia contaminans from a forest rhizosphere soil sample obtained in Luzhou, Sichuan Province, China. We found that it possesses several beneficial properties, as follows: (1) a broad spectrum of antifungal activity combined with compatibility with Trichoderma species, which are themselves used as biocontrol agents, making it suitable for use in a biocontrol mixture or alongside other biocontrol agents in an integrated management approach; (2) mycotoxin detoxification capacity, with degradation ratios of 90% for aflatoxin B1 and 78% for zearalenone, suggesting its potential for remedial application; and (3) a high ability to solubilize phosphorus and produce cytokinin, highlighting its potential as a biofertilizer. Overall, the diverse properties of BC11-1 render it a beneficial bacterium with excellent potential for use in plant disease protection and mycotoxin prevention and as a biofertilizer. Lastly, a pan-genomic analysis suggests that BC11-1 may possess other undiscovered biological properties, prompting further exploration of this unique strain of B. contaminans.
These findings highlight the potential of using the anti-aflatoxigenic strain BC11-1 to enhance disease protection and improve soil fertility, thus contributing to food security. Given its multiple beneficial properties, BC11-1 represents a valuable microbial resource as a biocontrol agent and biofertilizer.

RevDate: 2024-09-28
CmpDate: 2024-09-28

Cai K, Song X, Yue W, et al (2024)

Identification and Functional Characterization of Abiotic Stress Tolerance-Related PLATZ Transcription Factor Family in Barley (Hordeum vulgare L.).

International journal of molecular sciences, 25(18): pii:ijms251810191.

Plant AT-rich sequence and zinc-binding proteins (PLATZs) are a novel category of plant-specific transcription factors involved in growth, development, and abiotic stress responses. However, the PLATZ gene family has not been identified in barley. In this study, a total of 11 HvPLATZs were identified in barley, and they were unevenly distributed on five of the seven chromosomes. A phylogenetic tree incorporating PLATZs from Arabidopsis, rice, maize, wheat, and barley resolved six clusters, with no HvPLATZs in Cluster VI. HvPLATZs exhibited conserved motif arrangements with a characteristic PLATZ domain. Two segmental duplication events were observed among HvPLATZs. All HvPLATZs were core genes present in all 20 genotypes of the barley pan-genome. The HvPLATZ5 coding sequences were conserved among the 20 barley genotypes, whereas HvPLATZ4/9/10 exhibited synonymous single nucleotide polymorphisms (SNPs); the remaining genes showed nonsynonymous variations. The expression of HvPLATZ2/3/8 was ubiquitous in various tissues, whereas HvPLATZ7 appeared transcriptionally silent; the remaining genes displayed tissue-specific expression. The expression of HvPLATZs was modulated by salt stress, potassium deficiency, and osmotic stress, with response patterns being time-, tissue-, and stress type-dependent. The heterologous expression of HvPLATZ3/5/6/8/9/10/11 in yeast enhanced tolerance to salt and osmotic stress, whereas the expression of HvPLATZ2 compromised tolerance. These results advance our comprehension and facilitate further functional characterization of HvPLATZs.

RevDate: 2024-09-28
CmpDate: 2024-09-28

Bouras N, Bakli M, Dif G, et al (2024)

The Phylogenomic Characterization of Planotetraspora Species and Their Cellulases for Biotechnological Applications.

Genes, 15(9): pii:genes15091202.

This study aims to evaluate the in silico genomic characteristics of five species of the genus Planotetraspora: P. kaengkrachanensis, P. mira, P. phitsanulokensis, P. silvatica, and P. thailandica, with a view to their application in therapeutic research. The 16S rRNA comparison indicated that these species were phylogenetically distinct. Pairwise comparisons of digital DNA-DNA hybridization (dDDH) and OrthoANI values between these type strains indicated that dDDH values were below 62.5%, while OrthoANI values were lower than 95.3%, suggesting that the five species represent distinct genomospecies. These results were consistent with the phylogenomic study based on core genes and the pangenome analysis of these five species within the genus Planotetraspora. However, genome annotation showed some differences between these species, such as variations in the number of subsystem category distributions across whole genomes (ranging between 1979 and 2024). Additionally, the number of CAZyme (Carbohydrate-Active enZyme) genes ranged between 298 and 325, highlighting the potential of these bacteria for therapeutic research applications. The in silico physico-chemical characteristics of cellulases from Planotetraspora species were analyzed. Their 3D structure was modeled, refined, and validated. A molecular docking analysis of this cellulase protein structural model was conducted with cellobiose, cellotetraose, laminaribiose, carboxymethyl cellulose, glucose, and xylose ligands. Our study revealed significant interaction between the Planotetraspora cellulase and the cellotetraose substrate, evidenced by stable binding energies. This suggests that this bacterial enzyme holds great potential for utilizing cellotetraose as a substrate in various applications. This study enriches our understanding of the potential applications of Planotetraspora species in therapeutic research.

RevDate: 2024-09-27
CmpDate: 2024-09-28

Stocke K, Lamont G, Tan J, et al (2024)

Delineation of global, absolutely essential and conditionally essential pangenomes of Porphyromonas gingivalis.

Scientific reports, 14(1):22247.

Porphyromonas gingivalis is a Gram-negative, anaerobic oral pathobiont, an etiological agent of periodontitis and the most commonly studied periodontal bacterium. Multiple low passage clinical isolates were sequenced, and their genomes compared to several laboratory strains. Phylogenetic distances were mapped, a gene absence-presence matrix generated, and core (present in all genomes) and accessory (absent in one or more genomes) genes delineated. Subsequently, a second pangenome delineating the prevalence of inherently essential genes was generated. The prevalence of genes conditionally essential for surviving tobacco exposure, abscess formation and epithelial invasion was also determined, in addition to genes encoding key proteolytic enzymes containing putative signal peptides. While the absolutely essential pangenome was highly conserved, significant differences in the complete and conditionally essential pangenomes were apparent. Thus, genetic plasticity appears to lie primarily in gene sets facilitating adaptation to variant disease-related environments. Those genes that are highly pervasive in the P. gingivalis absolutely essential pangenome or are highly prevalent and essential for fitness in disease-relevant models, may represent particularly attractive therapeutic targets worthy of further investigation. As mutations in absolutely essential genes are expected to be lethal, the data provided herein should also facilitate improved planning for P. gingivalis gene mutation strategies.
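The core/accessory split described above reduces to set operations on a gene presence-absence matrix. A minimal sketch, assuming each genome is represented as a set of gene-cluster identifiers; the strain and gene names are hypothetical, and this is not the authors' pipeline:

```python
from typing import Dict, Iterable, Set, Tuple


def split_pangenome(pa: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, accessory) from a genome -> gene-set mapping.

    core: genes present in every genome; accessory: genes absent from one or more.
    """
    if not pa:
        return set(), set()
    gene_sets = list(pa.values())
    all_genes = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return core, all_genes - core


def prevalence(pa: Dict[str, Set[str]], genes: Iterable[str]) -> Dict[str, float]:
    """Fraction of genomes carrying each gene (e.g. a hypothetical essential-gene list)."""
    n = len(pa)
    return {g: sum(g in s for s in pa.values()) / n for g in genes}


# Hypothetical toy matrix: three strains, four gene clusters.
pa = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2"},
    "strainC": {"g1", "g2", "g4"},
}
core, accessory = split_pangenome(pa)
print(sorted(core), sorted(accessory))  # ['g1', 'g2'] ['g3', 'g4']
print(prevalence(pa, ["g1", "g3"]))
```

Restricting `prevalence` to a list of essential genes gives the kind of "essential pangenome" view the abstract describes, under the assumption that gene clusters have already been called consistently across genomes.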

RevDate: 2024-09-27

Uzzal Hossain M, Khan Tanvir N, Naimur Rahman ABZ, et al (2024)

From sequence to Significance: A thorough investigation of the distinctive genome features Uncovered in C. Werkmanii strain NIB003.

Gene pii:S0378-1119(24)00846-1 [Epub ahead of print].

Citrobacter werkmanii (C. werkmanii), an opportunistic urinary bacterium that causes diarrhea, is poorly understood. Our research focuses on genetic features that are crucial to disease development, such as pathogenic interactions, antibiotic resistance, virulence genes and genetic variation. Following its morphological, biochemical, and molecular identification, the whole genome of C. werkmanii strain NIB003 was sequenced in Bangladesh for the first time. Despite around 80% whole-genome conservation, the research shows that the Bangladeshi strain forms a separate phylogenetic cluster. This emphasises the genetic variability within C. werkmanii, resulting in particular modifications at the strain level and changes in its ability to cause disease. The results of the genetic diversity analysis indicate that the Bangladeshi genome is more diverse than the other strains owing to unique features, such as the presence of a tRNA-binding domain and N-6 adenine-specific DNA methylases.

RevDate: 2024-09-27

Nawrocki EM, Kudva IT, EG Dudley (2024)

Investigating the adherence factors of Escherichia coli at the bovine recto-anal junction.

Microbiology spectrum [Epub ahead of print].

UNLABELLED: Shiga toxin-producing Escherichia coli (STEC) are major foodborne pathogens that result in thousands of hospitalizations each year in the United States. Cattle, the natural reservoir, harbor STEC asymptomatically at the recto-anal junction (RAJ). The molecular mechanisms that allow STEC and non-STEC E. coli to adhere to the RAJ are not fully understood, in part because most adherence studies utilize human cell culture models. To identify a set of bovine-specific E. coli adherence factors, we used the primary RAJ squamous epithelial (RSE) cell-adherence assay to coculture RSE cells from healthy Holstein cattle with diverse E. coli strains from bovine and nonbovine sources. We hypothesized that a comparative genomic analysis of the strains would reveal factors associated with RSE adherence. After performing adherence assays with historical strains from the E. coli Reference Center (n = 62) and strains newly isolated from the RAJ (n = 15), we used the bioinformatic tool Roary to create a pangenome of this collection. We classified strains as either low or high adherence and, using the Scoary program, compiled a list of accessory genes correlated with the "high adherence" strains. While none of the correlations were statistically significant, several gene clusters were associated with the high-adherence phenotype, including two that encode uncharacterized proteins. We also demonstrated that non-STEC E. coli strains from the RAJ are more adherent than other isolates and can outcompete STEC in coculture with RSEs. Further analysis of adherence-associated gene clusters may lead to an improved understanding of the molecular mechanisms of RSE adherence and may help develop probiotics targeting STEC in cattle.
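Scoary tests each accessory gene for association with a binary trait, with Fisher's exact test as its primary statistic. The sketch below computes a one-sided Fisher's exact p-value from a 2x2 table of gene presence versus adherence class using only the standard library; the function name and counts are hypothetical, and Scoary's multiple-testing corrections and pairwise comparisons are not reproduced:

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment of a gene in the trait-positive group.

    2x2 table:            high adherence   low adherence
        gene present            a                b
        gene absent             c                d
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    n = a + b + c + d
    carriers = a + b  # strains carrying the gene (row total)
    high = a + c      # high-adherence strains (column total)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    numerator = sum(
        comb(high, k) * comb(n - high, carriers - k)
        for k in range(a, min(carriers, high) + 1)
    )
    return numerator / comb(n, carriers)


# Hypothetical counts: gene in 3/3 high-adherence and 0/3 low-adherence strains.
print(fisher_exact_greater(3, 0, 0, 3))  # 0.05
```

For real analyses, `scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater")` computes the same one-sided test.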

IMPORTANCE: E. coli strains that produce Shiga toxin cause foodborne illness in humans but colonize cattle asymptomatically. The molecular mechanisms that E. coli uses to adhere to cattle cells are largely unknown. Various strategies are used to control E. coli in livestock and limit the risk of outbreaks. These include vaccinating animals against common E. coli strains and supplementing their feed with probiotics to reduce the carriage of pathogens. No strategy is completely effective, and probiotics often fail to colonize the animals. We sought to clarify the genes required for E. coli adherence in cattle by quantifying the attachment to bovine cells in a diverse set of bacteria. We also isolated nonpathogenic E. coli from healthy cows and showed that a representative isolate could outcompete pathogenic strains in cocultures. We propose that the focused study of these strains and their adherence factors will better inform the design of probiotics and vaccines for livestock.

RevDate: 2024-09-27

Ong CT, Blackall PJ, Boe-Hansen GB, et al (2024)

Whole-genome comparison using complete genomes from Campylobacter fetus strains revealed single nucleotide polymorphisms on non-genomic islands for subspecies differentiation.

Frontiers in microbiology, 15:1452564.

INTRODUCTION: Bovine Genital Campylobacteriosis (BGC), caused by Campylobacter fetus subsp. venerealis, is a sexually transmitted disease that significantly impacts cattle reproductive performance. However, current detection methods lack consistency and reliability due to the close genetic similarity between C. fetus subsp. venerealis and C. fetus subsp. fetus. Therefore, this study aimed to utilize complete genome analysis to distinguish genetic features between C. fetus subsp. venerealis and other subspecies, thereby enhancing BGC detection for routine screening and epidemiological studies.

METHODS AND RESULTS: This study reported the complete genomes of four C. fetus subsp. fetus and five C. fetus subsp. venerealis strains, sequenced using long-read sequencing technologies. Comparative whole-genome analyses (n = 25) were conducted, incorporating an additional 16 complete C. fetus genomes from the NCBI database, to investigate the genomic differences between these two closely related C. fetus subspecies. Pan-genomic analyses revealed a core genome of 1,561 genes and an accessory pangenome of 1,064 genes between the two C. fetus subspecies. However, no unique predicted genes were identified in either subspecies. Nonetheless, whole-genome single nucleotide polymorphism (SNP) analysis identified 289 SNPs unique to one or the other C. fetus subspecies. After the removal of SNPs located on putative genomic islands and recombination sites, and of synonymous SNPs that do not change the encoded amino acid, the remaining 184 SNPs were functionally annotated. Seven candidate SNPs annotated with the KEGG "Peptidoglycan Biosynthesis" pathway were selected for further analysis due to their potential association with the glycine intolerance characteristic of C. fetus subsp. venerealis and its biovar variant. Verification with 58 annotated C. fetus genomes, both complete and incomplete, from RefSeq showed that these seven SNPs classified the genomes into two groups, consistent with their phenotypic identification as CFF (Campylobacter fetus subsp. fetus) or CFV/CFVi (Campylobacter fetus subsp. venerealis and its biovar variant). Furthermore, we demonstrated the application of mraY SNPs for detecting C. fetus subspecies using a quantitative PCR assay.
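The first filtering step, finding SNPs that separate two subspecies, can be sketched as a scan over a whole-genome alignment for columns where each group is fixed for a different base. This is a simplified illustration assuming pre-aligned sequences of equal length with hypothetical genome names; it does not implement the study's later removal of SNPs in genomic islands, recombination sites, or synonymous positions:

```python
from typing import Dict, List

IGNORED = {"-", "N"}  # gaps and ambiguous bases are not scored


def diagnostic_snps(seqs: Dict[str, str], group_a: List[str], group_b: List[str]) -> List[int]:
    """0-based alignment columns where each group is fixed for a different base."""
    length = len(next(iter(seqs.values())))
    hits = []
    for i in range(length):
        alleles_a = {seqs[g][i] for g in group_a}
        alleles_b = {seqs[g][i] for g in group_b}
        if (alleles_a | alleles_b) & IGNORED:
            continue
        # Diagnostic only if both groups are monomorphic and disagree with each other.
        if len(alleles_a) == 1 and len(alleles_b) == 1 and alleles_a != alleles_b:
            hits.append(i)
    return hits


# Hypothetical aligned fragments for two CFF and two CFV genomes.
seqs = {
    "cff_1": "ACGTA",
    "cff_2": "ACGTA",
    "cfv_1": "ATGTC",
    "cfv_2": "ATGAC",
}
print(diagnostic_snps(seqs, ["cff_1", "cff_2"], ["cfv_1", "cfv_2"]))  # [1, 4]
```

Column 3 is excluded because the CFV group is polymorphic there, which is why requiring fixation within each group matters for a subspecies-level marker.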

DISCUSSION: Our results highlighted the high genetic stability of C. fetus subspecies. Nevertheless, Campylobacter fetus subsp. venerealis and its biovar variants encoded common SNPs in genes related to glycine intolerance, which differentiates them from C. fetus subsp. fetus. This discovery highlights the potential of employing a multiple-SNP assay for the precise differentiation of C. fetus subspecies.

RevDate: 2024-09-26

Guo M, Bi G, Wang H, et al (2024)

Genomes of autotetraploid wild and cultivated Ziziphus mauritiana reveal polyploid evolution and crop domestication.

Plant physiology pii:7777155 [Epub ahead of print].

Indian jujube (Ziziphus mauritiana) holds a prominent position in the global fruit and pharmaceutical markets. Here, we report the assemblies of haplotype-resolved, telomere-to-telomere genomes of autotetraploid wild and cultivated Indian jujube plants using a two-stage assembly strategy. The generation of these genomes permitted in-depth investigations into the divergence and evolutionary history of this important fruit crop. Using a graph-based pan-genome constructed from eight monoploid genomes, we identified structural variation (SV)-FST hotspots and SV hotspots. Gap-free genomes provide a means to obtain a global view of centromere structures. We identified presence-absence variation-related genes in four monoploid genomes (cI, cIII, wI, and wIII) and resequencing populations. We also present the population structure and domestication trajectory of the Indian jujube based on the resequencing of 73 wild and cultivated accessions. Metabolomic and transcriptomic analyses of mature fruits of wild and cultivated accessions unveiled the genetic basis underlying loss of fruit astringency during domestication of Indian jujube. This study reveals mechanisms underlying the divergence, evolution, and domestication of the autotetraploid Indian jujube and provides rich and reliable genetic resources for future research.

RevDate: 2024-09-25

Narechania A, Bobo D, Deitz K, et al (2024)

Rapid SARS-COV2 surveillance using clinical, pooled, or wastewater sequence as a sensor for population change.

Genome research pii:gr.278594.123 [Epub ahead of print].

The COVID-19 pandemic has highlighted the critical role of genomic surveillance for guiding policy and control. Timeliness is key, but sequence alignment and phylogeny slow most surveillance techniques. Millions of SARS-CoV-2 genomes have been assembled, and phylogenetic methods are ill-equipped to handle this sheer scale. We introduce a pangenomic measure that examines the information diversity of a k-mer library drawn from a country's complete set of clinical, pooled, or wastewater sequence. Quantifying diversity is central to ecology. Hill numbers, or the effective number of species in a sample, provide a simple metric for comparing species diversity across environments: the more diverse the sample, the higher the Hill number. We adopt this ecological approach and consider each k-mer an individual and each genome a transect in the pangenome of the species. Structured in this way, Hill numbers summarize the temporal trajectory of pandemic variants, collapsing each day's assemblies into genome equivalents. For pooled or wastewater sequence, we instead compare days using survey sequence divorced from individual infections. Across data from the UK, USA, and South Africa, we trace the ascendance of new variants of concern as they emerge in local populations, well before these variants are named and added to phylogenetic databases. Using data from San Diego wastewater, we monitor these same population changes from raw, unassembled sequence. This history of emerging variants senses all available data as it is sequenced, intimating variant sweeps to dominance or declines to extinction at the leading edge of the COVID-19 pandemic.
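The Hill-number idea in this abstract can be illustrated with a short sketch. It assumes the standard ecological definition of the Hill number of order q over relative abundances, with each k-mer occurrence counted as an individual and each distinct k-mer as a "species". The function names, k = 4, and the toy sequences are hypothetical, and this is not the authors' implementation; their conversion of each day's assemblies into genome equivalents is not reproduced here.

```python
import math
from collections import Counter


def kmer_counts(sequences, k):
    """Count every overlapping k-mer across a set of sequences.

    Each k-mer occurrence is treated as one 'individual' and each distinct
    k-mer as one 'species', following the ecological analogy.
    """
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def hill_number(counts, q=1.0):
    """Hill number of order q: (sum_i p_i**q) ** (1 / (1 - q)).

    q = 0 gives richness (number of distinct k-mers); q = 1 is defined as the
    limit exp(Shannon entropy); q = 2 is the inverse Simpson index.
    """
    total = sum(counts.values())
    props = [c / total for c in counts.values() if c > 0]
    if q == 1:
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


# Hypothetical sequence from two sampling days: day 2 adds a divergent lineage.
day1 = ["ACGTACGTAC", "ACGTACGTAC"]
day2 = ["ACGTACGTAC", "TTGCATGCAA"]
for label, seqs in (("day1", day1), ("day2", day2)):
    print(label, round(hill_number(kmer_counts(seqs, k=4), q=1), 2))
```

The second day yields the larger Hill number because its k-mers are spread over more distinct types, which is the signal the abstract uses to flag an emerging variant.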

RevDate: 2024-09-25

Fornezza S, Delvecchio VS, Harvey WT, et al (2024)

AGAP duplicons associate with structural diversity at Chromosome 10q11.22.

Genome research pii:gr.279454.124 [Epub ahead of print].

The 10q11.22 chromosomal region is a duplication-rich interval of the human genome and one of the last to be fully assembled. It carries copy-number variable genes associated with intellectual disability, bipolar disorder, and obesity. In this study, we characterized the structural diversity at this locus by analyzing 64 haploid assemblies produced by the Human Pangenome Reference Consortium. We identified eleven alternative haplotypes that differ in the copy number and/or orientation of large genomic segments, ranging from hundreds of kilobase pairs (kbp) to over one megabase pair (Mbp). We uncovered a 2.4 Mbp size difference between the shortest and longest haplotypes. Breakpoint analysis revealed that genomic instability results from nonallelic homologous recombination between segmental duplication (SD) pairs with varying similarity (94.4-99.6%). Nonetheless, these pairs generally recombine at positions where their identity is higher (>99.6%). Recurrent inversions occur with varying breakpoints within the same inverted SD pair. Inversion polymorphisms shuffle the entire SD arrangement, creating new predispositions to copy-number variations. The SD architecture is associated with a catarrhine-specific subgroup of the AGAP gene family, which likely triggered the accumulation of SDs at this locus over the past 25 million years of human evolution. Our results reveal extensive structural diversity and genomic instability at the 10q11.22 locus and expand the general understanding of the mutational mechanisms behind SD-mediated rearrangements.

RevDate: 2024-09-25
CmpDate: 2024-09-25

Chen L, Zhang L, Li Y, et al (2024)

Screening of promising molecules against potential drug targets in Yersinia pestis by integrative pan and subtractive genomics, docking and simulation approach.

Archives of microbiology, 206(10):415.

This study focuses on Yersinia pestis, the bacterium responsible for plague, which has posed a severe threat to public health throughout history. Despite the availability of antibiotic treatment, the emergence of antibiotic resistance in this pathogen has made controlling infections and plague outbreaks more challenging. New drug targets and therapies are urgently needed. This research aims to identify novel protein targets from 28 Y. pestis strains using an integrative pan-genomic and subtractive genomics approach. Additionally, it seeks to screen for potentially safe and effective alternative therapies against these targets via high-throughput virtual screening. Targets should lack homology to human proteins, the gut microbiota, and known human 'anti-targets', and should exhibit essentiality for the pathogen's survival and virulence, druggability, relevance to antibiotic resistance, and a broad spectrum across multiple pathogenic bacteria. We identified two promising targets: the aminotransferase class I/class II domain-containing protein and 3-oxoacyl-[acyl-carrier-protein] synthase 2. These proteins were modeled using AlphaFold2, validated through several structural analyses, and subjected to molecular docking and ADMET analysis. Molecular dynamics simulations determined the stability of the ligand-target complexes, providing potential therapeutic options against Y. pestis.

RevDate: 2024-09-25

Cunha F, Zhai Y, Casaro S, et al (2024)

Pangenomic and biochemical analyses of Helcococcus ovis reveal widespread tetracycline resistance and a novel bacterial species, Helcococcus bovis.

Frontiers in microbiology, 15:1456569.

Helcococcus ovis (H. ovis) is an opportunistic bacterial pathogen of a wide range of animal hosts including domestic ruminants, swine, avians, and humans. In this study, we sequenced the genomes of 35 Helcococcus sp. clinical isolates from the uterus of dairy cows and explored their antimicrobial resistance and biochemical phenotypes in vitro. Phylogenetic and average nucleotide identity analyses classified four Helcococcus isolates within a cryptic clade representing an undescribed species, for which we propose the name Helcococcus bovis sp. nov. By establishing this new species clade, we also resolve the longstanding question of the classification of the Tongji strain responsible for a confirmed human conjunctival infection. This strain did not neatly fit into H. ovis and is instead a member of H. bovis. We applied whole-genome comparative analyses to explore the pangenome, resistome, virulome, and taxonomic diversity of the remaining 31 H. ovis isolates. An overwhelming 97% of H. ovis strains (30 out of 31) harbor mobile tetracycline resistance genes and displayed significantly increased minimum inhibitory concentrations of tetracyclines in vitro. The high prevalence of mobile tetracycline resistance genes makes H. ovis a significant antimicrobial resistance gene reservoir in our food chain. Finally, the phylogenetic distribution of co-occurring high-virulence determinant genes of H. ovis across unlinked and distant loci highlights an instance of convergent gene loss in the species. In summary, this study showed that mobile genetic element-mediated tetracycline resistance is widespread in H. ovis, and that there is evidence of co-occurring virulence factors across clades suggesting convergent gene loss in the species. We also introduced a novel Helcococcus species closely related to H. ovis, called H. bovis sp. nov., which has been reported to cause infection in humans.

RevDate: 2024-09-24

Zheng B, Xu J, Zhang Y, et al (2024)

MBCN: A novel reference database for Efficient Metagenomic analysis of human gut microbiome.

Heliyon, 10(18):e37422.

Metagenomic shotgun sequencing data can identify microbes and their proportions. However, profiling results from multiple projects that use different reference databases are difficult to compare or to combine in meta-analyses. Our work aims to create a novel collection of human gut prokaryotic genomes, named Microbiome Collection Navigator (MBCN). A total of 2379 human gut metagenomic samples are screened, and 16,785 metagenome-assembled genomes (MAGs) are assembled using a standardized pipeline. The MAGs are then clustered together with representative genomes from public prokaryotic genome collections, and a pan-genome is constructed for each cluster to build Kraken2 and Bracken databases. The databases built by MBCN are more comprehensive and accurate for profiling metagenomic reads than other collections, as assessed on simulated reads and virtual bio-projects. We profile 1082 human gut metagenomic samples with the MBCN database and organize the profiles and metadata in the web program. Meanwhile, using MBCN as a reference database, we also develop a unified, standardized, and systematic metagenomic analysis pipeline and platform, named MicrobiotaCN (http://www.microbiota.cn), into which common statistical and visualization tools for microbiome research are integrated. Taken together, MBCN and MicrobiotaCN can be a valuable resource and a powerful tool that allow researchers to perform metagenomic analysis efficiently with a unified pipeline.

RevDate: 2024-09-23

Liu JN, Yan L, Chai Z, et al (2024)

Pan-genome analyses of eleven Fraxinus species provide insights into salt adaptation in ash trees.

Plant communications pii:S2590-3462(24)00533-9 [Epub ahead of print].

Ash trees (Fraxinus) exhibit rich genetic diversity and wide adaptation to various ecological environments, and several species are highly salt-tolerant. Dissecting the genomic basis of ash tree salt adaptation is vital for resistance breeding. Here, we present eleven high-quality chromosome-level genome assemblies for Fraxinus species, revealing two unequal sub-genome compositions and two more recent whole-genome triplication events in their evolutionary history. A Fraxinus structural variation-based pan-genome was constructed and revealed that presence-absence variations (PAVs) of transmembrane transport genes likely contribute to Fraxinus salt adaptation. Through whole-genome resequencing of an inter-species cross F1 population of F. velutina 'Lula 3' (salt-tolerant) × F. pennsylvanica 'Lula 5' (salt-sensitive), we performed PAV-based quantitative trait loci (QTL) mapping for salt tolerance and pinpointed two PAV-QTLs and candidate genes associated with Fraxinus salt tolerance. Mechanistically, FvbHLH85 enhanced salt tolerance by mediating reactive oxygen species and Na[+]/K[+] homeostasis, while FvSWEET5 did so by mediating osmotic homeostasis. Collectively, these findings provide valuable genomic resources for Fraxinus salt-resistance breeding and for the research community.

RevDate: 2024-09-19
CmpDate: 2024-09-19

Sarwal V, Lee S, Yang J, et al (2024)

VISTA: an integrated framework for structural variant discovery.

Briefings in bioinformatics, 25(5):.

Structural variation (SV) refers to insertions, deletions, inversions, and duplications in human genomes. SVs are present in approximately 1.5% of the human genome. Still, this small subset of genetic variation has been implicated in the pathogenesis of psoriasis, Crohn's disease and other autoimmune disorders, autism spectrum and other neurodevelopmental disorders, and schizophrenia. Because identifying structural variants is an important problem in genetics, several specialized computational techniques have been developed to detect them directly from sequencing data, and advances in whole-genome sequencing (WGS) technologies have produced a plethora of SV detection methods. However, dissecting SVs from WGS data remains a challenge: most SV detection methods are prone to a high false-positive rate, and no existing method can precisely detect the full range of SVs present in a sample. Previous studies have shown that none of the existing SV callers can maintain high accuracy across various SV lengths and genomic coverages. Here, we report an integrated structural variant calling framework, Variant Identification and Structural Variant Analysis (VISTA), that leverages the results of individual callers using a novel and robust filtering and merging algorithm. Unlike existing consensus-based tools, which ignore variant length and coverage, VISTA executes various combinations of top-performing callers based on variant length and genomic coverage to generate SV events with high accuracy. We evaluated the performance of VISTA on comprehensive gold-standard datasets across varying organisms and coverage. We benchmarked VISTA using the Genome-in-a-Bottle gold-standard SV set, haplotype-resolved de novo assemblies from the Human Pangenome Reference Consortium, and an in-house polymerase chain reaction (PCR)-validated mouse gold-standard set. VISTA maintained the highest F1 score among top consensus-based tools, measured using a comprehensive gold standard across both mouse and human genomes. VISTA also has an optimized mode, in which calls can be optimized for precision or recall. VISTA-optimized can attain 100% precision and the highest sensitivity among the variant callers compared. In conclusion, VISTA represents a significant advancement in structural variant calling, offering a robust and accurate framework that outperforms existing consensus-based tools and sets a new standard for SV detection in genomic research.

RevDate: 2024-09-21

Gaye A, Sene ARG, Gadji M, et al (2024)

Toward building a comprehensive human pan-genome: The SEN-GENOME project.

American journal of human genetics pii:S0002-9297(24)00303-3 [Epub ahead of print].

The human reference genome (GRCh38), primarily sourced from individuals of European descent, falls short in capturing the vast genetic diversity across global populations. Efforts to diversify the reference genome face challenges in accessibility and representation, exacerbating the scarcity of African genomic data crucial for studying diseases prevalent in these populations. Sherman et al. proposed constructing reference genomes tailored to distinct human sub-populations. Their African Pan-Genome initiative highlighted substantial genetic variation missing from the GRCh38 human reference genome, emphasizing the necessity for population-specific genomes. In response, local initiatives like the Senegalese Genome project (SEN-GENOME) have emerged to document the genomes of historically overlooked populations. SEN-GENOME embodies community-driven, decentralized research. With meticulous recruitment criteria and ethical practices, it aims to sequence 1,000 genomes from 31 ethnolinguistic groups across the fourteen administrative regions of Senegal, fostering local genomic research tailored to the region. The key to SEN-GENOME's success is its commitment to local governance of data, capacity building, and integration with broader pan-genome projects in Africa. Despite the complexities of data harmonization and sharing, our collaborative efforts are aligned with common goals, ensuring steady progress toward a comprehensive human pan-genome. We invite and welcome collaboration with other research entities to achieve this shared vision. In summary, local initiatives such as SEN-GENOME are pivotal in bridging genomic disparities, offering pathways to equitable and inclusive genomic research. Collaborative endeavors guided by a collective vision for human health will propel us toward a more encompassing understanding of the human genome and better health through genomic medicine.

RevDate: 2024-09-20
CmpDate: 2024-09-21

Silva MH, Batista LL, Malta SM, et al (2024)

Unveiling the Brazilian kefir microbiome: discovery of a novel Lactobacillus kefiranofaciens (LkefirU) genome and in silico prospection of bioactive peptides with potential anti-Alzheimer properties.

BMC genomics, 25(1):884.

BACKGROUND: Kefir is a complex microbial community that plays a critical role in fermentation and in the production of bioactive peptides, and it has health-improving properties. The composition of kefir can vary with geographic location and weather; this paper focuses on a Brazilian sample and continues previous work that demonstrated its anti-Alzheimer properties. In this study, we employed shotgun metagenomics and peptidomics approaches to further characterize Brazilian kefir.

RESULTS: We successfully assembled the novel genome of Lactobacillus kefiranofaciens (LkefirU) and conducted a comprehensive pangenome analysis to compare it with other strains. Furthermore, we performed a peptidome analysis, revealing the presence of bioactive peptides encrypted by L. kefiranofaciens in the Brazilian kefir sample, and utilized in silico prospecting and molecular docking techniques to identify potential anti-Alzheimer peptides, targeting β-amyloid (fibril and plaque), BACE, and acetylcholinesterase. Through this analysis, we identified two peptides that show promise as compounds with anti-Alzheimer properties.

CONCLUSIONS: These findings not only provide insights into the genome of L. kefiranofaciens but also serve as a promising prototype for the development of novel anti-Alzheimer compounds derived from Brazilian kefir.

RevDate: 2024-09-20
CmpDate: 2024-09-20

Martineau M, Ambroset C, Lefebvre S, et al (2024)

Unravelling the main genomic features of Mycoplasma equirhinis.

BMC genomics, 25(1):886.

BACKGROUND: Mycoplasma spp. are wall-less bacteria with small genomes (usually 0.5-1.5 Mb). Many Mycoplasma (M.) species are known to colonize the respiratory tract of both humans and livestock animals, where they act as primary pathogens or opportunists. M. equirhinis was described for the first time in 1975 in horses but has been poorly studied since, despite regular reports of around 14% prevalence in equine respiratory disorders. We recently showed that M. equirhinis is not a primary pathogen but could play a role in co-infections of the respiratory tract. This study was set up to provide the first genomic characterization of M. equirhinis, to improve our understanding of the species.

RESULTS: Four circularized genomes, two of which were generated here, were compared in terms of synteny, gene content, and specific features associated with virulence or genome plasticity. An additional 20 scaffold-level genomes were used to analyse intra-species diversity through a pangenome phylogenetic approach. The M. equirhinis species showed consistent genomic homogeneity, pointing to potential clonality of isolates despite their varied geographical origins (UK, Japan and various places in France). Three different classes of mobile genetic elements were detected: insertion sequences related to the IS1634 family, a putative prophage related to M. arthritidis, and integrative conjugative elements related to M. arginini. The core genome harbours the typical putative virulence-associated genes of mycoplasmas, mainly involved in cytoadherence and immune escape.

CONCLUSION: M. equirhinis is a highly syntenic, homogeneous species with a limited repertoire of mobile genetic elements and putative virulence genes.

RevDate: 2024-09-20
CmpDate: 2024-09-20

Chandra T, Jaiswal S, Tomar RS, et al (2024)

Realizing visionary goals for the International Year of Millet (IYoM): accelerating interventions through advances in molecular breeding and multiomics resources.

Planta, 260(4):103.

Leveraging advanced breeding and multi-omics resources is vital to position millet as an essential "nutricereal resource," aligning with IYoM goals, alleviating strain on global cereal production, boosting resilience to climate change, and advancing sustainable crop improvement and biodiversity. The global challenges of food security, nutrition, climate change, and agrarian sustainability demand the adoption of climate-resilient, nutrient-rich crops to support a growing population amid shifting environmental conditions. Millets, also referred to as "Shree Anna," emerge as a promising solution to these issues by bolstering food production, improving nutrient security, and fostering biodiversity conservation. Their resilience to harsh environments, nutritional density, cultural significance, and potential to improve the dietary quality index make them valuable assets in global agriculture. Recognizing their pivotal role, the United Nations designated 2023 as the "International Year of Millets (IYoM 2023)," emphasizing their contribution to climate-resilient agriculture and nutritional enhancement. Scientific progress has invigorated efforts to enhance millet production through genetic and genomic interventions, yielding a wealth of advanced molecular breeding technologies and multi-omics resources. These advancements offer opportunities to tackle prevailing challenges in millet, such as anti-nutritional factors, sensory acceptability issues, toxin contamination, and ancillary crop improvements. This review provides a comprehensive overview of molecular breeding and multi-omics resources for nine major millet species, focusing on their potential impact within the framework of IYoM. These resources include whole-genome and pan-genome resources elucidating adaptive responses to abiotic stressors, organelle-based studies revealing evolutionary resilience, markers linked to desirable traits for efficient breeding, QTL analysis facilitating trait selection, functional gene discovery for biotechnological interventions, regulatory ncRNAs for trait modulation, web-based platforms for stakeholder communication, tissue culture techniques for genetic modification, and integrated omics approaches enabled by precise application of CRISPR/Cas9 technology. Aligning these resources with the seven thematic areas outlined by IYoM can catalyze transformative changes in millet production and utilization, thereby contributing to global food security, sustainable agriculture, and improved nutritional outcomes.

RevDate: 2024-09-19

Hellewell J, Horsfield ST, von Wachsmann J, et al (2024)

CELEBRIMBOR: Core and accessory genes from metagenomes.

Bioinformatics (Oxford, England) pii:7762100 [Epub ahead of print].

MOTIVATION: Metagenome-Assembled Genomes (MAGs) or Single-cell Amplified Genomes (SAGs) are often incomplete, with sequences missing due to errors in assembly or low coverage. This presents a particular challenge for the identification of true gene frequencies within a microbial population, as core genes missing in only a few assemblies will be mischaracterized by current pangenome approaches.

RESULTS: Here, we present CELEBRIMBOR, a Snakemake pangenome analysis pipeline which uses a measure of genome completeness to automatically adjust the frequency threshold at which core genes are identified, enabling accurate core gene identification in MAGs and SAGs.

AVAILABILITY: CELEBRIMBOR is published under the open-source Apache 2.0 licence at https://github.com/bacpop/CELEBRIMBOR and is available as a Docker container from this repository.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
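The idea of lowering the core-gene frequency threshold when genomes are incomplete can be sketched with a simple probability model. This is a hypothetical illustration, not CELEBRIMBOR's actual algorithm: it assumes that each genome independently recovers a truly core gene with probability equal to its estimated completeness, so the number of genomes in which a core gene is observed follows a Poisson-binomial distribution. It then picks the largest count threshold that still retains a chosen share (here an assumed 95%) of true core genes. The function names, the `retain` parameter, and the completeness values are all hypothetical.

```python
def poisson_binomial_pmf(probs):
    """P(X = k) for X = number of successes in independent trials with success probabilities probs."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # gene missed in this genome
            nxt[k + 1] += mass * p      # gene recovered in this genome
        pmf = nxt
    return pmf


def adjusted_core_threshold(completeness, retain=0.95):
    """Largest genome count k such that a truly core gene is seen in >= k genomes
    with probability >= retain, assuming each genome recovers it independently
    with probability equal to its completeness (a hypothetical model)."""
    pmf = poisson_binomial_pmf(completeness)
    tail = 0.0
    # Walk down from k = n, accumulating P(X >= k).
    for k in range(len(pmf) - 1, -1, -1):
        tail += pmf[k]
        if tail >= retain:
            return k
    return 0


# Hypothetical completeness estimates (fractions) for eight MAGs/SAGs.
mags = [0.98, 0.95, 0.90, 0.85, 0.80, 0.99, 0.92, 0.88]
k = adjusted_core_threshold(mags)
print(f"call a gene core if present in >= {k} of {len(mags)} genomes ({k / len(mags):.0%})")
```

With fully complete genomes the threshold equals the number of genomes (a strict 100% core), and it drops as completeness falls, which is the behaviour the abstract describes; the statistic CELEBRIMBOR actually uses should be taken from its own documentation.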

RevDate: 2024-09-18
CmpDate: 2024-09-18

Vaibarova V, Kralova S, Palikova M, et al (2024)

Genetic and phenotypic diversity of Flavobacterium psychrophilum isolates from Czech salmonid fish farms.

BMC microbiology, 24(1):352.

BACKGROUND: The salmonid pathogen Flavobacterium psychrophilum poses a significant economic threat to global aquaculture, yet our understanding of its genetic and phenotypic diversity remains incomplete across much of its geographic range. In this study, we characterise the genetic and phenotypic diversity of 70 isolates collected from rainbow trout (Oncorhynchus mykiss) and brown trout (Salmo trutta m. fario) on fish farms in the Czech Republic between 2012 and 2019, and compare their genomic content with all draft or complete genomes present in the NCBI database (n = 187).

RESULTS: The Czech isolates underwent comprehensive evaluation, including multiplex PCR-based serotyping, genetic analysis, antimicrobial resistance testing, and assessment of selected virulence factors. Multiplex PCR serotyping identified 43 isolates as Type 1 and 23 as Type 2, with sporadic cases of Types 3 and 4. Multi-locus sequence typing revealed 12 sequence types (STs), including seven newly described ones. Notably, 24 isolates were identified as ST329, a novel sequence type, while 22 were classified as the globally distributed ST2. Phylogenetic analysis demonstrated clonal distribution of ST329 in the Czech Republic, with these isolates lacking a phage sequence in their genomes. Antimicrobial susceptibility testing revealed a high proportion of isolates classified as non-wild type with reduced susceptibility to oxolinic acid, oxytetracycline, flumequine, and enrofloxacin, while most isolates were classified as wild type for florfenicol, sulfamethoxazole-trimethoprim, and erythromycin. However, 31 isolates classified as wild type for florfenicol exhibited minimum inhibitory concentrations at the susceptibility breakpoint.

CONCLUSION: The prevalence of the Czech F. psychrophilum serotypes has evolved over time, likely influenced by the introduction of new isolates through international trade. Thus, it is crucial to monitor F. psychrophilum clones within and across countries using advanced methods such as MLST, serotyping, and genome sequencing. Given the open nature of the pan-genome, further sequencing of strains promises exciting discoveries in F. psychrophilum genomics.

RevDate: 2024-09-18

Góngora E, Lirette AO, Freyria NJ, et al (2024)

Metagenomic survey reveals hydrocarbon biodegradation potential of Canadian high Arctic beaches.

Environmental microbiome, 19(1):72.

BACKGROUND: Decreasing sea ice coverage across the Arctic Ocean due to climate change is expected to increase shipping activity through previously inaccessible shipping routes, including the Northwest Passage (NWP). The changeable weather conditions typical of the Arctic will still pose a risk to ships, which could lead to accidents and the uncontrolled release of hydrocarbons onto NWP shorelines. We performed a metagenomic survey to characterize the microbial communities of various NWP shorelines and to determine whether these microbiomes have the metabolic potential for hydrocarbon degradation.

RESULTS: We observed taxonomic and functional gene evidence supporting the potential of NWP beach microbes to degrade various types of hydrocarbons. The metagenomic and metagenome-assembled genome (MAG) taxonomy showed that known hydrocarbon-degrading taxa are present on these beaches. Additionally, we detected biomarker genes of aerobic and anaerobic degradation pathways for alkane and aromatic hydrocarbons, along with complete pathways for aerobic alkane degradation. Alkane degradation genes were present in all samples and were also more abundant (33.8 ± 34.5 hits per million genes, HPM) than their aromatic hydrocarbon counterparts (11.7 ± 12.3 HPM). Because MAGs from the genus Rhodococcus were ubiquitous (23.8% of the MAGs), we compared our MAGs with Rhodococcus genomes from NWP isolates obtained using hydrocarbons as the carbon source, to corroborate our results and to develop a pangenome of Arctic Rhodococcus. Our analysis revealed that the biodegradation of alkanes is part of the core pangenome of this genus. We also detected nitrogen and sulfur pathways providing additional energy sources and electron donors, as well as carbon pathways providing alternative carbon sources. These pathways occur in the absence of hydrocarbons, allowing microbes to survive on these nutrient-poor beaches.

CONCLUSIONS: Our metagenomic analyses detected the genetic potential for hydrocarbon biodegradation in these NWP shoreline microbiomes. Alkane metabolism was the most prevalent type of hydrocarbon degradation observed in these tidal beach ecosystems. Our results indicate that bioremediation could be used as a cleanup strategy, but the addition of adequate amounts of N and P fertilizers should be considered to help bacteria overcome the oligotrophic nature of NWP shorelines.

RevDate: 2024-09-17
CmpDate: 2024-09-17

Bouznada K, Saker R, Belaouni HA, et al (2024)

Phylogenomic Analysis Supports the Reclassification of Caldicoprobacter faecalis (Winter et al. 1988) Bouanane-Darenfed et al. (2015) as a Later Heterotypic Synonym of Caldicoprobacter oshimai Yokoyama et al. (2010).

Current microbiology, 81(11):363.

This study employs genome-based methodologies to explore the taxonomic relationship between Caldicoprobacter faecalis DSM 20678[T] and Caldicoprobacter oshimai DSM 21659[T]. The genome-based similarity indices, digital DNA-DNA hybridization (dDDH), average amino acid identity (AAI), and average nucleotide identity (ANI), calculated between the genomes of these two type strains yielded values of 91.2%, 98.9%, and 99.1%, respectively. These values exceed the recommended thresholds of 70% (dDDH) and 95-96% (ANI and AAI) for bacterial species delineation, indicating a shared taxonomic position for C. faecalis and C. oshimai. Furthermore, analysis with the 'Bacterial Pan Genome Analysis' (BPGA) pipeline and a maximum-likelihood core-genes tree constructed with FastTree2 consistently demonstrated the close relationship between C. faecalis DSM 20678[T] and C. oshimai DSM 21659[T], which cluster together in the core-genes phylogenomic tree. Based on these comprehensive findings, we propose the reclassification of C. faecalis as a later heterotypic synonym of C. oshimai.

RevDate: 2024-09-17

Bucher-Johannessen C, Senthakumaran T, Avershina E, et al (2024)

Species-level verification of Phascolarctobacterium association with colorectal cancer.

mSystems [Epub ahead of print].

We have previously demonstrated an association between increased abundance of Phascolarctobacterium and colorectal cancer (CRC) and adenomas in two independent Norwegian cohorts. Here we seek to verify our previous findings using new cohorts and methods. In addition, we characterize lifestyle and sex specificity, the functional potential of the Phascolarctobacterium species, and their interaction with other microbial species. We analyze Phascolarctobacterium with 16S rRNA sequencing, shotgun metagenome sequencing, and species-specific qPCR, using 2350 samples from three Norwegian cohorts (CRCAhus, NORCCAP, and CRCbiome) and a large publicly available data set, curatedMetagenomicData. Using metagenome-assembled genomes from the CRCbiome study, we explore the genomic characteristics and functional potential of the Phascolarctobacterium pangenome. Three species of Phascolarctobacterium associated with adenoma/CRC were consistently detected by qPCR and sequencing. Positive associations with adenomas/CRC were verified for Phascolarctobacterium succinatutens, and negative associations between Phascolarctobacterium faecium and adenoma were shown in curatedMetagenomicData. Men show a higher prevalence of P. succinatutens across cohorts. Co-occurrence among Phascolarctobacterium species was low (<6%). Each of the three species shows a distinct microbial composition and forms distinct correlation networks with other bacterial taxa, although Dialister invisus was negatively correlated with all investigated Phascolarctobacterium species. Pangenome analyses showed P. succinatutens to be enriched for genes related to porphyrin metabolism and degradation of complex carbohydrates, whereas glycoside hydrolase enzyme 3 was specific to P. faecium.

IMPORTANCE: Until now, Phascolarctobacterium has gone under the radar as a CRC-associated genus, despite having been noted, but overlooked, as such for over a decade. We found not just one but two species of Phascolarctobacterium to be associated with CRC: Phascolarctobacterium succinatutens was more abundant in adenoma/CRC, while Phascolarctobacterium faecium was less abundant in adenoma. Each represents a distinct community, constituted by specific microbial partners and metabolic capacities, and they rarely occur together in the same patients. We have verified that P. succinatutens is increased in adenoma and CRC, and this species should be recognized among the most important CRC-associated bacteria.

RevDate: 2024-09-16
CmpDate: 2024-09-16

Geethanjali S, Kadirvel P, S Periyannan (2024)

Wheat improvement through advances in single nucleotide polymorphism (SNP) detection and genotyping with a special emphasis on rust resistance.

TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik, 137(10):224.

Single nucleotide polymorphism (SNP) markers in wheat and their prospects in breeding with special reference to rust resistance. SNP-based markers are increasingly gaining momentum for screening and utilizing vital agronomic traits in wheat. To date, more than 260 million SNPs have been detected in modern cultivars and landraces of wheat. This rapid SNP discovery was made possible by the release of near-complete reference and pan-genome assemblies of wheat and its wild relatives, coupled with whole-genome sequencing (WGS) of thousands of wheat accessions. Further, genotyping of customized SNP sites was facilitated by a series of arrays (9K to 820K), a cost-effective substitute for WGS. Lately, germplasm-specific SNP arrays have been introduced to characterize novel traits and detect closely linked SNPs for marker-assisted breeding. Subsequently, the kompetitive allele-specific PCR (KASP) assay was introduced for rapid and large-scale screening of specific SNP markers. Moreover, with advances in sequencing and reductions in its cost, ample opportunities arise for generating SNPs artificially through mutations, in combination with next-generation sequencing and comparative genomic analyses. In this review, we describe the historical development and prospects of SNP markers in wheat breeding with special reference to rust resistance, for which over 50 genetic loci have been characterized through SNP markers. Rust resistance is one of the most essential traits for wheat breeding, as new strains of the Puccinia fungus, responsible for rust diseases, evolve frequently and globally.

RevDate: 2024-09-16

Olivos-Caicedo KY, Fernandez F, Daniel SL, et al (2024)

Pangenome analysis of Clostridium scindens: a collection of diverse bile acid and steroid metabolizing commensal gut bacterial strains.

bioRxiv : the preprint server for biology pii:2024.09.06.610859.

Clostridium scindens is a commensal gut bacterium capable of forming the secondary bile acids deoxycholic acid and lithocholic acid from the primary bile acids cholic acid and chenodeoxycholic acid, respectively, as well as converting glucocorticoids to androgens. Historically, only two strains, C. scindens ATCC 35704 and C. scindens VPI 12708, have been characterized in vitro and in vivo to any significant extent. The formation of secondary bile acids is important in maintaining normal gastrointestinal function, in regulating the structure of the gut microbiome, in the etiology of diseases such as cancers of the GI tract, and in the prevention of Clostridium difficile infection. We therefore set out to determine the pangenome of 34 cultured strains of C. scindens and a set of 200 metagenome-assembled genomes (MAGs) to understand the variability among strains. The results indicate that the 34 strains of C. scindens have an open pangenome with 12,720 orthologous gene groups, and a core genome with 1,630 gene families, in addition to 7,051 and 4,039 gene families in the accessory and unique (i.e., strain-exclusive) genomes, respectively. The core genome contains 39% of the proteins with predicted metabolic function, and, in the unique genome, the function of storage and processing of information prevails, with 34% of the proteins being in that category. The pangenome profile including the MAGs also proved to be open. The presence of bile acid inducible (bai) and steroid-17,20-desmolase (des) genes was identified among groups of strains. The analysis reveals that C. scindens strains are distributed into two clades, indicating the possible onset of C. scindens separation into two species, confirmed by gene content, phylogenomic, and average nucleotide identity (ANI) analyses. This study provides insight into the structure and function of the C. scindens pangenome, offering a genetic foundation of significance for many aspects of research on the intestinal microbiota and bile acid metabolism.

RevDate: 2024-09-16

Littlefield C, Lazaro-Guevara JM, Stucki D, et al (2024)

A Draft Pacific Ancestry Pangenome Reference.

bioRxiv : the preprint server for biology pii:2024.08.07.606392.

Individuals of Pacific ancestry suffer some of the highest rates of health disparities yet remain vastly underrepresented in genomic research, including currently available linear and pangenome references. To begin addressing this, we developed the first Pacific ancestry pangenome reference using 23 individuals with diverse Pacific ancestry. We assembled 46 haploid genomes from these 23 individuals, resulting in highly accurate and contiguous genome assemblies with an average quality value of 55.0 and an average N50 of 40.7 Mb, marking the first de novo assembly of highly accurate Pacific ancestry genomes. We combined these assemblies to create a pangenome reference, which added 30.6 Mb of novel sequence missing from the Human Pangenome Reference Consortium (HPRC) reference. Mapping short reads to this pangenome reduced variant call errors and yielded more true-positive variants compared to the HPRC and T2T-CHM13 references. This Pacific ancestry pangenome reference serves as a resource to enhance genetic analyses for this underserved population.

RevDate: 2024-09-16

Feng Y, Weers T, RJ Peters (2024)

Double-barreled defense: dual ent-miltiradiene synthases in most rice cultivars.

aBIOTECH, 5(3):375-380 pii:167.

UNLABELLED: Rice (Oryza sativa) produces numerous diterpenoid phytoalexins that are important in defense against pathogens. Surprisingly, despite extensive previous investigations, a major group of such phytoalexins, the abietoryzins, were only recently reported. These aromatic abietanes are presumably derived from ent-miltiradiene, but such biosynthetic capacity has not yet been reported in O. sativa. While wild rice has been reported to contain such an enzyme, specifically ent-kaurene synthase-like 10 (KSL10), the only characterized ortholog from O. sativa (OsKSL10), specifically from the well-studied cultivar (cv.) Nipponbare, has instead been shown to make ent-sandaracopimaradiene, precursor to the oryzalexins. Notably, in many other cultivars, OsKSL10 is accompanied by a tandem duplicate, termed here OsKSL14. Biochemical characterization of OsKSL14 from cv. Kitaake demonstrates that it produces the expected abietoryzin precursor ent-miltiradiene. Strikingly, phylogenetic analysis of OsKSL10 across the rice pan-genome reveals that the allele from cv. Nipponbare is an outlier, whereas the alleles from most other cultivars group with those from wild rice, suggesting that these also might produce ent-miltiradiene. Indeed, OsKSL10 from cv. Kitaake exhibits such activity as well, consistent with its production of abietoryzins but not oryzalexins. Similarly consistent with these results is the lack of abietoryzin production by cv. Nipponbare. Although their equivalent product outcome might suggest redundancy, OsKSL10 and OsKSL14 were observed to exhibit distinct expression patterns, indicating such differences may underlie retention of these duplicated genes. Regardless, the results reported here clarify abietoryzin biosynthesis and provide insight into the evolution of rice diterpenoid phytoalexins.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s42994-024-00167-3.

RevDate: 2024-09-15
CmpDate: 2024-09-15

Hou Y, Gan J, Fan Z, et al (2024)

Haplotype-based pangenomes reveal genetic variations and climate adaptations in moso bamboo populations.

Nature communications, 15(1):8085.

Moso bamboo (Phyllostachys edulis), an ecologically and economically important forest species in East Asia, plays vital roles in carbon sequestration and climate change mitigation. However, intensifying climate change threatens moso bamboo survival. Here we generate high-quality haplotype-based pangenome assemblies for 16 representative moso bamboo accessions and integrate these assemblies with 427 previously resequenced accessions. Characterization of the haplotype-based pangenome reveals extensive genetic variation, predominantly between haplotypes rather than within accessions. Many genes with allele-specific expression patterns are implicated in climate responses. Integrating spatiotemporal climate data reveals more than 1050 variations associated with pivotal climate factors, including temperature and precipitation. Climate-associated variations enable the prediction of increased genetic risk across the northern and western regions of China under future emissions scenarios, underscoring the threats posed by rising temperatures. Our integrated haplotype-based pangenome elucidates moso bamboo's local climate adaptation mechanisms and provides critical genomic resources for addressing intensifying climate pressures on this essential bamboo. More broadly, this study demonstrates the power of long-read sequencing in dissecting adaptive traits in climate-sensitive species, advancing evolutionary knowledge to support conservation.

RevDate: 2024-09-14

Wu Y, Wang F, Lyu K, et al (2024)

Comparative Analysis of Transposable Elements in the Genomes of Citrus and Citrus-Related Genera.

Plants (Basel, Switzerland), 13(17): pii:plants13172462.

Transposable elements (TEs) significantly contribute to the evolution and diversity of plant genomes. In this study, we explored the roles of TEs in the genomes of Citrus and Citrus-related genera by constructing a pan-genome TE library from 20 published genomes of Citrus and Citrus-related accessions. Our results revealed an increase in TE content and the number of TE types compared to the original annotations, as well as a decrease in the content of unclassified TEs. The average total length of TEs per assembly was approximately 194.23 Mb, representing 41.76% (Murraya paniculata) to 64.76% (Citrus gilletiana) of the genomes, with a mean value of 56.95%. A significant positive correlation was found between genome size and both the number of TE types and TE content. Consistent with the difference in mean whole-genome size (39.83 Mb) between Citrus and Citrus-related genera, Citrus genomes contained an average of 34.36 Mb more TE sequences than Citrus-related genomes. Analysis of the estimated insertion time and half-life of long terminal repeat retrotransposons (LTR-RTs) suggested that TE removal was not the primary factor contributing to the differences among genomes. These findings collectively indicate that TEs are the primary determinants of genome size and play a major role in shaping genome structures. Principal coordinate analysis (PCoA) of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) identifiers revealed that the fragmented TEs were predominantly derived from ancestral genomes, while intact TEs were crucial in the recent evolutionary diversification of Citrus. Moreover, the presence or absence of intact TEs near the AdhE superfamily was closely associated with the bitterness trait in the Citrus species. Overall, this study enhances TE annotation in Citrus and Citrus-related genomes and provides valuable data for future genetic breeding and agronomic trait research in Citrus.

RevDate: 2024-09-14
CmpDate: 2024-09-14

Song Y, Han S, Wang M, et al (2024)

Pangenome Identification and Analysis of Terpene Synthase Gene Family Members in Gossypium.

International journal of molecular sciences, 25(17): pii:ijms25179677.

Terpene synthases (TPSs), key gatekeepers in the biosynthesis of herbivore-induced terpenes, are pivotal in the diversity of terpene chemotypes across and within plant species. Here, we constructed a gene-based pangenome of the Gossypium genus by integrating the genomes of 17 diploid and 10 tetraploid species. Within this pangenome, 208 TPS syntelog groups (SGs) were identified, comprising 2 core SGs (TPS5 and TPS42) present in all 27 analyzed genomes, 6 softcore SGs (TPS11, TPS12, TPS13, TPS35, TPS37, and TPS47) found in 25 to 26 genomes, 131 dispensable SGs identified in 2 to 24 genomes, and 69 private SGs exclusive to a single genome. The mutational load analysis of these identified TPS genes across 216 cotton accessions revealed a great number of splicing variants and complex splicing patterns. The nonsynonymous/synonymous Ka/Ks value for all 52 analyzed TPS SGs was less than one, indicating that these genes were subject to purifying selection. Of 208 TPS SGs encompassing 1795 genes, 362 genes derived from 102 SGs were identified as atypical and truncated. The structural analysis of TPS genes revealed that gene truncation is a major mechanism contributing to the formation of atypical genes. An integrated analysis of three RNA-seq datasets from cotton plants subjected to herbivore infestation highlighted nine upregulated TPSs, which included six previously characterized TPSs in G. hirsutum (AD1_TPS10, AD1_TPS12, AD1_TPS40, AD1_TPS42, AD1_TPS89, and AD1_TPS104), two private TPSs (AD1_TPS100 and AD2_TPS125), and one atypical TPS (AD2_TPS41). Also, a TPS-associated coexpression module of eight genes involved in the terpenoid biosynthesis pathway was identified in the transcriptomic data of herbivore-infested G. hirsutum. These findings will help us understand the contributions of TPS family members to interspecific terpene chemotypes within Gossypium and offer valuable resources for breeding insect-resistant cotton cultivars.
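The category rule described in this abstract (syntelog groups classed by how many of the 27 genomes carry them) can be illustrated with a minimal Python sketch. It assumes presence/absence has already been computed as a mapping from SG identifier to the set of genomes that carry it; the function names and the toy SG and genome IDs are hypothetical, and this is not the authors' pipeline.

```python
from collections import Counter

# Category thresholds as stated in the abstract for the 27-genome Gossypium
# pangenome: core = all 27, softcore = 25-26, dispensable = 2-24, private = 1.
N_GENOMES = 27
SOFTCORE_MIN = 25


def classify_group(n_present: int, n_genomes: int = N_GENOMES,
                   softcore_min: int = SOFTCORE_MIN) -> str:
    """Assign a syntelog group (SG) to a category by how many genomes carry it."""
    if not 1 <= n_present <= n_genomes:
        raise ValueError(f"n_present must be between 1 and {n_genomes}, got {n_present}")
    if n_present == n_genomes:
        return "core"
    if n_present >= softcore_min:
        return "softcore"
    if n_present >= 2:
        return "dispensable"
    return "private"


def summarize(presence: dict[str, set[str]]) -> Counter:
    """Count categories from a mapping of SG ID -> set of genome IDs containing that SG."""
    return Counter(classify_group(len(genomes)) for genomes in presence.values())


if __name__ == "__main__":
    # Hypothetical toy input: 27 placeholder genome IDs and four made-up SGs.
    genomes = [f"G{i}" for i in range(1, N_GENOMES + 1)]
    toy = {
        "SG_a": set(genomes),        # carried by all 27 -> core
        "SG_b": set(genomes[:26]),   # carried by 26 -> softcore
        "SG_c": set(genomes[:10]),   # carried by 10 -> dispensable
        "SG_d": {genomes[0]},        # carried by 1 -> private
    }
    print(summarize(toy))
```

As a consistency check on the reported figures, the four category counts (2 core + 6 softcore + 131 dispensable + 69 private) sum to the stated 208 TPS SGs.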

RevDate: 2024-09-13

Olson MA, Cullimore C, Hutchison WD, et al (2024)

Genes associated with fitness and disease severity in the pan-genome of mastitis-associated Escherichia coli.

Frontiers in microbiology, 15:1452007.

INTRODUCTION: Bovine mastitis caused by Escherichia coli compromises animal health and inflicts substantial product losses in dairy farming. It may manifest as anything from subclinical to severe acute disease and can be transient or persistent in nature. Little is known about bacterial factors that impact clinical outcomes or allow some strains to outcompete others in the mammary gland (MG) environment. Mastitis-associated E. coli (MAEC) may have distinctive characteristics which may contribute to the varied nature of the disease. Given their high levels of intraspecies genetic variability, virulence factors of commonly used MAEC model strains may not be relevant to all members of this group.

METHODS: In this study, we sequenced the genomes of 96 MAEC strains isolated from cattle with clinical mastitis (CM). We utilized clinical severity data to perform genome-wide association studies to identify accessory genes associated with strains isolated from mild or severe CM, or with high or low competitive fitness during in vivo competition assays. Genes associated with mastitis pathogens or commensal strains isolated from bovine sources were also identified.

RESULTS: A type-2 secretion system (T2SS) and a chitinase (ChiA) exported by this system were strongly associated with pathogenic isolates compared with commensal strains. Deletion of chiA from MAEC isolates decreased their adherence to cultured bovine mammary epithelial cells.

DISCUSSION: The increased fitness associated with strains possessing this gene may be due to better attachment in the MG. Overall, these results provide a much richer understanding of MAEC and suggest bacterial processes that may underlie the clinical diversity associated with mastitis and their adaptation to this unique environment.

RevDate: 2024-09-12

Magar S, Kolte V, Sharma G, et al (2024)

Exploring pangenomic diversity and CRISPR-Cas evasion potential in jumbo phages: a comparative genomics study.

Microbiology spectrum [Epub ahead of print].

UNLABELLED: Jumbo phages are characterized by their remarkably large genomes and unique life cycles. Jumbo phages belonging to the Chimalliviridae family protect the replicating phage DNA from host immune systems such as CRISPR-Cas and restriction-modification systems through a phage nucleus structure. Several recent studies have provided new insights into jumbo phage infection biology, but the pan-genome diversity of jumbo phages and their relationship with CRISPR-Cas targeting beyond Chimalliviridae are not well understood. In this study, we used pan-genome analysis to identify orthologous gene families shared among 331 jumbo phages with complete genomes. We show that jumbo phages lack a universally conserved set of core genes but identified seven "soft-core genes" conserved in over 50% of these phages. These genes primarily govern DNA-related activities, such as replication, repair, or nucleotide synthesis. Jumbo phages exhibit a wide array of accessory and unique genes, underscoring their genetic diversity. Phylogenetic analyses of the soft-core genes revealed frequent horizontal gene transfer events between jumbo phages, non-jumbo phages, and occasionally even giant eukaryotic viruses, indicating a polyphyletic evolutionary nature. We categorized jumbo phages into 11 major viral clusters (VCs) spanning 130 sub-clusters, with the majority being multi-genus jumbo phage clusters. Moreover, through the analysis of hallmark genes related to CRISPR-Cas targeting, we predict that many jumbo phages can evade host immune systems using both known and yet-to-be-identified mechanisms. In summary, our study enhances our understanding of jumbo phages, shedding light on their pan-genome diversity and remarkable genome protection capabilities.

IMPORTANCE: Jumbo phages are large bacterial viruses that have been known for more than 50 years. However, only in recent years has a significant number of complete jumbo phage genome sequences become available. In this study, we employed comparative genomic approaches to investigate the genomic diversity and genome protection capabilities of 331 jumbo phages. Our findings revealed that jumbo phages exhibit high genetic diversity, with only a few genes being relatively conserved across jumbo phages. Interestingly, our data suggest that jumbo phages employ yet-to-be-identified strategies to protect their DNA from host immune systems such as CRISPR-Cas.

RevDate: 2024-09-11

Sirén J, Eskandar P, Ungaro MT, et al (2024)

Personalized pangenome references.

Nature methods [Epub ahead of print].

Pangenomes reduce reference bias by representing genetic diversity better than a single reference sequence. Yet when comparing a sample to a pangenome, variants in the pangenome that are not part of the sample can be misleading, for example, by causing false read mappings. These irrelevant variants are generally rarer in terms of allele frequency and have previously been dealt with by filtering rare variants. However, this blunt heuristic both fails to remove some irrelevant variants and removes many relevant variants. We propose a new approach that imputes a personalized pangenome subgraph by sampling local haplotypes according to k-mer counts in the reads. We implement the approach in the vg toolkit (https://github.com/vgteam/vg) for the Giraffe short-read aligner and compare its accuracy to state-of-the-art methods using human pangenome graphs from the Human Pangenome Reference Consortium. This reduces small variant genotyping errors fourfold relative to the Genome Analysis Toolkit and makes short-read structural variant genotyping of known variants competitive with long-read variant discovery methods.

RevDate: 2024-09-11

Thorgersen MP, Goff JL, Trotter VV, et al (2024)

Fitness factors impacting survival of a subsurface bacterium in contaminated groundwater.

The ISME journal pii:7755367 [Epub ahead of print].

Many factors contribute to the ability of a microbial species to persist when encountering complexly contaminated environments including time of exposure, the nature and concentration of contaminants, availability of nutritional resources, and possession of a combination of appropriate molecular mechanisms needed for survival. Herein we sought to identify genes that are most important for survival of Gram-negative Enterobacteriaceae in contaminated groundwater environments containing high concentrations of nitrate and metals using the metal-tolerant Oak Ridge Reservation (ORR) isolate, Pantoea sp. MT58 (MT58). Survival fitness experiments in which a randomly barcoded transposon insertion (RB-TnSeq) library of MT58 was exposed directly to contaminated ORR groundwater samples from across a nitrate and mixed metal contamination plume were used to identify genes important for survival with increasing exposure times and concentrations of contaminants, and availability of a carbon source. Genes involved in controlling and using carbon, encoding transcriptional regulators, and related to Gram-negative outer membrane processes were among those found to be important for survival in contaminated ORR groundwater. A comparative genomics analysis of 75 Pantoea genus strains allowed us to further separate the survival determinants into core and non-core genes in the Pantoea pangenome, revealing insights into the survival of subsurface microorganisms during contaminant plume intrusion.

RevDate: 2024-09-11

Liu Z, Yang F, Wan H, et al (2024)

Genome architecture of the allotetraploid wild grass Aegilops ventricosa reveals its evolutionary history and contributions to wheat improvement.

Plant communications pii:S2590-3462(24)00527-3 [Epub ahead of print].

The allotetraploid wild grass Aegilops ventricosa (2n=4X=28, genome D[v]D[v]N[v]N[v]) has been recognized as an important germplasm resource for wheat improvement due to its ability to tolerate biotic stresses. In particular, the 2N[v]S segment from Aegilops ventricosa, as a stable and effective resistance source, has greatly contributed to wheat improvement. The 2N[v]S/2AS translocation is a prevalent chromosomal translocation between common wheat and wild relatives, ranking just behind the 1B/1R translocation in importance for modern wheat breeding. Here, we assembled a high-quality chromosome-level reference genome of Ae. ventricosa RM271 with a total length of 8.67 Gb. Phylogenomic analyses revealed that the progenitor of the D[v] subgenome of Ae. ventricosa was Ae. tauschii ssp. tauschii (genome DD); in contrast, the progenitor of the D subgenome of bread wheat (Triticum aestivum L.) was Ae. tauschii ssp. strangulata (genome DD). The oldest estimated polyploidization time of Ae. ventricosa was ∼0.7 million years ago. The D[v] subgenome of Ae. ventricosa was less conserved than the D subgenome of bread wheat. Construction of a graph-based pangenome of 2AS/6N[v]L (originally known as 2N[v]S) segments from Ae. ventricosa and other genomes in the Triticeae enabled us to identify candidate resistance genes sourced from Ae. ventricosa. We identified 12 nonredundant introgressed segments from the D[v] and N[v] subgenomes using a large winter wheat collection representing the full diversity of the wheat European genetic pool, and 29.40% of European wheat varieties inherited at least one of these segments. The high-quality RM271 reference genome will provide a basis for cloning key genes, including the Yr17-Lr37-Sr38-Cre5 resistance gene cluster in Ae. ventricosa, and facilitate the full use of elite wild genetic resources to accelerate wheat improvement.

RevDate: 2024-09-10

Li X, Huo L, Li X, et al (2024)

Genomes of diverse Actinidia species provide insights into cis-regulatory motifs and genes associated with critical traits.

BMC biology, 22(1):200.

BACKGROUND: Kiwifruit, belonging to the genus Actinidia, represents a unique fruit crop characterized by its modern cultivars being genetically diverse and exhibiting remarkable variations in morphological traits and adaptability to harsh environments. However, the genetic mechanisms underlying such morphological diversity remain largely elusive.

RESULTS: We report the high-quality genomes of five Actinidia species, including Actinidia longicarpa, A. macrosperma, A. polygama, A. reticulata, and A. rufa. Through comparative genomics analyses, we identified three whole genome duplication events shared by the Actinidia genus and uncovered rapidly evolving gene families implicated in the development of characteristic kiwifruit traits, including vitamin C (VC) content and fruit hairiness. A range of structural variations were identified, potentially contributing to the phenotypic diversity in kiwifruit. Notably, phylogenomic analyses revealed 76 cis-regulatory elements within the Actinidia genus, predominantly associated with stress responses, metabolic processes, and development. Among these, five motifs did not exhibit similarity to known plant motifs, suggesting the presence of possible novel cis-regulatory elements in kiwifruit. Construction of a pan-genome encompassing the nine Actinidia species facilitated the identification of the gene DTZ79_23g14810, which is specific to species exhibiting extraordinarily high VC content. Expression of DTZ79_23g14810 is significantly correlated with the dynamics of VC concentration, and its overexpression in the transgenic roots of kiwifruit plants resulted in increased VC content.

CONCLUSIONS: Collectively, the genomes and pan-genome of diverse Actinidia species not only enhance our understanding of fruit development but also provide a valuable genomic resource for facilitating the genome-based breeding of kiwifruit.

RevDate: 2024-09-10

Duan S, Yan L, Shen Z, et al (2024)

Genomic analyses of agronomic traits in tea plants and related Camellia species.

Frontiers in plant science, 15:1449006.

The genus Camellia contains three types of domesticates that meet various needs of ancient humans: the ornamental C. japonica, the edible oil-producing C. oleifera, and the beverage-purposed tea plant C. sinensis. The genomic drivers of the functional diversification of Camellia domesticates remain unknown. Here, we present the genomic variations of 625 Camellia accessions based on a new genome assembly of C. sinensis var. assamica ('YK10'), which consists of 15 pseudo-chromosomes with a total length of 3.35 Gb and a contig N50 of 816,948 bp. These accessions were mainly distributed in East Asia, South Asia, Southeast Asia, and Africa. We profiled the population and subpopulation structure in tea tree Camellia to find new evidence for the parallel domestication of C. sinensis var. assamica (CSA) and C. sinensis var. sinensis (CSS). We also identified candidate genes associated with traits differentiating CSA, CSS, oilseed Camellia, and ornamental Camellia cultivars. Our results provide a unique global view of the genetic diversification of Camellia domesticates and provide valuable resources for ongoing functional and molecular breeding research.

RevDate: 2024-09-10

Stanley S, Silva-Costa C, Gomes-Silva J, et al (2024)

CC180 clade dynamics does not universally explain Streptococcus pneumoniae serotype 3 persistence post-vaccine: a global comparative population genomics study.

medRxiv : the preprint server for health sciences pii:2024.08.29.24312665.

BACKGROUND: Clonal complex 180 (CC180) is currently the major clone of serotype 3 Streptococcus pneumoniae (Spn). The 13-valent pneumococcal conjugate vaccine (PCV13) does not have significant efficacy against serotype 3 despite inclusion of the serotype 3 polysaccharide in the vaccine. It was hypothesized that PCV13 may effectively control Clade I of CC180 but that Clades III and IV are resistant, provoking a population shift that enables serotype 3 persistence. This has been observed in the United States, England, and Wales but not Spain. We tested this hypothesis further utilizing a dataset from Portugal.

METHODS: We whole-genome sequenced (WGS) 501 serotype 3 strains from Portugal isolated from patients with pneumococcal infections between 1999 and 2020. The draft genomes underwent phylogenetic analyses, pangenome profiling, and a genome-wide association study (GWAS). We also completed antibiotic susceptibility testing and compiled over 2,600 serotype 3 multilocus sequence type 180 (MLST180) WGSs to perform global comparative genomics.

FINDINGS: CC180 Clades I, II, III, IV, and VI distributions were similar when comparing non-invasive pneumonia isolates and invasive disease isolates (Fisher's exact test, P=0.29), and adult and pediatric cases (Fisher's exact test, P=0.074). The serotype 3 CCs shifted post-PCV13 (Fisher's exact test, P<0.0001) and Clade I became dominant. Clade I is largely antibiotic-sensitive and carries the ΦOXC141 prophage, but its pangenome is heterogeneous. Strains from Portugal and Spain, where Clade I remains dominant post-PCV13, have larger pangenomes and are associated with the presence of two genes encoding hypothetical proteins.

INTERPRETATION: Clade I became dominant in Portugal post-PCV13, despite the burden of the prophage and antibiotic sensitivity. The accessory genome content may mitigate these fitness costs. Regional differences in Clade I prevalence and pangenome heterogeneity suggest that clade dynamics is not a generalizable approach to understanding serotype 3 vaccine escape.

FUNDING: National Institute of Child Health and Human Development, Pfizer, and Merck Sharp & Dohme.

RESEARCH IN CONTEXT: Evidence before this study: We conducted this study because of the mounting interest surrounding the changing prevalence of serotype 3 Streptococcus pneumoniae (Spn) genetic lineages and the potential association with escape from 13-valent pneumococcal conjugate vaccine (PCV13) control. To inform our investigation, we searched the PubMed database using different combinations of the following keywords: "Streptococcus pneumoniae", "serotype 3", "CC180", "PCV13", "Clade Iα", "Clade Iβ", and "Clade II". The search included all English language primary research articles published before July 1, 2024; this language limitation may bias the results of our assessment. Most ST3 isolates belong to clonal complex 180 (CC180), and one study identified three major lineages within CC180: Clade Iα, Clade Iβ, and Clade II. This study observed a global trend of increasing Clade II prevalence with a concomitant decrease in Clade I prevalence over time, which was associated with the introduction of PCV13 in the United States. A report from England and Wales made a similar observation. It was therefore hypothesized that PCV13 may be effective at controlling Clade Iα and that Clade II is driving vaccine escape. Later work refined the clade classification system as follows: Clade I (Clade Iα), Clades II and VI (Clade Iβ), Clades III and IV (Clade II), and Clade V. Clade I strains are marked by a significantly lower recombination rate partly due to the presence of a lineage-specific prophage interfering with competence development, which is a potential mechanism explaining the possible reduced fitness of Clade I. Clade I is also noted to be mostly antibiotic-susceptible. However, a recent study found that Clade I persists as a dominant serotype 3 lineage in Spain, so the generalizability and implications of clade dynamics remain unclear.
Added value of this study: Early work assessing the association between changes in serotype 3 clade prevalence and PCV13 was limited by small sample sizes. In addition, studies investigating differences in clade dynamics did not comprehensively consider patient age or disease manifestations such as non-invasive pneumonia and invasive infections. In this study, we evaluated 501 serotype 3 strains from Portugal to investigate clade dynamics. This must be explored in different geographic contexts for a more robust understanding of changing serotype 3 population genomics. We also sought to define genetic determinants linked to strains from regions in which Clade I remains dominant. This is an important step towards a more mechanistic understanding of the serotype 3 CC180 lineage fitness landscape.
Implications of all the available evidence: Unlike other serotypes covered by PCV13, serotype 3 has evaded vaccine control. It has been suggested that Clade I prevalence has decreased due to PCV13, which has created an expanded niche for strains from other clades and ultimately renders PCV13 less effective against serotype 3. This postulation has important implications for the future design of an improved vaccine, so this hypothesis must be thoroughly tested in diverse contexts. We find that Clade I remains the dominant lineage in Portugal even after the introduction of PCV13. We delineate Clade I pangenome heterogeneity and show that strains from Portugal and Spain share similar pangenome features in contrast to Clade I strains from regions where Clade I decreased in prevalence, which should motivate future studies to elucidate more generalizable population genomics trends that may better inform strategies for the design of an improved vaccine.

RJR Experience and Expertise

Researcher

Robbins holds BS, MS, and PhD degrees in the life sciences. He served as a tenured faculty member in the Zoology and Biological Science departments at Michigan State University. He is currently exploring the intersection between genomics, microbial ecology, and biodiversity — an area that promises to transform our understanding of the biosphere.

Educator

Robbins has extensive experience in college-level education: At MSU he taught introductory biology, genetics, and population genetics. At JHU, he was an instructor for a special course on biological database design. At FHCRC, he team-taught a graduate-level course on the history of genetics. At Bellevue College he taught medical informatics.

Administrator

Robbins has been involved in science administration at both the federal and the institutional levels. At NSF he was a program officer for database activities in the life sciences; at DOE he was a program officer for information infrastructure in the human genome project. At the Fred Hutchinson Cancer Research Center, he served as a vice president for fifteen years.

Technologist

Robbins has been involved with information technology since writing his first Fortran program as a college student. At NSF he was the first program officer for database activities in the life sciences. At JHU he held an appointment in the CS department and served as director of the informatics core for the Genome Data Base. At the FHCRC he was VP for Information Technology.

Publisher

While still at Michigan State, Robbins started his first publishing venture, founding a small company that addressed the short-run publishing needs of instructors in very large undergraduate classes. For more than 20 years, Robbins has been operating The Electronic Scholarly Publishing Project, a web site dedicated to the digital publishing of critical works in science, especially classical genetics.

Speaker

Robbins is well-known for his speaking abilities and is often called upon to provide keynote or plenary addresses at international meetings. For example, in July 2012, he gave a well-received keynote address at the Global Biodiversity Informatics Congress, sponsored by GBIF and held in Copenhagen. The slides from that talk can be seen HERE.

Facilitator

Robbins is a skilled meeting facilitator. He prefers a participatory approach, with part of the meeting involving dynamic breakout groups, created by the participants in real time: (1) individuals propose breakout groups; (2) everyone signs up for one (or more) groups; (3) the groups with the most interested parties then meet, with reports from each group presented and discussed in a subsequent plenary session.

Designer

Robbins has been engaged with photography and design since the 1960s, when he worked for a professional photography laboratory. He now prefers digital photography and tools for their precision and reproducibility. He designed his first web site more than 20 years ago and he personally designed and implemented this web site. He engages in graphic design as a hobby.

Support this website:
Order from Amazon
We will earn a commission.

In the mid-1970s, scientists began using DNA sequences to reexamine the history of all life. Perhaps the most startling discovery to come out of this new field (the study of life’s diversity and relatedness at the molecular level) is horizontal gene transfer (HGT), or the movement of genes across species lines. It turns out that HGT has been widespread and important; we now know that roughly eight percent of the human genome arrived sideways by viral infection, a type of HGT. In The Tangled Tree, “the grandest tale in biology….David Quammen presents the science—and the scientists involved—with patience, candor, and flair” (Nature). We learn about the major players, such as Carl Woese, the most important little-known biologist of the twentieth century; Lynn Margulis, the notorious maverick whose wild ideas about “mosaic” creatures proved to be true; and Tsutomu Watanabe, who discovered that the scourge of antibiotic-resistant bacteria is a direct result of horizontal gene transfer, bringing the deep study of genome histories to bear on a global crisis in public health.

963 Red Tail Lane
Bellingham, WA 98226

206-300-3443

E-mail: RJR8222@gmail.com

Collection of publications by R J Robbins

Reprints and preprints of publications, slide presentations, instructional materials, and data compilations written or prepared by Robert Robbins. Most papers deal with computational biology, genome informatics, using information technology to support biomedical research, and related matters.

Research Gate page for R J Robbins

ResearchGate is a social networking site for scientists and researchers to share papers, ask and answer questions, and find collaborators. According to a study by Nature and an article in Times Higher Education, it is the largest academic social network in terms of active users.

Curriculum Vitae for R J Robbins

short personal version

Curriculum Vitae for R J Robbins

long standard version

RJR Picks from Around the Web (updated 11 MAY 2018)