



20 Jul 2024 at 01:33

Bibliography on: Pangenome


Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology.

RJR: Recommended Bibliography 20 Jul 2024 at 01:33 Created: 


Although the enforced stability of genomic content is ubiquitous among MCEs, the opposite is proving to be the case among prokaryotes, which exhibit remarkable and adaptive plasticity of genomic content. Early bacterial whole-genome sequencing efforts discovered that whenever a particular "species" was re-sequenced, new genes were found that had not been detected earlier — entirely new genes, not merely new alleles. This led to the concepts of the bacterial core-genome, the set of genes found in all members of a particular "species", and the flex-genome, the set of genes found in some, but not all members of the "species". Together these make up the species' pan-genome.
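The core/flex distinction described above can be illustrated with a few set operations. This is a minimal sketch, assuming each genome has already been reduced to a set of gene-family identifiers; the strain and gene names are hypothetical, and real pan-genome tools must first cluster orthologous genes, which is not shown here.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    genomes: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (core, flex, pan) gene sets for a collection of genomes.

    core: genes present in every genome
    flex: genes present in some, but not all, genomes
    pan:  union of all genes (core plus flex)
    """
    if not genomes:
        return set(), set(), set()
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    flex = pan - core
    return core, flex, pan


# Hypothetical presence data for three strains of one "species".
strains = {
    "strain_A": {"dnaA", "gyrB", "recA", "toxX"},
    "strain_B": {"dnaA", "gyrB", "recA", "phageY"},
    "strain_C": {"dnaA", "gyrB", "recA", "toxX", "plasZ"},
}
core, flex, pan = partition_pangenome(strains)
print(sorted(core))  # genes shared by all strains
print(sorted(flex))  # genes found in only some strains
```

Adding a newly sequenced strain to `strains` can only shrink or preserve the core set and grow or preserve the pan set, which mirrors the observation that re-sequencing keeps revealing new genes.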

Created with PubMed® Query: ( pangenome OR "pan-genome" OR "pan genome" ) NOT pmcbook NOT ispreviousversion

Citations The Papers (from PubMed®)


RevDate: 2024-07-19

Hatmaker EA, Barber AE, Drott MT, et al (2024)

Pathogenicity is associated with population structure in a fungal pathogen of humans.

bioRxiv : the preprint server for biology pii:2024.07.05.602241.

Aspergillus flavus is a clinically and agriculturally important saprotrophic fungus responsible for severe human infections and extensive crop losses. We analyzed genomic data from 250 (95 clinical and 155 environmental) A. flavus isolates from 9 countries, including 70 newly sequenced clinical isolates, to examine population and pan-genome structure and their relationship to pathogenicity. We identified five A. flavus populations, including a new population, D, corresponding to distinct clades in the genome-wide phylogeny. Strikingly, > 75% of clinical isolates were from population D. Accessory genes, including genes within biosynthetic gene clusters, were significantly more common in some populations but rare in others. Population D was enriched for genes associated with zinc ion binding, lipid metabolism, and certain types of hydrolase activity. In contrast to the major human pathogen Aspergillus fumigatus , A. flavus pathogenicity in humans is strongly associated with population structure, making it a great system for investigating how population-specific genes contribute to pathogenicity.

RevDate: 2024-07-17
CmpDate: 2024-07-17

Kusza S, Badaoui B, G Wanjala (2024)

Insights into the genomic homogeneity of Moroccan indigenous sheep breeds though the lens of runs of homozygosity.

Scientific reports, 14(1):16515.

Numerous studies have indicated that Morocco's indigenous sheep breeds are genetically homogenous, posing a risk to their survival in the challenging harsh climate conditions where they predominantly inhabit. To understand the genetics behind genetic homogeneity through the lens of runs of homozygosity (ROH), we analyzed the whole genome sequences of five indigenous sheep breeds (Beni Guil, Ouled Djellal, D'man, Sardi, Timahdite and Admixed). The results from principal component, admixture, Fst, and neighbour joining tree analyses consistently showed a homogenous genetic structure. This structure was characterized by an average length of 1.83 Mb for runs of homozygosity (ROH) segments, with a limited number of long ROH segments (24-48 Mb and > 48 Mb). The most common ROH segments were those ranging from 1-6 Mb. The most significant regions of homozygosity (ROH Islands) were mostly observed in two chromosomes, namely Chr1 and Chr5. Specifically, ROH Islands were exclusively discovered in the Ouled Djellal breed on Chr1, whereas Chr5 exhibited ROH Islands in all breeds. The analysis of ROH Island and iHS technique was employed to detect signatures of selection on Chr1 and Chr5. The results indicate that Chr5 had a high level of homogeneity, with the same genes being discovered across all breeds. In contrast, Chr1 displays some genetic variances between breeds. Genes identified on Chr5 included SLC39A1, IL23A, CAST, IL5, IL13, and IL4 which are responsible for immune response while genes identified on Chr1 include SOD1, SLAMF9, RTP4, CLDN1, and PRKAA2. ROH segment profile and effective population sizes patterns suggest that the genetic uniformity of studied breeds is the outcome of events that transpired between 250 and 300 generations ago. This research not only contributes to the understanding of ROH distribution across breeds but helps design and implement native sheep breeding and conservation strategies in Morocco. Future research, incorporating a broader sample size and utilizing the pangenome for reference, is recommended to further elucidate these breeds' genomic landscapes and adaptive mechanisms.

RevDate: 2024-07-17
CmpDate: 2024-07-17

Machado E, Vasconcellos S, Gomes L, et al (2024)

Phylogenomic and genomic analysis reveals unique and shared genetic signatures of Mycobacterium kansasii complex species.

Microbial genomics, 10(7):.

Species belonging to the Mycobacterium kansasii complex (MKC) are frequently isolated from humans and the environment and can cause serious diseases. The most common MKC infections are caused by the species M. kansasii (sensu stricto), leading to tuberculosis-like disease. However, a broad spectrum of virulence, antimicrobial resistance and pathogenicity of these non-tuberculous mycobacteria (NTM) are observed across the MKC. Many genomic aspects of the MKC that relate to these broad phenotypes are not well elucidated. Here, we performed genomic analyses from a collection of 665 MKC strains, isolated from environmental, animal and human sources. We inferred the MKC pangenome, mobilome, resistome, virulome and defence systems and show that the MKC species harbours unique and shared genomic signatures. High frequency of presence of prophages and different types of defence systems were observed. We found that the M. kansasii species splits into four lineages, of which three are lowly represented and mainly in Brazil, while one lineage is dominant and globally spread. Moreover, we show that four sub-lineages of this most distributed M. kansasii lineage emerged during the twentieth century. Further analysis of the M. kansasii genomes revealed almost 300 regions of difference contributing to genomic diversity, as well as fixed mutations that may explain the M. kansasii's increased virulence and drug resistance.

RevDate: 2024-07-16

Prigozhin DM, Sutherland CA, Rangavajjhala S, et al (2024)

Majority of the highly variable NLRs in maize share genomic location and contain additional target-binding domains.

Molecular plant-microbe interactions : MPMI [Epub ahead of print].

Nucleotide-binding, Leucine Rich Repeat proteins (NLRs) are a major class of immune receptors in plants. NLRs include both conserved and rapidly evolving members; however, their evolutionary trajectory in crops remains understudied. Availability of crop pan-genomes enables analysis of the recent events in the evolution of this highly complex gene family within domesticated species. Here, we investigated the NLR complement of 26 nested association mapping (NAM) founder lines of maize. We found that maize has just four main subfamilies containing rapidly evolving highly variable NLR (hvNLR) receptors. Curiously, three of these phylogenetically distinct hvNLR lineages are located in adjacent clusters on chromosome 10. Members of the same hvNLR clade show variable expression and methylation across lines and tissues, consistent with their rapid evolution. By combining sequence diversity analysis and AlphaFold2 computational structure prediction we predicted ligand binding sites in the hvNLRs. We also observed novel insertion domains in the LRR regions of two hvNLR subfamilies that likely contribute to target recognition. To make this analysis accessible, we created NLRCladeFinder, a Google Colaboratory notebook, that accepts any newly identified NLR sequence, places it in the evolutionary context of the maize pan-NLRome, and provides an updated clade alignment, phylogenetic tree, and sequence diversity information for the gene of interest.

RevDate: 2024-07-16

Chandra G, Gibney D, C Jain (2024)

Haplotype-aware sequence alignment to pangenome graphs.

Genome research pii:gr.279143.124 [Epub ahead of print].

Modern pangenome graphs are built using haplotype-resolved genome assemblies. When mapping reads to a pangenome graph, prioritizing alignments that are consistent with the known haplotypes improves genotyping accuracy. However, the existing rigorous formulations for co-linear chaining and alignment problems do not consider the haplotype paths in a pangenome graph. This often leads to spurious read alignments to those paths that are unlikely recombinations of the known haplotypes. In this paper, we develop novel formulations and algorithms for sequence-to-graph alignment and chaining problems. Inspired by the genotype imputation models, we assume that a query sequence is an imperfect mosaic of reference haplotypes. Accordingly, we introduce a recombination penalty in the scoring functions for each haplotype switch. First, we solve haplotype-aware sequence-to-graph alignment in O(|Q||E||H|) time, where Q is the query sequence, E is the set of edges, and H is the set of haplotypes represented in the graph. To complement our solution, we prove that an algorithm significantly faster than O(|Q||E||H|) is impossible under the Strong Exponential Time Hypothesis (SETH). Second, we propose a haplotype-aware chaining algorithm that runs in O(|H|N log|H|N) time after graph preprocessing, where N is the count of input anchors. We then establish that a chaining algorithm significantly faster than O(|H|N) is impossible under SETH. As a proof-of-concept, we implemented our chaining algorithm in the Minichain aligner. By aligning sequences sampled from the human major histocompatibility complex (MHC) to a pangenome graph of 60 MHC haplotypes, we demonstrate that our algorithm achieves better consistency with ground-truth recombinations when compared to a haplotype-agnostic algorithm.
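The idea of scoring a query as an imperfect mosaic of reference haplotypes, with a penalty for each haplotype switch, can be shown in a much simplified setting. The sketch below is not the algorithm from the paper: it assumes linear, equal-length haplotypes with no insertions or deletions rather than a pangenome graph, and the function name, penalty values, and sequences are all hypothetical.

```python
from typing import Sequence


def mosaic_cost(
    query: str,
    haplotypes: Sequence[str],
    recomb_penalty: int = 2,
    mismatch_cost: int = 1,
) -> int:
    """Minimum cost of explaining `query` as a mosaic of `haplotypes`.

    Cost = mismatches against the haplotype copied at each position
         + recomb_penalty for every switch between haplotypes.
    Simplifying assumption: every haplotype has the same length as the
    query and positions correspond one to one (no indels, no graph).
    """
    if not haplotypes or any(len(h) != len(query) for h in haplotypes):
        raise ValueError("need at least one haplotype, all of query length")

    # prev[h] = best cost of the prefix processed so far, ending on haplotype h.
    prev = [0] * len(haplotypes)
    for i, q in enumerate(query):
        best_prev = min(prev)  # cheapest haplotype to switch away from
        cur = []
        for h, hap in enumerate(haplotypes):
            stay = prev[h]
            switch = best_prev + recomb_penalty
            step = 0 if hap[i] == q else mismatch_cost
            cur.append(min(stay, switch) + step)
        prev = cur
    return min(prev)


haps = ["AAAAAA", "CCCCCC"]
print(mosaic_cost("AAACCC", haps, recomb_penalty=2))  # one switch is cheapest
print(mosaic_cost("AAACCC", haps, recomb_penalty=5))  # staying put is cheapest
```

Taking the minimum over the previous column once per position keeps this toy version at O(|Q||H|) time. Raising `recomb_penalty` makes the scoring prefer alignments that follow a single known haplotype, which is the behaviour the abstract describes as haplotype-aware.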

RevDate: 2024-07-16

Uzuner H, Paschen A, Schadendorf D, et al (2024)

Orthanq: transparent and uncertainty-aware haplotype quantification with application in HLA-typing.

BMC bioinformatics, 25(1):240.

BACKGROUND: Identification of human leukocyte antigen (HLA) types from DNA-sequenced human samples is important in organ transplantation and cancer immunotherapy and remains a challenging task considering sequence homology and extreme polymorphism of HLA genes.

RESULTS: We present Orthanq, a novel statistical model and corresponding application for transparent and uncertainty-aware quantification of haplotypes. We utilize our approach to perform HLA typing while, for the first time, reporting uncertainty of predictions and transparently observing mutations beyond reported HLA types. Using 99 gold standard samples from 1000 Genomes, Illumina Platinum Genomes and Genome In a Bottle projects, we show that Orthanq can provide overall superior accuracy and shorter runtimes than state-of-the-art HLA typers.

CONCLUSIONS: Orthanq is the first approach that can directly use existing pangenome alignments and type all HLA loci. Moreover, it can be generalized to uses beyond HLA typing, e.g. virus lineage quantification. Orthanq is available at https://orthanq.github.io .

RevDate: 2024-07-16

Ceres K, Zehr JD, Murrell C, et al (2024)

Evolutionary genomic analyses of canine E. coli infections identify a relic capsular locus associated with resistance to multiple classes of antimicrobials.

Applied and environmental microbiology [Epub ahead of print].

UNLABELLED: Infections caused by antimicrobial-resistant Escherichia coli are the leading cause of death attributed to antimicrobial resistance (AMR) worldwide, and the known AMR mechanisms involve a range of functional proteins. Here, we employed a pan-genome wide association study (GWAS) approach on over 1,000 E. coli isolates from sick dogs collected across the US and Canada and identified a strong statistical association (empirical P < 0.01) of AMR, involving a range of antibiotics to a group 1 capsular (CPS) gene cluster. This cluster included genes under relaxed selection pressure, had several loci missing, and had pseudogenes for other key loci. Furthermore, this cluster is widespread in E. coli and Klebsiella clinical isolates across multiple host species. Earlier studies demonstrated that the octameric CPS polysaccharide export protein Wza can transmit macrolide antibiotics into the E. coli periplasm. We suggest that the CPS in question, and its highly divergent Wza, functions as an antibiotic trap, preventing antimicrobial penetration. We also highlight the high diversity of lineages circulating in dogs across all regions studied, the overlap with human lineages, and regional prevalence of resistance to multiple antimicrobial classes.

IMPORTANCE: Much of the human genomic epidemiology data available for E. coli mechanism discovery studies has been heavily biased toward shiga-toxin producing strains from humans and livestock. E. coli occupies many niches and produces a wide variety of other significant pathotypes, including some implicated in chronic disease. We hypothesized that since dogs tend to share similar strains with their owners and are treated with similar antibiotics, their pathogenic isolates will harbor unexplored AMR mechanisms of importance to humans as well as animals. By comparing over 1,000 genomes with in vitro antimicrobial susceptibility data from sick dogs across the US and Canada, we identified a strong multidrug resistance association with an operon that appears to have once conferred a type 1 capsule production system.

RevDate: 2024-07-16

Jespersen MG, Hayes AJ, Tong SYC, et al (2024)

Pangenome evaluation of gene essentiality in Streptococcus pyogenes.

Microbiology spectrum [Epub ahead of print].

Bacterial species often consist of strains with variable gene content, collectively referred to as the pangenome. Variations in the genetic makeup of strains can alter bacterial physiology and fitness. To define biologically relevant genes of a genome, genome-wide transposon mutant libraries have been used to identify genes essential for survival or virulence in a given strain. Such phenotypic studies have been conducted in four different genotypes of the human pathogen Streptococcus pyogenes, yet challenges exist in comparing results across studies conducted in different genetic backgrounds and conditions. To advance genotype to phenotype inferences across different S. pyogenes strains, we built a pangenome database of 249 S. pyogenes reference genomes. We systematically re-analyzed publicly available transposon sequencing datasets from S. pyogenes using a transposon sequencing-specific analysis pipeline, Transit. Across four genetic backgrounds and nine phenotypic conditions, 355 genes were essential for survival, corresponding to ~24% of the core genome. Clusters of Orthologous Genes (COG) categories related to coenzyme and lipid transport and growth functions were overrepresented as essential. Finally, essential operons across S. pyogenes genotypes were defined, with an increased number of essential operons detected under in vivo conditions. This study provides an extendible database to which new studies can be added, and a searchable html-based resource to direct future investigations into S. pyogenes biology.

IMPORTANCE: Streptococcus pyogenes is a human-adapted pathogen occupying restricted ecological niches. Understanding the essentiality of genes across different strains and experimental conditions is important to direct research questions and efforts to prevent the large burden of disease caused by S. pyogenes. To this end we systematically reanalyzed transposon sequencing studies in S. pyogenes using transposon sequencing-specific methods, integrating them into an extendible meta-analysis framework. This provides a repository of gene essentiality in S. pyogenes which was used to highlight specific genes of interest and for the community to guide future phenotypic studies.

RevDate: 2024-07-16

Brejová B, Gagie T, Herencsárová E, et al (2024)

Maximum-scoring path sets on pangenome graphs of constant treewidth.

Frontiers in bioinformatics, 4:1391086.

We generalize a problem of finding maximum-scoring segment sets, previously studied by Csűrös (IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004, 1, 139-150), from sequences to graphs. Namely, given a vertex-weighted graph G and a non-negative startup penalty c, we can find a set of vertex-disjoint paths in G with maximum total score when each path's score is its vertices' total weight minus c. We call this new problem maximum-scoring path sets (MSPS). We present an algorithm that has a linear-time complexity for graphs with a constant treewidth. Generalization from sequences to graphs allows the algorithm to be used on pangenome graphs representing several related genomes and can be seen as a common abstraction for several biological problems on pangenomes, including searching for CpG islands, ChIP-seq data analysis, analysis of region enrichment for functional elements, or simple chaining problems.
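For intuition, the sequence version of the problem (the special case where the graph is a single path) can be solved with a two-state dynamic program. The sketch below is an illustrative assumption about how that special case can be handled, not the constant-treewidth algorithm presented in the paper; the weights and penalty are hypothetical.

```python
from typing import Sequence


def max_scoring_segments(weights: Sequence[float], c: float) -> float:
    """Best total score of disjoint segments of `weights`.

    Each chosen segment scores the sum of its weights minus the
    non-negative startup penalty c. Choosing no segment scores 0.
    """
    if c < 0:
        raise ValueError("startup penalty c must be non-negative")
    neg_inf = float("-inf")
    outside = 0.0     # best score so far with the current position unused
    inside = neg_inf  # best score so far with the current position in a segment
    for w in weights:
        new_inside = w + max(inside, outside - c)  # extend, or pay c to start
        new_outside = max(outside, inside)         # skip this position
        inside, outside = new_inside, new_outside
    return max(outside, inside)


scores = [3, -1, 4, -10, 2, 2]
print(max_scoring_segments(scores, c=3))  # picks [3, -1, 4] and [2, 2]
print(max_scoring_segments(scores, c=0))  # picks every positive weight
```

With `c = 3` the best choice is the segments `[3, -1, 4]` and `[2, 2]`, scoring (6 - 3) + (4 - 3) = 4. A larger penalty merges or drops short segments, which is what makes the formulation useful for finding enriched regions such as CpG islands.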

RevDate: 2024-07-16

Montecillo JAV (2024)

Genomics of the Thermophilic Bacterium Thermosulfidibacter takaii Reveals Novel Lineage of Deep-Branching Bacterial Phylum.

Indian journal of microbiology, 64(2):762-772.

UNLABELLED: The thermophilic bacterium Thermosulfidibacter takaii is affiliated to the deep-branching bacterial lineage in the phylum Aquificota. However, the recent taxonomic study of the phylum Aquificota revealed that T. takaii has no specific association with the phylum. Because T. takaii is considered an important model organism for studying the evolution and kinetics of ancestral carbon metabolism pathways, its proper classification is of significant interest. In this work, phylogenomics and comparative genomic analyses were employed to ascertain the taxonomic placement of T. takaii. Results from the phylogenetic analyses based on 16S rRNA gene and core genome sequences confirmed the exclusion of T. takaii from the phylum Aquificota and further revealed a phylum-level lineage for T. takaii. The analysis of conserved signature indels (CSIs) specific for the phylum Aquificota also supported the exclusion of T. takaii from the phylum. Pan-genome analysis of T. takaii along with the members of the closely related clade from the phylum Thermodesulfobacteriota revealed that T. takaii was indeed distinct, supporting its phylum-level placement. Furthermore, the presence of CSIs specific to T. takaii, and the results from the average nucleotide identity and average amino acid identity analyses, together with the unique characteristics of T. takaii, also provided evidence supporting its assignment to a novel phylum. Based on these results, T. takaii is proposed to be transferred to a novel family, Thermosulfidibacteraceae fam. nov., of a novel order, Thermosulfidibacterales ord. nov., and a novel class, Thermosulfidibacteria classis nov., within a novel phylum Thermosulfidibacterota phyl. nov.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s12088-024-01214-9.

RevDate: 2024-07-16

Rodenburg SYA, de Ridder D, Govers F, et al (2024)

Oomycete Metabolism Is Highly Dynamic and Reflects Lifestyle Adaptations.

Molecular plant-microbe interactions : MPMI [Epub ahead of print].

The selective pressure of pathogen-host symbiosis drives adaptations. How these interactions shape the metabolism of pathogens is largely unknown. Here, we use comparative genomics to systematically analyze the metabolic networks of oomycetes, a diverse group of eukaryotes that includes saprotrophs as well as animal and plant pathogens, with the latter causing devastating diseases with significant economic and/or ecological impacts. In our analyses of 44 oomycete species, we uncover considerable variation in metabolism that can be linked to lifestyle differences. Comparisons of metabolic gene content reveal that plant pathogenic oomycetes have a bipartite metabolism consisting of a conserved core and an accessory set. The accessory set can be associated with the degradation of defense compounds produced by plants when challenged by pathogens. Obligate biotrophic oomycetes have smaller metabolic networks, and taxonomically distantly related biotrophic lineages display convergent evolution by repeated gene losses in both the conserved as well as the accessory set of metabolisms. When investigating to what extent the metabolic networks in obligate biotrophs differ from those in hemibiotrophic plant pathogens, we observe that the losses of metabolic enzymes in obligate biotrophs are not random and that gene losses predominantly influence the terminal branches of the metabolic networks. Our analyses represent the first metabolism-focused comparison of oomycetes at this scale and will contribute to a better understanding of the evolution of oomycete metabolism in relation to lifestyle adaptation. Numerous oomycete species are devastating plant pathogens that cause major damage in crops and natural ecosystems. Their interactions with hosts are shaped by strong selection, but how selection affects adaptation of the primary metabolism to a pathogenic lifestyle is not yet well established. By pan-genome and metabolic network analyses of distantly related oomycete pathogens and their nonpathogenic relatives, we reveal considerable lifestyle- and lineage-specific adaptations. This study contributes to a better understanding of metabolic adaptations in pathogenic oomycetes in relation to lifestyle, host, and environment, and the findings will help in pinpointing potential targets for disease control. Copyright © 2024 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.

RevDate: 2024-07-15

Chang T, Gavelis GS, Brown JM, et al (2024)

Genomic representativeness and chimerism in large collections of SAGs and MAGs of marine prokaryoplankton.

Microbiome, 12(1):126.

BACKGROUND: Single amplified genomes (SAGs) and metagenome-assembled genomes (MAGs) are the predominant sources of information about the coding potential of uncultured microbial lineages, but their strengths and limitations remain poorly understood. Here, we performed a direct comparison of two previously published collections of thousands of SAGs and MAGs obtained from the same, global environment.

RESULTS: We found that SAGs were less prone to chimerism and more accurately reflected the relative abundance and the pangenome content of microbial lineages inhabiting the epipelagic of the tropical and subtropical ocean, as compared to MAGs. SAGs were also better suited to link genome information with taxa discovered through 16S rRNA amplicon analyses. Meanwhile, MAGs had the advantage of more readily recovering genomes of rare lineages.

CONCLUSIONS: Our analyses revealed the relative strengths and weaknesses of the two most commonly used genome recovery approaches in environmental microbiology. These considerations, as well as the need for better tools for genome quality assessment, should be taken into account when designing studies and interpreting data that involve SAGs or MAGs.

RevDate: 2024-07-15
CmpDate: 2024-07-15

Bosi E, Taviani E, Avesani A, et al (2024)

Pan-Genome Provides Insights into Vibrio Evolution and Adaptation to Deep-Sea Hydrothermal Vents.

Genome biology and evolution, 16(7):.

This study delves into the genomic features of 10 Vibrio strains collected from deep-sea hydrothermal vents in the Pacific Ocean, providing insights into their evolutionary history and ecological adaptations. Through sequencing and pan-genome analysis involving 141 Vibrio species, we found that deep-sea strains exhibit larger genomes with unique gene distributions, suggesting adaptation to the vent environment. The phylogenomic reconstruction of the investigated isolates revealed the presence of 2 main clades: The first is monophyletic, consisting exclusively of Vibrio alginolyticus, while the second forms a monophyletic clade comprising both Vibrio antiquarius and Vibrio diabolicus species, which were previously isolated from deep-sea vents. All strains carry virulence and antibiotic resistance genes related to those found in human pathogenic Vibrio species which may play a wider ecological role other than host infection in these environments. In addition, functional genomic analysis identified genes potentially related to deep-sea survival and stress response, alongside candidate genes encoding for novel antimicrobial agents. Ultimately, the pan-genome we generated represents a valuable resource for future studies investigating the taxonomy, evolution, and ecology of Vibrio species.

RevDate: 2024-07-14
CmpDate: 2024-07-14

Seru LV, Forde TL, Roberto-Charron A, et al (2024)

Genomic characterization and virulence gene profiling of Erysipelothrix rhusiopathiae isolated from widespread muskox mortalities in the Canadian Arctic Archipelago.

BMC genomics, 25(1):691.

BACKGROUND: Muskoxen are important ecosystem components and provide food, economic opportunities, and cultural well-being for Indigenous communities in the Canadian Arctic. Between 2010 and 2021, Erysipelothrix rhusiopathiae was isolated from carcasses of muskoxen, caribou, a seal, and an Arctic fox during multiple large scale mortality events in the Canadian Arctic Archipelago. A single strain ('Arctic clone') of E. rhusiopathiae was associated with the mortalities on Banks, Victoria and Prince Patrick Islands, Northwest Territories and Nunavut, Canada (2010-2017). The objectives of this study were to (i) characterize the genomes of E. rhusiopathiae isolates obtained from more recent muskox mortalities in the Canadian Arctic in 2019 and 2021; (ii) identify and compare common virulence traits associated with the core genome and mobile genetic elements (i.e. pathogenicity islands and prophages) among Arctic clone versus other E. rhusiopathiae genomes; and (iii) use pan-genome wide association studies (GWAS) to determine unique genetic contents of the Arctic clone that may encode virulence traits and that could be used for diagnostic purposes.

RESULTS: Phylogenetic analyses revealed that the newly sequenced E. rhusiopathiae isolates from Ellesmere Island, Nunavut (2021) also belong to the Arctic clone. Of 17 virulence genes analysed among 28 Arctic clone isolates, four genes - adhesin, rhusiopathiae surface protein-A (rspA), choline binding protein-B (cbpB) and CDP-glycerol glycerophosphotransferase (tagF) - had amino acid sequence variants unique to this clone when compared to 31 other E. rhusiopathiae genomes. These genes encode proteins that facilitate E. rhusiopathiae to attach to the host endothelial cells and form biofilms. GWAS analyses using Scoary found several unique genes to be overrepresented in the Arctic clone.

CONCLUSIONS: The Arctic clone of E. rhusiopathiae was associated with multiple muskox mortalities spanning over a decade and multiple Arctic islands with distances over 1000 km, highlighting the extent of its spatiotemporal spread. This clone possesses unique gene content, as well as amino acid variants in multiple virulence genes that are distinct from the other closely related E. rhusiopathiae isolates. This study establishes an essential foundation on which to investigate whether these differences are correlated with the apparent virulence of this specific clone through in vitro and in vivo studies.
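A pan-genome association test of the kind described above asks, gene by gene, whether presence or absence of the gene tracks a trait such as membership in the Arctic clone. The sketch below shows one common building block for that question, a two-sided Fisher's exact test on a 2x2 table, written with the standard library only. It is not Scoary's implementation, and the counts are hypothetical; only the group sizes (28 Arctic clone isolates, 31 other genomes) come from the abstract.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Rows: trait present / trait absent.
    Columns: gene present / gene absent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x gene-positive isolates in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table at least as extreme (as improbable) as the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical gene: present in all 28 clone isolates and in 2 of 31 others.
p_value = fisher_exact_two_sided(28, 0, 2, 29)
print(f"p = {p_value:.3g}")
```

In a real analysis this test would be repeated for every accessory gene, so the resulting p-values need a multiple-testing correction, and population structure has to be accounted for separately.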

RevDate: 2024-07-14

Biswas R, Swetha RG, Basu S, et al (2024)

Designing multi-epitope vaccine against human cytomegalovirus integrating pan-genome and reverse vaccinology pipelines.

Biologicals : journal of the International Association of Biological Standardization, 87:101782 pii:S1045-1056(24)00039-3 [Epub ahead of print].

Human cytomegalovirus (HCMV) is accountable for high morbidity in neonates and immunosuppressed individuals. Due to the high genetic variability of HCMV, current prophylactic measures are insufficient. In this study, we employed a pan-genome and reverse vaccinology approach to screen the target for efficient vaccine candidates. Four proteins, envelope glycoprotein M, UL41A, US23, and US28, were shortlisted based on cellular localization, high solubility, antigenicity, and immunogenicity. A total of 29 B-cell and 44 T-cell highly immunogenic and antigenic epitopes with high global population coverage were finalized using immunoinformatics tools and algorithms. Further, the epitopes that were overlapping among the finalized B-cell and T-cell epitopes were linked with suitable linkers to form various combinations of multi-epitopic vaccine constructs. Among 16 vaccine constructs, Vc12 was selected based on physicochemical and structural properties. The docking and molecular simulations of Vc12 were performed, which showed its high binding affinity (-23.35 kcal/mol) towards TLR4 due to intermolecular hydrogen bonds, salt bridges, and hydrophobic interactions, and there were only minimal fluctuations. Furthermore, Vc12, which elicited a good response, was checked for its expression in Escherichia coli through in silico cloning and codon optimization, suggesting it to be a potent vaccine candidate.

RevDate: 2024-07-13
CmpDate: 2024-07-13

Egor G, Artem K, Maksim B, et al (2024)

Enhancing SNV identification in whole-genome sequencing data through the incorporation of known genetic variants into the minimap2 index.

BMC bioinformatics, 25(1):238.

MOTIVATION: Alignment of reads to a reference genome sequence is one of the key steps in the analysis of human whole-genome sequencing data obtained through Next-generation sequencing (NGS) technologies. The quality of the subsequent steps of the analysis, such as the results of clinical interpretation of genetic variants or the results of a genome-wide association study, depends on the correct identification of the position of the read as a result of its alignment. The amount of human NGS whole-genome sequencing data is constantly growing. There are a number of human genome sequencing projects worldwide that have resulted in the creation of large-scale databases of genetic variants of sequenced human genomes. Such information about known genetic variants can be used to improve the quality of alignment at the read alignment stage when analysing sequencing data obtained for a new individual, for example, by creating a genomic graph. While existing methods for aligning reads to a linear reference genome have high alignment speed, methods for aligning reads to a genomic graph have greater accuracy in variable regions of the genome. The development of a read alignment method that takes into account known genetic variants in the linear reference sequence index allows combining the advantages of both sets of methods.

RESULTS: In this paper, we present the minimap2_index_modifier tool, which enables the construction of a modified index of a reference genome using known single nucleotide variants and insertions/deletions (indels) specific to a given human population. The use of the modified minimap2 index improves variant calling quality without modifying the bioinformatics pipeline and without significant additional computational overhead. Using the PrecisionFDA Truth Challenge V2 benchmark data (for HG002 short-read data aligned to the GRCh38 linear reference (GCA_000001405.15) with parameters k = 27 and w = 14) it was demonstrated that the number of false negative genetic variants decreased by more than 9500, and the number of false positives decreased by more than 7000 when modifying the index with genetic variants from the Human Pangenome Reference Consortium.
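The parameters k and w quoted above control minimizer sampling, the scheme minimap2 uses to decide which k-mers of the reference go into its index. The sketch below is a toy illustration of (w, k)-minimizers only. It orders k-mers lexicographically and ignores the reverse strand, whereas minimap2 itself orders hashed k-mers and handles both strands, and it does not show how minimap2_index_modifier adds variant-derived entries to the index.

```python
from typing import List, Tuple


def minimizers(seq: str, k: int, w: int) -> List[Tuple[int, str]]:
    """Return the (position, k-mer) minimizers of `seq`.

    For each window of w consecutive k-mers, keep the smallest k-mer
    (ties broken by leftmost position). Toy assumption: plain
    lexicographic order, forward strand only.
    """
    if k <= 0 or w <= 0:
        raise ValueError("k and w must be positive")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kmers[i], i))
        picked.add((best, kmers[best]))
    return sorted(picked)


# Tiny example; the paper's benchmark used k = 27 and w = 14 on GRCh38.
for position, kmer in minimizers("ACGTACGT", k=3, w=2):
    print(position, kmer)
```

Because neighbouring windows usually share their smallest k-mer, only a fraction of all k-mers is stored, which is what keeps a linear-reference index compact and fast to query.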

RevDate: 2024-07-13
CmpDate: 2024-07-13

Yang X, Luo S, Yang S, et al (2024)

Chromosome-level genome assembly of Hippophae rhamnoides variety.

Scientific data, 11(1):776.

Fructus hippophae (Hippophae rhamnoides spp. mongolica×Hippophae rhamnoides sinensis), a hybrid variety of sea buckthorn in which Hippophae rhamnoides spp. mongolica serves as the female parent and Hippophae rhamnoides sinensis serves as the male parent, is a traditional plant with great economic and medicinal potential. Herein, we obtained a chromosome-level genome assembly of Fructus hippophae of about 918.59 Mb, with the scaffold N50 reaching 83.65 Mb. Then, we anchored 440 contigs with 97.17% of the total genome sequences onto 12 pseudochromosomes. Next, de novo, homology and transcriptome assembly strategies were adopted for gene structure prediction. This predicted 36475 protein-coding genes, of which 36226 genes could be functionally annotated. Simultaneously, various strategies were used for quality assessment; both the complete BUSCO value (98.80%) and the mapping rate indicated high assembly quality. Repetitive elements, which occupied 63.68% of the genome, and 1483600 bp of non-coding RNA were annotated. Here, we provide genomic information on female plants of a popular variety, which can provide data for pan-genomic construction of sea buckthorn and for the resolution of the mechanism of sex differentiation.
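
The N50 statistic quoted in this abstract is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name `n50` and the toy lengths are illustrative assumptions and do not come from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths (arbitrary units), not data from the study.
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)
```

In the toy case the two longest scaffolds (80 + 70 = 150) already cover more than half of the 290-unit total, so the N50 is 70.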

RevDate: 2024-07-13
CmpDate: 2024-07-13

Gao Z, Lu Y, Chong Y, et al (2024)

Beef Cattle Genome Project: Advances in Genome Sequencing, Assembly, and Functional Genes Discovery.

International journal of molecular sciences, 25(13): pii:ijms25137147.

Beef is a major global source of protein, playing an essential role in the human diet. The worldwide production and consumption of beef continue to rise, reflecting a significant trend. However, despite the critical importance of beef cattle resources in agriculture, the diversity of cattle breeds faces severe challenges, with many breeds at risk of extinction. The initiation of the Beef Cattle Genome Project is crucial. By constructing a high-precision functional annotation map of their genome, it becomes possible to analyze the genetic mechanisms underlying important traits in beef cattle, laying a solid foundation for breeding more efficient and productive cattle breeds. This review details advances in genome sequencing and assembly technologies, iterative upgrades of the beef cattle reference genome, and its application in pan-genome research. Additionally, it summarizes relevant studies on the discovery of functional genes associated with key traits in beef cattle, such as growth, meat quality, reproduction, polled traits, disease resistance, and environmental adaptability. Finally, the review explores the potential of telomere-to-telomere (T2T) genome assembly, structural variations (SVs), and multi-omics techniques in future beef cattle genetic breeding. These advancements collectively offer promising avenues for enhancing beef cattle breeding and improving genetic traits.

RevDate: 2024-07-13

Huang A, Feng S, Ye Z, et al (2024)

Genome Assembly and Structural Variation Analysis of Luffa acutangula Provide Insights on Flowering Time and Ridge Development.

Plants (Basel, Switzerland), 13(13): pii:plants13131828.

Luffa spp. is an important worldwide cultivated vegetable and medicinal plant from the Cucurbitaceae family. In this study, we report a high-quality chromosome-level genome of the high-generation inbred line SG261 of Luffa acutangula. The genomic sequence was determined by PacBio long reads, Hi-C sequencing reads, and 10× Genomics sequencing, with an assembly size of 739.82 Mb, contig N50 of 18.38 Mb, and scaffold N50 of 56.08 Mb. The genome of L. acutangula SG261 was predicted to contain 27,312 protein-coding genes and 72.56% repetitive sequences, of which long terminal repeats (LTRs) were an important form of repetitive sequences, accounting for 67.84% of the genome. Phylogenetic analysis reveals that L. acutangula evolved later than Luffa cylindrica, and Luffa is closely related to Momordica charantia. Comparing the genome of L. acutangula SG261 and L. cylindrica with PacBio data, 67,128 high-quality structural variations (SVs) and 55,978 presence-absence variations (PAVs) were identified in SG261, resulting in 2424 and 1094 genes with variation in the CDS region, respectively, and there are 287 identical genes affected by two different structural variation analyses. In addition, we found that the transcription factor FY (FLOWERING LOCUS Y) families had a large expansion in L. acutangula SG261 (flowering in the morning) compared to L. cylindrica (flowering in the afternoon), which may result in the early flowering time in L. acutangula SG261. This study provides valuable reference for the breeding of and pan-genome research into Luffa species.

RevDate: 2024-07-12
CmpDate: 2024-07-12

Miga KH (2024)

From complete genomes to pangenomes.

American journal of human genetics, 111(7):1265-1268.

Highlighting the Distinguished Speakers Symposium on "The Future of Human Genetics and Genomics," this collection of articles is based on presentations at the ASHG 2023 Annual Meeting in Washington, DC, in celebration of all our field has accomplished in the past 75 years, since the founding of ASHG in 1948.

RevDate: 2024-07-12

Barcia-Cruz R, Balboa S, Lema A, et al (2024)

Comparative genomics of Vibrio toranzoniae strains.

International microbiology : the official journal of the Spanish Society for Microbiology [Epub ahead of print].

Vibrio toranzoniae is a marine bacterium belonging to the Splendidus clade that was originally isolated from healthy clams in Galicia (NW Spain). Its isolation from different hosts and seawater indicated two lifestyles and wide geographical distribution. The aim of the present study was to determine the differences at the genomic level among six strains (4 isolated from clam and 2 from seawater) and to determine their phylogeny. For this purpose, whole genomes of the six strains were sequenced by different technologies including Illumina and PacBio, and the resulting sequences were corrected. Genomes were annotated and compared using different online tools. Furthermore, the core- and pan-genomes were examined, and the phylogeny was inferred. The content of the core genome ranged from 2953 to 2766 genes and that of the pangenome ranged from 6278 to 6132, depending on the tool used. Although the strains shared certain homology, with DDH values ranging from 77.10 to 82.30 and OrthoANI values higher than 97%, some differences were found related to motility, capsule synthesis, iron acquisition systems or mobile genetic elements. Phylogenetic analysis of the core genome did not reveal a differentiation of the strains according to their lifestyle (commensal or free-living), but that of the pangenome indicated certain geographical isolation in the same growing area. This study led to the reclassification of some isolates formerly described as V. toranzoniae and demonstrated the importance of cured deposited sequences to proper phylogenetic assignment.
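
The core genome and pangenome sizes reported here follow from simple set operations once genes have been grouped into orthologous families: the core genome is the intersection of all strains' gene sets and the pangenome is their union. The sketch below illustrates this under the assumption that clustering has already been done; the strain and gene-family names are hypothetical, and it is not the workflow of any particular tool used in the study.

```python
def core_and_pan(genomes):
    """Split gene families into core, pan and accessory sets.

    genomes maps a strain name to the set of gene families it carries.
    At least one strain is required.
    """
    gene_sets = [set(genes) for genes in genomes.values()]
    pan = set().union(*gene_sets)            # found in at least one strain
    core = set.intersection(*gene_sets)      # found in every strain
    accessory = pan - core                   # found in some but not all
    return core, pan, accessory


# Hypothetical presence data for three strains.
toy_genomes = {
    "strain_1": {"famA", "famB", "famC"},
    "strain_2": {"famA", "famB", "famD"},
    "strain_3": {"famA", "famC", "famE"},
}
core, pan, accessory = core_and_pan(toy_genomes)
```

Different clustering tools draw family boundaries differently, which is one reason the core and pangenome counts in the abstract vary with the tool used.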

RevDate: 2024-07-12
CmpDate: 2024-07-12

Li XY, Fang XM, Jia HT, et al (2024)

Noviherbaspirillum album sp. nov., an airborne bacteria isolated from an urban area of Beijing, China.

International journal of systematic and evolutionary microbiology, 74(7):.

A Gram-negative, ellipsoidal to short-rod-shaped, motile bacterium was isolated from Beijing's urban air. The isolate exhibited the closest kinship with Noviherbaspirillum aerium 122213-3[T], exhibiting 98.4 % 16S rRNA gene sequence similarity. Phylogenetic analyses based on 16S rRNA gene sequences and genomes showed that it clustered closely with N. aerium 122213-3[T], thus forming a distinct phylogenetic lineage within the genus Noviherbaspirillum. The average nucleotide identity and digital DNA-DNA hybridization values between strain I16B-00201[T] and N. aerium 122213-3[T] were 84.6 and 29.4 %, respectively. The respiratory ubiquinone was ubiquinone 8. The major fatty acids (>10 %) were summed feature 3 (C16:1ω6c/C16:1ω7c, 43.3 %), summed feature 8 (C18:1ω7c/C18:1ω6c, 15.9 %) and C12:0 (11.0 %). The polyamine profile showed putrescine as the predominant compound. The polar lipid profile consisted of diphosphatidylglycerol, phosphatidylglycerol, phosphatidylethanolamine, phosphatidylcholine, unknown lipids and unknown phosphatidylaminolipids. The phenotypic, phylogenetic and chemotaxonomic results consistently supported that strain I16B-00201[T] represented a novel species of the genus Noviherbaspirillum, for which the name Noviherbaspirillum album sp. nov. is proposed, with I16B-00201[T] (=CPCC 100848[T]=KCTC 52095[T]) designated as the type strain. Its DNA G+C content is 59.4 mol%. Pan-genome analysis indicated that some Noviherbaspirillum species possess diverse nitrogen and aromatic compound metabolism pathways, suggesting their potential value in pollutant treatment.

RevDate: 2024-07-12

Schüler MA, Riedel T, Overmann J, et al (2024)

Comparative genome analyses of clinical and non-clinical Clostridioides difficile strains.

Frontiers in microbiology, 15:1404491.

The pathogenic bacterium Clostridioides difficile is a worldwide health burden with increasing morbidity, mortality and antibiotic resistances. Therefore, extensive research efforts are made to unravel its virulence and dissemination. One crucial aspect for C. difficile is its mobilome, which for instance allows the spread of antibiotic resistance genes (ARG) or influences strain virulence. As a nosocomial pathogen, the majority of strains analyzed originated from clinical environments and infected individuals. Nevertheless, C. difficile can also be present in human intestines without disease development or occur in diverse environmental habitats such as puddle water and soil, from which several strains could already be isolated. We therefore performed comprehensive genome comparisons of closely related clinical and non-clinical strains to identify the effects of the clinical background. Analyses included the prediction of virulence factors, ARGs, mobile genetic elements (MGEs), and detailed examinations of the pan genome. Clinical-related trends were thereby observed. While no significant differences were identified in fundamental C. difficile virulence factors, the clinical strains carried more ARGs and MGEs, and possessed a larger accessory genome. Detailed inspection of accessory genes revealed higher abundance of genes with unknown function, transcription-associated, or recombination-related activity. Accessory genes of these functions were already highlighted in other studies in association with higher strain virulence. This specific trend might allow the strains to react more efficiently to changing environmental conditions in the human host such as emerging stress factors, and potentially increase strain survival, colonization, and strain virulence. These findings indicated an adaptation of the strains to the clinical environment. Further, implementation of the analysis results in pairwise genome comparisons revealed that the majority of these accessory genes were encoded on predicted MGEs, shedding further light on the mobile genome of C. difficile. We therefore encourage the inclusion of non-clinical strains in comparative analyses.

RevDate: 2024-07-11
CmpDate: 2024-07-11

Zomer A, Ingham CJ, von Meijenfeldt FAB, et al (2024)

Structural color in the bacterial domain: The ecogenomics of a 2-dimensional optical phenotype.

Proceedings of the National Academy of Sciences of the United States of America, 121(29):e2309757121.

Structural color is an optical phenomenon resulting from light interacting with nanostructured materials. Although structural color (SC) is widespread in the tree of life, the underlying genetics and genomics are not well understood. Here, we collected and sequenced a set of 87 structurally colored bacterial isolates and 30 related strains lacking SC. Optical analysis of colonies indicated that diverse bacteria from at least two different phyla (Bacteroidetes and Proteobacteria) can create two-dimensional packing of cells capable of producing SC. A pan-genome-wide association approach was used to identify genes associated with SC. The biosynthesis of uroporphyrin and pterins, as well as carbohydrate utilization and metabolism, was found to be involved. Using this information, we constructed a classifier to predict SC directly from bacterial genome sequences and validated it by cultivating and scoring 100 strains that were not part of the training set. We predicted that SC is widely distributed within gram-negative bacteria. Analysis of over 13,000 assembled metagenomes suggested that SC is nearly absent from most habitats associated with multicellular organisms except macroalgae and is abundant in marine waters and surface/air interfaces. This work provides a large-scale ecogenomics view of SC in bacteria and identifies microbial pathways and evolutionary relationships that underlie this optical phenomenon.
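
A pan-genome-wide association study asks, gene family by gene family, whether presence of the gene tracks the phenotype across strains. The abstract does not say which statistic the authors used, so the sketch below is only an assumption: it scores one gene with a one-sided Fisher's exact test on a 2×2 presence/phenotype table. The function name and all counts are hypothetical, and a real analysis would also correct for population structure and multiple testing.

```python
from math import comb


def fisher_enrichment_p(gene_pos, gene_neg, nogene_pos, nogene_neg):
    """One-sided Fisher's exact p-value for enrichment of a gene.

    gene_pos / gene_neg:     strains carrying the gene, with / without the phenotype
    nogene_pos / nogene_neg: strains lacking the gene, with / without the phenotype
    """
    n = gene_pos + gene_neg + nogene_pos + nogene_neg
    carriers = gene_pos + gene_neg
    positives = gene_pos + nogene_pos
    # Hypergeometric tail: probability of seeing at least gene_pos
    # phenotype-positive carriers if gene and phenotype were independent.
    upper = min(carriers, positives)
    tail = sum(
        comb(carriers, x) * comb(n - carriers, positives - x)
        for x in range(gene_pos, upper + 1)
    )
    return tail / comb(n, positives)


# Tiny hypothetical table: the gene is in all 3 positive strains and no negative ones.
p_small = fisher_enrichment_p(3, 0, 0, 3)
```

With only six strains, even a perfect split gives p = 1/20, which shows why association studies of this kind need many isolates.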

RevDate: 2024-07-11
CmpDate: 2024-07-11

Perrier M, AE Barber (2024)

Unraveling the genomic diversity and virulence of human fungal pathogens through pangenomics.

PLoS pathogens, 20(7):e1012313 pii:PPATHOGENS-D-24-00568.

RevDate: 2024-07-10

Seersholm FV, Sjögren KG, Koelman J, et al (2024)

Repeated plague infections across six generations of Neolithic Farmers.

Nature [Epub ahead of print].

In the period between 5,300 and 4,900 calibrated years before present (cal. BP), populations across large parts of Europe underwent a period of demographic decline[1,2]. However, the cause of this so-called Neolithic decline is still debated. Some argue for an agricultural crisis resulting in the decline[3], others for the spread of an early form of plague[4]. Here we use population-scale ancient genomics to infer ancestry, social structure and pathogen infection in 108 Scandinavian Neolithic individuals from eight megalithic graves and a stone cist. We find that the Neolithic plague was widespread, detected in at least 17% of the sampled population and across large geographical distances. We demonstrate that the disease spread within the Neolithic community in three distinct infection events within a period of around 120 years. Variant graph-based pan-genomics shows that the Neolithic plague genomes retained ancestral genomic variation present in Yersinia pseudotuberculosis, including virulence factors associated with disease outcomes. In addition, we reconstruct four multigeneration pedigrees, the largest of which consists of 38 individuals spanning six generations, showing a patrilineal social organization. Lastly, we document direct genomic evidence for Neolithic female exogamy in a woman buried in a different megalithic tomb than her brothers. Taken together, our findings provide a detailed reconstruction of plague spread within a large patrilineal kinship group and identify multiple plague infections in a population dated to the beginning of the Neolithic decline.

RevDate: 2024-07-10
CmpDate: 2024-07-10

Khan A, Tian R, Bean SR, et al (2024)

Transcriptome and metabolome analyses reveal regulatory networks associated with nutrition synthesis in sorghum seeds.

Communications biology, 7(1):841.

Cereal seeds are vital for food, feed, and agricultural sustainability because they store and provide essential nutrients to human and animal food and feed systems. Unraveling molecular processes in seed development is crucial for enhancing cereal grain yield and quality. We analyze spatiotemporal transcriptome and metabolome profiles during sorghum seed development in the inbred line 'BTx623'. Morphological and molecular analyses identify the key stages of seed maturation, specifying starch biosynthesis onset at 5 days post-anthesis (dpa) and protein at 10 dpa. Transcriptome profiling from 1 to 25 dpa reveal dynamic gene expression pathways, shifting from cellular growth and embryo development (1-5 dpa) to cell division, fatty acid biosynthesis (5-25 dpa), and seed storage compounds synthesis in the endosperm (5-25 dpa). Network analysis identifies 361 and 207 hub genes linked to starch and protein synthesis in the endosperm, respectively, which will help breeders enhance sorghum grain quality. The availability of this data in the sorghum reference genome line establishes a baseline for future studies as new pangenomes emerge, which will consider copy number and presence-absence variation in functional food traits.

RevDate: 2024-07-08

Zhang Y, Zhao M, Tan J, et al (2024)

Telomere-to-telomere Citrullus super-pangenome provides direction for watermelon breeding.

Nature genetics [Epub ahead of print].

To decipher the genetic diversity within the cucurbit genus Citrullus, we generated telomere-to-telomere (T2T) assemblies of 27 distinct genotypes, encompassing all seven Citrullus species. This T2T super-pangenome has expanded the previously published reference genome, T2T-G42, by adding 399.2 Mb and 11,225 genes. Comparative analysis has unveiled gene variants and structural variations (SVs), shedding light on watermelon evolution and domestication processes that enhanced attributes such as bitterness and sugar content while compromising disease resistance. Multidisease-resistant loci from Citrullus amarus and Citrullus mucosospermus were successfully introduced into cultivated Citrullus lanatus. The SVs identified in C. lanatus have not only been inherited from cordophanus but also from C. mucosospermus, suggesting additional ancestors beyond cordophanus in the lineage of cultivated watermelon. Our investigation substantially improves the comprehension of watermelon genome diversity, furnishing comprehensive reference genomes for all Citrullus species. This advancement aids in the exploration and genetic enhancement of watermelon using its wild relatives.

RevDate: 2024-07-08

Vakirlis N, A Kupczok (2024)

Large-scale investigation of species-specific orphan genes in the human gut microbiome elucidates their evolutionary origins.

Genome research pii:gr.278977.124 [Epub ahead of print].

Species-specific genes, also known as orphans, are ubiquitous across life's domains. In prokaryotes, species-specific orphan genes (SSOGs) are mostly thought to originate in external elements such as viruses followed by horizontal gene transfer, whereas the scenario of native origination, through rapid divergence or de novo, is mostly dismissed. However, quantitative evidence supporting either scenario is lacking. Here, we systematically analyzed genomes from 4644 human gut microbiome species and identified more than 600,000 unique SSOGs, representing an average of 2.6% of a given species' pangenome. These sequences are mostly rare within each species yet show signs of purifying selection. Overall, SSOGs use optimal codons less frequently, and their proteins are more disordered than those of conserved genes (i.e., non-SSOGs). Importantly, across species, the GC content of SSOGs closely matches that of conserved ones. In contrast, the ∼5% of SSOGs that share similarity to known viral sequences have distinct characteristics, including lower GC content. Thus, SSOGs with similarity to viruses differ from the remaining SSOGs, contrasting an external origination scenario for most of them. By examining the orthologous genomic region in closely related species, we show that a small subset of SSOGs likely evolved natively de novo and find that these genes also differ in their properties from the remaining SSOGs. Our results challenge the notion that external elements are the dominant source of prokaryotic genetic novelty and will enable future studies into the biological role and relevance of species-specific genes in the human gut.

RevDate: 2024-07-08

Lee J, Hunter B, H Shim (2024)

A pangenome analysis of ESKAPE bacteriophages: the underrepresentation may impact machine learning models.

Frontiers in molecular biosciences, 11:1395450.

Bacteriophages are the most prevalent biological entities in the biosphere. However, limitations in both medical relevance and sequencing technologies have led to a systematic underestimation of the genetic diversity within phages. This underrepresentation not only creates a significant gap in our understanding of phage roles across diverse biosystems but also introduces biases in computational models reliant on these data for training and testing. In this study, we focused on publicly available genomes of bacteriophages infecting high-priority ESKAPE pathogens to show the extent and impact of this underrepresentation. First, we demonstrate a stark underrepresentation of ESKAPE phage genomes within the public genome and protein databases. Next, a pangenome analysis of these ESKAPE phages reveals extensive sharing of core genes among phages infecting the same host. Furthermore, genome analyses and clustering highlight close nucleotide-level relationships among the ESKAPE phages, raising concerns about the limited diversity within current public databases. Lastly, we uncover a scarcity of unique lytic phages and phage proteins with antimicrobial activities against ESKAPE pathogens. This comprehensive analysis of the ESKAPE phages underscores the severity of underrepresentation and its potential implications. This lack of diversity in phage genomes may restrict the resurgence of phage therapy and cause biased outcomes in data-driven computational models due to incomplete and unbalanced biological datasets.

RevDate: 2024-07-05
CmpDate: 2024-07-05

Lu YT, Wu YY, Li YN, et al (2024)

Saccharopolyspora mangrovi sp. nov., a novel mangrove soil actinobacterium with distinct metabolic potential revealed by comparative genomic analysis.

Archives of microbiology, 206(8):342.

A novel mangrove soil-derived actinomycete, strain S2-29[T], was found to be most closely related to Saccharopolyspora karakumensis 5K548[T] based on 16S rRNA sequence (99.24% similarity) and genomic phylogenetic analyses. However, significant divergence in digital DNA-DNA hybridization, average nucleotide identity, and unique biosynthetic gene cluster possession distinguished S2-29[T] as a distinct Saccharopolyspora species. Pan genome evaluation revealed exceptional genomic flexibility in genus Saccharopolyspora, with > 95% accessory genome content. Strain S2-29[T] harbored 718 unique genes, largely implicated in energetic metabolisms, indicating different metabolic capacities from its close relatives. Several uncharacterized biosynthetic gene clusters in strain S2-29[T] highlighted the strain's untapped capacity to produce novel functional compounds with potential biotechnological applications. Designation as novel species Saccharopolyspora mangrovi sp. nov. (type strain S2-29[T] = JCM 34548[T] = CGMCC 4.7716[T]) was warranted, expanding the known Saccharopolyspora diversity and ecology. The discovery of this mangrove-adapted strain advances understanding of the genus while highlighting an untapped source of chemical diversity.

RevDate: 2024-07-05

Wang Y, Ding K, Li H, et al (2024)

Biography of Vitis genomics: recent advances and prospective.

Horticulture research, 11(7):uhae128 pii:uhae128.

The grape genome is the basis for grape studies and breeding, and is also important for grape industries. In the last two decades, more than 44 grape genomes have been sequenced. Based on these genomes, researchers have made substantial progress in understanding the mechanism of biotic and abiotic resistance, berry quality formation, and breeding strategies. In addition, this work has provided essential data for future pangenome analyses. Apart from de novo assembled genomes, more than six whole-genome sequencing projects have provided datasets comprising almost 5000 accessions. Based on these datasets, researchers have explored the domestication and origins of the grape and clarified the gene flow that occurred during its dispersed history. Moreover, genome-wide association studies and other methods have been used to identify more than 900 genes related to resistance, quality, and developmental phases of grape. These findings have benefited grape studies and provide some basis for smart genomic selection breeding. Moreover, the grape genome has played a great role in grape studies and the grape industry, and the importance of genomics will increase sharply in the future.

RevDate: 2024-07-04

Sundaresan AK, Gangwar J, Murugavel A, et al (2024)

Complete genome sequence, phenotypic correlation and pangenome analysis of uropathogenic Klebsiella spp.

AMB Express, 14(1):78.

Urinary tract infections (UTI) by antibiotic resistant and virulent K. pneumoniae are a growing concern. Understanding the genome and validating the genomic profile along with pangenome analysis will facilitate surveillance of high-risk clones of K. pneumoniae to underpin management strategies toward early detection. The present study aimed to sequence the complete genomes of Klebsiella spp. and to analyse the correlation of resistome with phenotypic antimicrobial resistance and of virulome with pathogenicity. To understand the resistome, pangenome and virulome in Klebsiella spp., ResFinder, CARD, IS Finder, PlasmidFinder, PHASTER, Roary and VFDB were used. The phenotypic susceptibility profiling identified the uropathogenic kp3 to exhibit multi drug resistance. The resistome and in vitro antimicrobial profiling showed concordance with all the tested antibiotics against the study strains. Hypermucoviscosity was not observed for any of the test isolates; this phenotypic character matches perfectly with the absence of rmpA and magA genes. To the best of our knowledge, this is the first report on the presence of ste, stf, stc and sti major fimbrial operons of Salmonella enterica serotype Typhimurium in K. pneumoniae genome. The study identifies the discordance of virulome and virulence in Klebsiella spp. The complete genome analysis and phenotypic correlation identify uropathogenic K. pneumoniae kp3 as a carbapenem-resistant and virulent pathogen. The pangenome of K. pneumoniae was open, suggesting high genetic diversity. Diverse K serotypes were observed. Sequence typing reveals the prevalence of K. pneumoniae high-risk clones in UTI catheterised patients. The study also highlights the concordance of resistome and in vitro susceptibility tests. Importantly, the study identifies the necessity of virulome and phenotypic virulence markers for timely diagnosis and immediate treatment for the management of high-risk K. pneumoniae clones.

RevDate: 2024-07-04

Li X, Dai X, He H, et al (2024)

A pan-TE map highlights transposable elements underlying domestication and agronomic traits in Asian rice.

National science review, 11(6):nwae188.

Transposable elements (TEs) are ubiquitous genomic components and hard to study due to being highly repetitive. Here we assembled 232 chromosome-level genomes based on long-read sequencing data. Coupling the 232 genomes with 15 existing assemblies, we developed a pan-TE map comprising both cultivated and wild Asian rice. We detected 177 084 high-quality TE variations and inferred their derived state using outgroups. We found TEs were one source of phenotypic variation during rice domestication and differentiation. We identified 1246 genes whose expression variation was associated with TEs but not single-nucleotide polymorphisms (SNPs), such as OsRbohB, and validated OsRbohB's relative expression activity using a dual-Luciferase (LUC) reporter assay system. Our pan-TE map allowed us to detect multiple novel loci associated with agronomic traits. Collectively, our findings highlight the contributions of TEs to domestication, differentiation and agronomic traits in rice, and there is massive potential for gene cloning and molecular breeding by the high-quality Asian pan-TE map we generated.

RevDate: 2024-07-04

Zhang B, Ren H, Wang X, et al (2024)

Comparative genomics analysis to explore the biodiversity and mining novel target genes of Listeria monocytogenes strains from different regions.

Frontiers in microbiology, 15:1424868.

As a common foodborne pathogen, infection with L. monocytogenes poses a significant threat to human life and health. The objective of this study was to employ comparative genomics to unveil the biodiversity and evolutionary characteristics of L. monocytogenes strains from different regions, screening for potential target genes and mining novel target genes, thus providing significant reference value for the specific molecular detection and therapeutic targets of L. monocytogenes strains. Pan-genomic analysis revealed that L. monocytogenes from different regions have open genomes, providing a solid genetic basis for adaptation to different environments. These strains contain numerous virulence genes that contribute to their high pathogenicity. They also exhibit relatively high resistance to phosphonic acid, glycopeptide, lincosamide, and peptide antibiotics. The results of mobile genetic elements indicate that, despite being located in different geographical locations, there is a certain degree of similarity in bacterial genome evolution and adaptation to specific environmental pressures. The potential target genes identified through pan-genomics are primarily associated with the fundamental life activities and infection invasion of L. monocytogenes, including known targets such as inlB, which can be utilized for molecular detection and therapeutic purposes. After screening a large number of potential target genes, we further screened them using hub gene selection methods to mine novel target genes. The present study employed eight different hub gene screening methods, ultimately identifying ten highly connected hub genes (bglF_1, davD, menE_1, tilS, dapX, iolC, gshAB, cysG, trpA, and hisC), which play crucial roles in the pathogenesis of L. monocytogenes. The results of pan-genomic analysis showed that L. monocytogenes from different regions exhibit high similarity in bacterial genome evolution. The PCR results demonstrated the excellent specificity of the bglF_1 and davD genes for L. monocytogenes. Therefore, the bglF_1 and davD genes hold promise as specific molecular detection and therapeutic targets for L. monocytogenes strains from different regions.

RevDate: 2024-07-03

Heumos S, Guarracino A, Schmelzle JM, et al (2024)

Pangenome graph layout by Path-Guided Stochastic Gradient Descent.

Bioinformatics (Oxford, England) pii:7705520 [Epub ahead of print].

MOTIVATION: The increasing availability of complete genomes demands models to study genomic variability within entire populations. Pangenome graphs capture the full genomic similarity and diversity between multiple genomes. In order to understand them, we need to see them. For visualization, we need a human-readable graph layout: a graph embedding in low (e.g. two) dimensional depictions. Due to a pangenome graph's potentially excessive size, this is a significant challenge.

RESULTS: In response, we introduce a novel graph layout algorithm: the Path-Guided Stochastic Gradient Descent (PG-SGD). PG-SGD uses the genomes, represented in the pangenome graph as paths, as an embedded positional system to sample genomic distances between pairs of nodes. This avoids the quadratic cost seen in previous versions of graph drawing by Stochastic Gradient Descent (SGD). We show that our implementation efficiently computes the low dimensional layouts of gigabase-scale pangenome graphs, unveiling their biological features.

AVAILABILITY: We integrated PG-SGD in ODGI which is released as free software under the MIT open source license. Source code is available at https://github.com/pangenome/odgi.
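
The idea described in RESULTS, sampling node pairs along genome paths and using their path distance as the target layout distance, can be illustrated with a toy two-dimensional sketch. This is not the ODGI implementation: the update rule below follows the general stochastic-gradient-descent scheme for graph drawing, and the learning-rate schedule, the function name `path_guided_layout`, the node lengths and the example paths are all assumptions chosen for illustration. Each path is assumed to contain at least two steps.

```python
import math
import random


def path_guided_layout(paths, node_len, iterations=20000, seed=0):
    """Toy 2D layout: path distances between nodes become target distances."""
    rng = random.Random(seed)
    nodes = sorted({n for path in paths for n in path})
    xy = {n: [rng.random(), rng.random()] for n in nodes}

    # Nucleotide offset of every step on every path (the "positional system").
    offsets = []
    for path in paths:
        pos, off = 0, []
        for n in path:
            off.append(pos)
            pos += node_len[n]
        offsets.append(off)

    # Assumed schedule: exponential decay between two step sizes.
    eta_max = float(max(off[-1] for off in offsets)) ** 2
    eta_min = 0.01 * float(min(node_len.values())) ** 2
    decay = math.log(eta_max / eta_min) / max(iterations - 1, 1)

    for t in range(iterations):
        eta = eta_max * math.exp(-decay * t)
        k = rng.randrange(len(paths))
        i, j = rng.sample(range(len(paths[k])), 2)   # two steps on one path
        a, b = paths[k][i], paths[k][j]
        target = abs(offsets[k][i] - offsets[k][j])
        if a == b or target == 0:
            continue
        dx, dy = xy[a][0] - xy[b][0], xy[a][1] - xy[b][1]
        dist = math.hypot(dx, dy) or 1e-9
        mu = min(eta / target ** 2, 1.0)             # capped step size
        r = mu * (dist - target) / 2 / dist
        xy[a][0] -= r * dx
        xy[a][1] -= r * dy
        xy[b][0] += r * dx
        xy[b][1] += r * dy
    return xy


# Hypothetical bubble: two genomes share nodes 1 and 4 but differ in the middle.
toy_paths = [[1, 2, 4], [1, 3, 4]]
toy_len = {1: 10, 2: 5, 3: 5, 4: 10}
layout = path_guided_layout(toy_paths, toy_len)
```

Because every sampled pair comes from a single path, no all-pairs shortest-path computation is needed, which is the quadratic cost the abstract says the method avoids.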

RevDate: 2024-07-02
CmpDate: 2024-07-02

Dahiya P, Kumar P, Rani S, et al (2024)

Comparative Genomic and Functional Analyses for Insights into Pantoea agglomerans Strains Adaptability in Diverse Ecological Niches.

Current microbiology, 81(8):254.

Pantoea agglomerans inhabits diverse ecological niches, occurring as an epiphyte and endophyte in plants, in the bodies of animals, and occasionally in the human system. This multifaceted bacterium contributes substantially to plant growth promotion, stress resilience, and biocontrol but can also act as a pathogen to its host. The genetic determinants underlying these diverse functions remain largely unexplored. To investigate them, nineteen strains of Pantoea agglomerans were selected and analyzed. Digital DDH values were calculated with the Genome-to-Genome Distance Calculator (GGDC), which uses the Genome Blast Distance Phylogeny (GBDP) technique. Phylogenetic analysis via Genome-to-Genome distance, Average Nucleotide Identity, and Amino Acid Identity calculation revealed that all strains belonged to the genus Pantoea. However, strain 33.1 had a value below the threshold for same-species delineation. Bacterial Pan Genome Analysis (BPGA) Pipeline and MinPath analysis revealed genetic traits associated with environmental resilience, such as oxidative stress, UV radiation, temperature extremes, and metabolism of distinct host-specific carbohydrates. Protein-protein interactome analysis illustrated osmotic stress proteins closely linked with core proteins, while heavy metal tolerance, nitrogen metabolism, and Type III and VI secretion systems proteins generally associated with pathogenicity formed a separate network, indicating strain-specific characteristics. These findings shed new light on the intricate genetic architecture of Pantoea agglomerans, revealing its adaptability to inhabit diverse niches and thrive in varied environments.

RevDate: 2024-07-02

Tong W, Yang D, Qiu S, et al (2024)

Relevance of genetic causes and environmental adaptation of Cronobacter spp. isolated from infant and follow-up formula production factories and retailed products in China: A 7-year period of continuous surveillance based on genome-wide analysis.

The Science of the total environment pii:S0048-9697(24)04516-9 [Epub ahead of print].

The possible contamination routes, environmental adaptation, and genetic basis of Cronobacter spp. in infant and follow-up formula production factories and retailed products in mainland China have been determined by laboratory studies and whole-genome comparative analysis in a 7-year nationwide continuous surveillance spanning from 2012 to 2018. The 2-year continuous multicenter surveillance of the production process (conducted in 2013 and 2014) revealed that the source of Cronobacter spp. in the dry-blending process was the raw dry ingredients and manufacturing environment (particularly in the vibro sieve and vacuum cleaner), while in the combined process, the main contamination source was identified as the packing room. It is important to note that, according to the contamination control knowledge obtained from the production process surveillance, the contamination rate of retail powdered infant formula (PIF) and follow-up formula (FUF) products in China decreased significantly from 2016 onward, after improving the hygiene management practices in factories. The prevalence of Cronobacter spp. in retailed PIF and FUF in China fell dramatically from 1.55 % (61/3925) in 2012 to as low as 0.17 % (13/7655) in 2018. Phenotype determination and genomic analysis were performed on a total of 90 Cronobacter spp. isolates obtained from the surveillance. Of the 90 isolates, only two showed resistance to either cefazolin or cefoxitin. The multilocus sequence typing results revealed that C. sakazakii sequence type 1 (ST1), ST37, and C. malonaticus ST7 were the dominant sequence types (STs) collected from the production factories, while C. sakazakii ST1, ST4, ST64, and ST8 were the main STs detected in the retailed PIF and FUF nationwide. One C. sakazakii ST4 isolate (1.1 %, 1/90) had strong biofilm-forming ability and 13 isolates (14.4 %, 13/90) had weak biofilm-forming ability. Genomic analysis revealed that Cronobacter spp. have a relatively stable core-genome and an increasing pan-genome size. Plasmid IncFIB (pCTU3) was prevalent in this genus and some contained 14 antibacterial biocide- and metal-resistance genes (BMRGs) including copper, silver, and arsenic resistance genes. Plasmid IncN_1 was predicted to contain 6 ARGs. This is the first time that a multi-drug resistance IncN_1 type plasmid has been reported in Cronobacter spp. Genomic variations with respect to BMRGs, virulence genes, antimicrobial resistance genes (ARGs), and genes involved in biofilm formation were observed among strains of this genus. There were apparent differences in copies of bcsG and flgJ between the biofilm-forming group and non-biofilm-forming group, indicating that these two genes play key roles in biofilm formation. The findings of this study have improved our understanding of the contamination characteristics and genetic basis of Cronobacter spp. in PIF and FUF and their production environment in China and provide important guidance to reduce contamination with this pathogen during the production of PIF and FUF.

RevDate: 2024-07-02

Shchyogolev SY, Burygin GL, Dykman LA, et al (2024)

Phylogenetic and pangenomic analyses of members of the family Micrococcaceae related to a plant-growth-promoting rhizobacterium isolated from the rhizosphere of potato (Solanum tuberosum L.).

Vavilovskii zhurnal genetiki i selektsii, 28(3):308-316.

We report the results of taxonomic studies on members of the family Micrococcaceae that, according to the 16S rRNA, internal transcribed spacer 1 (ITS1), average nucleotide identity (ANI), and average amino acid identity (AAI) tests, are related to Kocuria rosea strain RCAM04488, a plant-growth-promoting rhizobacterium (PGPR) isolated from the rhizosphere of potato (Solanum tuberosum L.). In these studies, we used whole-genome phylogenetic tests and pangenomic analysis. According to the ANI > 95 % criterion, several known members of K. salina, K. polaris, and K. rosea (including K. rosea type strain ATCC 186T) that are related most closely to isolate RCAM04488 in the ITS1 test should be assigned to the same species with appropriate strain verification. However, these strains were isolated from strongly contrasting ecological and geographical habitats, which must have affected their genotypes and phenotypes and which should be taken into account in evaluation of their systematic position. This contradiction was resolved by a pangenomic analysis, which showed that the strains differed strongly in the number of accessory and strain-specific genes determining their individuality and possibly their potential for adaptation to different ecological niches. Similar results were obtained in a full-scale AAI test against the UniProt database (about 250 million records), by using the AAI-profiler program and the proteome of K. rosea strain ATCC 186T as a query. According to the AAI > 65 % criterion, members of the genus Arthrobacter and several other genera belonging to the class Actinomycetes, with a very wide geographical and ecological range of sources of isolation, should be placed into the same genus as Kocuria. Within the paradigm with vertically inherited phylogenetic markers, this could be regarded as a signal for their subsequent taxonomic reclassification. An important factor in this case may be the detailing of the gene composition of the strains and the taxonomic ratios resulting from analysis of the pangenomes of the corresponding clades.

RevDate: 2024-07-02
CmpDate: 2024-07-02

Niu J, Wang W, Wang Z, et al (2024)

Tagging large CNV blocks in wheat boosts digitalization of germplasm resources by ultra-low-coverage sequencing.

Genome biology, 25(1):171.

BACKGROUND: The massive structural variations and frequent introgression highly contribute to the genetic diversity of wheat, while the huge and complex genome of polyploid wheat hinders efficient genotyping of abundant varieties towards accurate identification, management, and exploitation of germplasm resources.

RESULTS: We develop a novel workflow that identifies 1240 high-quality large copy number variation blocks (CNVb) in wheat at the pan-genome level, demonstrating that CNVb can serve as an ideal DNA fingerprinting marker for discriminating massive varieties, with the accuracy validated by PCR assay. We then construct a digitalized genotyping CNVb map across 1599 global wheat accessions. Key CNVb markers are linked with trait-associated introgressions, such as the 1RS·1BL translocation and 2N[v]S translocation, and the beneficial alleles, such as the end-use quality allele Glu-D1d (Dx5 + Dy10) and the semi-dwarf r-e-z allele. Furthermore, we demonstrate that these tagged CNVb markers promote a stable and cost-effective strategy for evaluating wheat germplasm resources with ultra-low-coverage sequencing data, competing with SNP array for applications such as evaluating new varieties, efficient management of collections in gene banks, and describing wheat germplasm resources in a digitalized manner. We also develop a user-friendly interactive platform, WheatCNVb (http://wheat.cau.edu.cn/WheatCNVb/), for exploring the CNVb profiles over ever-increasing wheat accessions, and also propose a QR-code-like representation of individual digital CNVb fingerprint. This platform also allows uploading new CNVb profiles for comparison with stored varieties.

CONCLUSIONS: The CNVb-based approach provides a low-cost and high-throughput genotyping strategy for enabling digitalized wheat germplasm management and modern breeding with precise and practical decision-making.

RevDate: 2024-07-02
CmpDate: 2024-07-02

Lamkiewicz K, Barf LM, Sachse K, et al (2024)

RIBAP: a comprehensive bacterial core genome annotation pipeline for pangenome calculation beyond the species level.

Genome biology, 25(1):170.

Microbial pangenome analysis identifies present or absent genes in prokaryotic genomes. However, current tools are limited when analyzing species with higher sequence diversity or higher taxonomic orders such as genera or families. The Roary ILP Bacterial core Annotation Pipeline (RIBAP) uses an integer linear programming approach to refine gene clusters predicted by Roary for identifying core genes. RIBAP successfully handles the complexity and diversity of Chlamydia, Klebsiella, Brucella, and Enterococcus genomes, outperforming other established and recent pangenome tools for identifying all-encompassing core genes at the genus level. RIBAP is a freely available Nextflow pipeline at github.com/hoelzer-lab/ribap and zenodo.org/doi/10.5281/zenodo.10890871.

RevDate: 2024-07-01

Rodriguez-Valera F, C Bellas (2024)

How Viruses Shape Microbial Plankton Microdiversity.

Annual review of marine science [Epub ahead of print].

One major conundrum of modern microbiology is the large pangenome (gene pool) present in microbes, which is much larger than those found in complex organisms such as humans. Here, we argue that this diversity of gene pools carried by different strains is maintained largely due to the control exercised by viral predation. Viruses maintain a high strain diversity through time that we describe as constant-diversity equilibrium, preventing the hoarding of resources by specific clones. Thus, viruses facilitate the release and degradation of dissolved organic matter in the ocean, which may lead to better ecosystem functioning by linking top-down to bottom-up control. By maintaining this equilibrium, viruses act as a key element of the adaptation of marine microbes to their environment and likely evolve as a single evolutionary unit.

RevDate: 2024-07-01

Kantor EJH, Robicheau BM, Tolman J, et al (2024)

Targeted metagenomics reveals pangenomic diversity of the nitroplast (UCYN-A) and its algal host plastid.

bioRxiv : the preprint server for biology pii:2024.06.19.599377.

UCYN-A (Cand. Atelocyanobacterium thalassa) has recently been recognized as a globally-distributed, early stage, nitrogen-fixing organelle (the 'nitroplast') of cyanobacterial origin present in select species of haptophyte algae (e.g., Braarudosphaera bigelowii). Although the nitroplast was recognized as the UCYN-A2 sublineage, it is yet to be confirmed in other sublineages of the algal/UCYN-A complex. We used water samples collected from Halifax Harbour (Bedford Basin, Nova Scotia, Canada) and the offshore Scotian Shelf to further our understanding of B. bigelowii and UCYN-A in the coastal Northwest Atlantic. Sequencing data revealed UCYN-A-associated haptophyte signatures and yielded near-complete metagenome-assembled genomes (MAGs) for UCYN-A1, UCYN-A4, and the plastid of the A4-associated haptophyte. Comparative genomics provided new insights into the pangenome of UCYN-A. The UCYN-A4 MAG is the first genome sequenced from this sublineage and shares ~85% identity with the UCYN-A2 nitroplast. Genes missing in the reduced genome of the nitroplast were also missing in the A4 MAG supporting its likely classification as a nitroplast as well. The UCYN-A1 MAG was found to be nearly 100% identical to the reference genome despite coming from different ocean basins. Time-series data paired with the recurrence of specific microbes in enrichment cultures gave insight into the microbes that frequently co-occur with the algal/UCYN-A complex (e.g., Pelagibacter ubique). Overall, our study expands knowledge of UCYN-A and its host across major ocean basins and investigates their co-occurring microbes in the coastal Northwest Atlantic (NWA), thereby facilitating future studies on the underpinnings of haptophyte-associated diazotrophy in the sea.

RevDate: 2024-07-01

Zhang P, Zhang B, Ji YY, et al (2024)

Cofitness network connectivity determines a fuzzy essential zone in open bacterial pangenome.

mLife, 3(2):277-290.

Most in silico evolutionary studies assume that core genes are essential for cellular function, while accessory genes are dispensable, particularly in nutrient-rich environments. However, this assumption is seldom tested genetically within the pangenome context. In this study, we conducted a robust pangenomic Tn-seq analysis of fitness genes in a nutrient-rich medium for Sinorhizobium strains with a canonical open pangenome. To evaluate the robustness of fitness category assignment, Tn-seq data for three independent mutant libraries per strain were analyzed by three methods. The comparison indicates that the Hidden Markov Model (HMM)-based method is the most robust to variations between mutant libraries and is not sensitive to data size, outperforming the Bayesian and Monte Carlo simulation-based methods. Consequently, the HMM method was used to classify the fitness category. Fitness genes, categorized as essential (ES), advantage (GA), and disadvantage (GD) genes for growth, are enriched in core genes, while nonessential genes (NE) are over-represented in accessory genes. Accessory ES/GA genes showed a lower fitness effect than core ES/GA genes. Connectivity degrees in the cofitness network decrease in the order of ES, GD, and GA/NE. In addition to accessory genes, 1599 out of 3284 core genes display differential essentiality across test strains. Within the pangenome core, both shared quasi-essential (ES and GA) and strain-dependent fitness genes are enriched in similar functional categories. Our analysis demonstrates a considerable fuzzy essential zone determined by cofitness connectivity degrees in Sinorhizobium pangenome and highlights the power of the cofitness network in understanding the genetic basis of ever-increasing prokaryotic pangenome data.

RevDate: 2024-07-01

Socarras KM, Marino MC, Earl JP, et al (2024)

Characterization of the family-level Borreliaceae pan-genome and development of an episomal typing protocol.

Research square pii:rs.3.rs-4491589.

BACKGROUND: The Borreliaceae family includes many obligate parasitic bacterial species which are etiologically associated with a myriad of zoonotic borrelioses including Lyme disease and vector-borne relapsing fevers. Infections by the Borreliaceae are difficult to detect by both direct and indirect methods, often leading to delayed and missed diagnoses. Efforts to improve diagnoses center around the development of molecular diagnostics (MDx), but due to deep tissue sequestration of the causative spirochaetes and the lack of persistent bacteremias, even MDx assays suffer from a lack of sensitivity. Additionally, the highly extensive genomic heterogeneity among isolates, even within the same species, contributes to the lack of assay sensitivity as single target assays cannot provide universal coverage. This within-species heterogeneity is partly due to differences in replicon repertoires and genomic structures that have likely arisen to support the complex Borreliaceae lifecycle in which these parasites have to survive in multiple hosts each with unique immune responses.

RESULTS: We constructed a Borreliaceae family-level pangenome and characterized the phylogenetic relationships among the constituent taxa which supports the recent taxonomy of splitting the family into at least two genera. Gene content profiles were created for the majority of the Borreliaceae replicons, providing for the first time their unambiguous molecular typing.

CONCLUSION: Our characterization of the Borreliaceae pan-genome supports the splitting of the former Borrelia genus into two genera and provides for the phylogenetic placement of several non-species designated isolates. Mining this family-level pangenome will enable precision diagnostics corresponding to gene content-driven clinical outcomes while also providing targets for interventions.

RevDate: 2024-06-29
CmpDate: 2024-06-29

Eynard SE, Klopp C, Canale-Tabet K, et al (2024)

The black honey bee genome: insights on specific structural elements and a first step towards pangenomes.

Genetics, selection, evolution : GSE, 56(1):51.

BACKGROUND: The honey bee reference genome, HAv3.1, was produced from a commercial line sample that was thought to have a largely dominant Apis mellifera ligustica genetic background. Apis mellifera mellifera, often referred to as the black bee, has a separate evolutionary history and is the original type in western and northern Europe. Growing interest in this subspecies for conservation and non-professional apicultural practices, together with the necessity of deciphering genome backgrounds in hybrids, triggered the necessity for a specific genome assembly. Moreover, having several high-quality genomes is becoming key for taking structural variations into account in pangenome analyses.

RESULTS: Pacific Bioscience technology long reads were produced from a single haploid black bee drone. Scaffolding contigs into chromosomes was done using a high-density genetic map. This allowed for re-estimation of the recombination rate, which was over-estimated in some previous studies due to mis-assemblies, which resulted in spurious inversions in the older reference genomes. The sequence continuity obtained was very high and the only limit towards continuous chromosome-wide sequences seemed to be due to tandem repeat arrays that were usually longer than 10 kb and that belonged to two main families, the 371 and 91 bp repeats, causing problems in the assembly process due to high internal sequence similarity. Our assembly was used together with the reference genome to genotype two structural variants by a pangenome graph approach with Graphtyper2. Genotypes obtained were either correct or missing, when compared to an approach based on sequencing depth analysis, and genotyping rates were 89 and 76% for the two variants.

CONCLUSIONS: Our new assembly for the Apis mellifera mellifera honey bee subspecies demonstrates the utility of multiple high-quality genomes for the genotyping of structural variants, with a test case on two insertions and deletions. It will therefore be an invaluable resource for future studies, for instance by including structural variants in GWAS. Having used a single haploid drone for sequencing allowed a refined analysis of very large tandem repeat arrays, raising the question of their function in the genome. High quality genome assemblies for multiple subspecies such as presented here, are crucial for emerging projects using pangenomes.

RevDate: 2024-06-28
CmpDate: 2024-06-28

Shivakumar VS, Ahmed OY, Kovaka S, et al (2024)

Sigmoni: classification of nanopore signal with a compressed pangenome index.

Bioinformatics (Oxford, England), 40(Supplement_1):i287-i296.

SUMMARY: Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling. But past methods for signal-based classification do not scale efficiently to large, repetitive references like pangenomes, limiting their utility to partial references or individual genomes. We introduce Sigmoni: a rapid, multiclass classification method based on the r-index that scales to references of hundreds of Gbps. Sigmoni quantizes nanopore signal into a discrete alphabet of picoamp ranges. It performs rapid, approximate matching using matching statistics, classifying reads based on distributions of picoamp matching statistics and co-linearity statistics, all in linear query time without the need for seed-chain-extend. Sigmoni is 10-100× faster than previous methods for adaptive sampling in host depletion experiments with improved accuracy, and can query reads against large microbial or human pangenomes. Sigmoni is the first signal-based tool to scale to a complete human genome and pangenome while remaining fast enough for adaptive sampling applications.

Sigmoni is implemented in Python, and is available open-source at https://github.com/vshiv18/sigmoni.
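
Two steps named in the Sigmoni abstract above, quantizing signal into picoamp ranges and scoring reads by matching statistics, can be sketched as follows. This is an illustration only and does not use Sigmoni's code or API: the bin boundaries (60 to 130 pA), the alphabet size, the function names, and the mean-based classifier are assumptions, and the naive substring search stands in for the r-index, which computes matching statistics far more efficiently.

```python
def quantize(signal_pa, lo=60.0, hi=130.0, n_bins=7):
    """Map picoamp values to letters, one letter per equal-width picoamp range.

    lo, hi and n_bins are illustrative assumptions; out-of-range values are clamped.
    """
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[:n_bins]
    width = (hi - lo) / n_bins
    symbols = []
    for value in signal_pa:
        index = int((value - lo) // width)
        index = max(0, min(n_bins - 1, index))
        symbols.append(alphabet[index])
    return "".join(symbols)


def matching_statistics(query, reference):
    """For each query position, the length of the longest match starting there.

    Naive quadratic version, used here only to show what the statistic means.
    """
    stats = []
    for i in range(len(query)):
        length = 0
        while i + length < len(query) and query[i:i + length + 1] in reference:
            length += 1
        stats.append(length)
    return stats


def classify(read_symbols, references):
    """Pick the reference label with the highest mean matching statistic.

    The mean is an assumed stand-in for the distribution-based scoring
    described in the abstract.
    """
    scores = {}
    for label, ref in references.items():
        stats = matching_statistics(read_symbols, ref)
        scores[label] = sum(stats) / len(stats) if stats else 0.0
    return max(scores, key=scores.get), scores
```

A read that shares long quantized substrings with one reference yields long matching statistics against it and short ones against unrelated references, which is what makes the statistic usable for classification without basecalling.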

RevDate: 2024-06-27

Cohen ZP, Perkin LC, Wagner TA, et al (2024)

Nematode-resistance loci in Upland cotton genomes are associated with structural differences.

G3 (Bethesda, Md.) pii:7700213 [Epub ahead of print].

Reniform and root-knot nematode are two of the most destructive pests of conventional upland cotton, Gossypium hirsutum L., and continue to be a major threat to cotton fiber production in semi-arid regions of the southern United States and Central America. Fortunately, naturally occurring tolerance to these nematodes has been identified in the Pima cotton species (G. barbadense) and several upland cotton varieties (G. hirsutum), which has led to a robust breeding program that has successfully introgressed and stacked these independent resistant traits into several upland cotton lineages with superior agronomic traits, e.g. BAR 32-30 and BARBREN-713. This work identifies the genomic variations of these nematode tolerant accessions by comparing their respective genomes to the susceptible, high-quality fiber producing parental line of this lineage: Phytogen 355 (PSC355). We discover several large genomic differences within marker regions that harbor putative resistance genes as well as expression mechanisms shared by the two resistant lines, with respect to the susceptible PSC355 parental line. This work emphasizes the utility of whole genome comparisons as a means of elucidating large and small nuclear differences by lineage and phenotype.

RevDate: 2024-06-27

Raghuram V, Petit RA, Karol Z, et al (2024)

Average nucleotide identity-based Staphylococcus aureus strain grouping allows identification of strain-specific genes in the pangenome.

mSystems [Epub ahead of print].

UNLABELLED: Staphylococcus aureus causes both hospital- and community-acquired infections in humans worldwide. Due to the high incidence of infection, S. aureus is also one of the most sampled and sequenced pathogens today, providing an outstanding resource to understand variation at the bacterial subspecies level. We processed and downsampled 83,383 public S. aureus Illumina whole-genome shotgun sequences and 1,263 complete genomes to produce 7,954 representative substrains. Pairwise comparison of average nucleotide identity revealed a natural boundary of 99.5% that could be used to define 145 distinct strains within the species. We found that intermediate frequency genes in the pangenome (present in 10%-95% of genomes) could be divided into those closely linked to strain background ("strain-concentrated") and those highly variable within strains ("strain-diffuse"). Non-core genes had different patterns of chromosome location. Notably, strain-diffuse genes were associated with prophages; strain-concentrated genes were associated with the vSaβ genome island and rare genes (<10% frequency) concentrated near the origin of replication. Antibiotic resistance genes were enriched in the strain-diffuse class, while virulence genes were distributed between strain-diffuse, strain-concentrated, core, and rare classes. This study shows how different patterns of gene movement help create strains as distinct subspecies entities and provide insight into the diverse histories of important S. aureus functions.

IMPORTANCE: We analyzed the genomic diversity of Staphylococcus aureus, a globally prevalent bacterial species that causes serious infections in humans. Our goal was to build a genetic picture of the different strains of S. aureus and which genes may be associated with them. We reprocessed >84,000 genomes and subsampled to remove redundancy. We found that individual samples sharing >99.5% of their genome could be grouped into strains. We also showed that a portion of genes that are present in intermediate frequency in the species are strongly associated with some strains but completely absent from others, suggesting a role in strain specificity. This work lays the foundation for understanding individual gene histories of the S. aureus species and also outlines strategies for processing large bacterial genomic data sets.
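
The thresholds quoted in the Raghuram et al. abstract above (strains defined by ANI above 99.5%, intermediate-frequency genes at 10%-95%, rare genes below 10%) can be turned into a small worked sketch. The function names and input formats are hypothetical, the label "core" for genes above 95% is an assumption, and single-linkage grouping is an assumed stand-in, since the abstract does not state which clustering procedure the authors used.

```python
def group_by_ani(genomes, ani, threshold=99.5):
    """Group genomes into strains by single linkage on pairwise ANI (percent).

    ani -- dict mapping (genome_a, genome_b) to ANI; missing pairs are treated
           as below the threshold (an assumption of this sketch).
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), value in ani.items():
        if value > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


def classify_genes_by_frequency(presence, n_genomes, rare_below=0.10, core_above=0.95):
    """Label each gene 'rare', 'intermediate' or 'core' by the share of genomes carrying it.

    presence -- dict mapping gene name to the set of genomes that carry it
    """
    classes = {}
    for gene, carriers in presence.items():
        freq = len(carriers) / n_genomes
        if freq < rare_below:
            classes[gene] = "rare"
        elif freq > core_above:
            classes[gene] = "core"
        else:
            classes[gene] = "intermediate"
    return classes
```

With single linkage, two genomes below the threshold can still land in the same strain if a third genome links them, which is one reason the choice of clustering procedure matters when reproducing strain counts.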

RevDate: 2024-06-27

Burcham ZM (2024)

Comparative genomic analysis of an emerging Pseudomonadaceae member, Thiopseudomonas alkaliphila.

Microbiology spectrum [Epub ahead of print].

Thiopseudomonas alkaliphila, an organism recently classified within the Pseudomonadaceae family, has been detected in diverse sources such as human tissues, animal guts, industrial fermenters, and decomposition environments, suggesting a diverse ecological role. However, a large knowledge gap exists in how T. alkaliphila functions. In this comparative genomic analysis, adaptations indicative of habitat specificity among strains and genomic similarity to known opportunistic pathogens are revealed. Genomic investigation reveals a core metabolic utilization of multiple oxidative and non-oxidative catabolic pathways, suggesting adaptability to varied environments and carbon sources. The genomic repertoire of T. alkaliphila includes secondary metabolites, such as antimicrobials and siderophores, indicative of its involvement in microbial competition and resource acquisition. Additionally, the presence of transposases, prophages, plasmids, and Clustered Regularly Interspaced Short Palindromic Repeats-Cas systems in T. alkaliphila genomes suggests mechanisms for horizontal gene transfer and defense against viral predation. This comprehensive genomic analysis expands our understanding of the ecological functions, community interactions, and potential virulence of T. alkaliphila, while emphasizing its adaptability and diverse capabilities across environmental and host-associated ecosystems.

IMPORTANCE: As the microbial world continues to be explored, new organisms will emerge with beneficial and/or pathogenic impact. Thiopseudomonas alkaliphila is a species originally isolated from clinical human tissue and fluid samples but has not been attributed to disease. Since its classification, T. alkaliphila has been found in animal guts, animal waste, decomposing remains, and biogas fermentation reactors. This is the first study to provide an in-depth view of the metabolic potential of publicly available genomes belonging to this species through a comparative genomics and draft pangenome calculation approach. It was found that T. alkaliphila is metabolically versatile and likely adapts to diverse energy sources and environments, which may make it useful for bioremediation and in industrial settings. A range of virulence factors and antibiotic resistances were also detected, suggesting T. alkaliphila may operate as an undescribed opportunistic pathogen.

RevDate: 2024-06-27

Oles RE, Carrillo Terrazas M, Loomis LR, et al (2024)

Pangenome comparison of Bacteroides fragilis genomospecies unveils genetic diversity and ecological insights.

mSystems [Epub ahead of print].

UNLABELLED: Bacteroides fragilis is a Gram-negative commensal bacterium commonly found in the human colon, which differentiates into two genomospecies termed divisions I and II. Through a comprehensive collection of 694 B. fragilis whole genome sequences, we identify novel features distinguishing these divisions. Our study reveals a distinct geographic distribution with division I strains predominantly found in North America and division II strains in Asia. Additionally, division II strains are more frequently associated with bloodstream infections, suggesting a distinct pathogenic potential. We report differences between the two divisions in gene abundance related to metabolism, virulence, stress response, and colonization strategies. Notably, division II strains harbor more antimicrobial resistance (AMR) genes than division I strains. These findings offer new insights into the functional roles of division I and II strains, indicating specialized niches within the intestine and potential pathogenic roles in extraintestinal sites.

IMPORTANCE: Understanding the distinct functions of microbial species in the gut microbiome is crucial for deciphering their impact on human health. Classifying division II strains as Bacteroides fragilis can lead to erroneous associations, as researchers may mistakenly attribute characteristics observed in division II strains to the more extensively studied division I B. fragilis. Our findings underscore the necessity of recognizing these divisions as separate species with distinct functions. We unveil new findings of differential gene prevalence between division I and II strains in genes associated with intestinal colonization and survival strategies, potentially influencing their role as gut commensals and their pathogenicity in extraintestinal sites. Despite the significant niche overlap and colonization patterns between these groups, our study highlights the complex dynamics that govern strain distribution and behavior, emphasizing the need for a nuanced understanding of these microorganisms.

RevDate: 2024-06-27

Mather D, Vassos E, Sheedy J, et al (2024)

A Quantitative Trait Locus with a Major Effect on Root-Lesion Nematode Resistance in Barley.

Plants (Basel, Switzerland), 13(12): pii:plants13121663.

Although the root-lesion nematode Pratylenchus thornei is known to affect barley (Hordeum vulgare L.), there have been no reports on the genetic control of P. thornei resistance in barley. In this research, P. thornei resistance was assessed for a panel of 46 barley mapping parents and for two mapping populations (Arapiles/Franklin and Denar/Baudin). With both populations, a highly significant quantitative trait locus (QTL) was mapped at the same position on the long arm of chromosome 7H. Single-nucleotide polymorphisms (SNPs) in this region were anchored to an RGT Planet pan-genome assembly and assayed on the mapping parents and other barley varieties. The results indicate that Arapiles, Denar, RGT Planet and several other varieties likely have the same resistance gene on chromosome 7H. Marker assays reported here could be used to select for P. thornei resistance in barley breeding. Analysis of existing barley pan-genomic and pan-transcriptomic data provided a list of candidate genes along with information on the expression and differential expression of some of those genes in barley root tissue. Further research is required to identify a specific barley gene that affects root-lesion nematode resistance.

RevDate: 2024-06-26

Sierra P, R Durbin (2024)

Identification of transposable element families from pangenome polymorphisms.

Mobile DNA, 15(1):13.

BACKGROUND: Transposable Elements (TEs) are segments of DNA, typically a few hundred base pairs up to several tens of thousands bases long, that have the ability to generate new copies of themselves in the genome. Most existing methods used to identify TEs in a newly sequenced genome are based on their repetitive character, together with detection based on homology and structural features. As new high quality assemblies become more common, including the availability of multiple independent assemblies from the same species, an alternative strategy for identification of TE families becomes possible in which we focus on the polymorphism at insertion sites caused by TE mobility.

RESULTS: We develop the idea of using the structural polymorphisms found in pangenomes to create a library of the TE families recently active in a species, or in a closely related group of species. We present a tool, pantera, that achieves this task, and illustrate its use both on species with well-curated libraries, and on new assemblies.

CONCLUSIONS: Our results show that pantera is sensitive and accurate, tending to correctly identify complete elements with precise boundaries, and is particularly well suited to detect larger, low copy number TEs that are often undetected with existing de novo methods.

RevDate: 2024-06-26

Casimiro-Ramos A, Bautista-Crescencio C, Vidal-Montiel A, et al (2024)

Comparative Genomics of the First Resistant Candida auris Strain Isolated in Mexico: Phylogenomic and Pan-Genomic Analysis and Mutations Associated with Antifungal Resistance.

Journal of fungi (Basel, Switzerland), 10(6): pii:jof10060392.

Candida auris is an emerging multidrug-resistant and opportunistic pathogenic yeast. Whole-genome sequencing analysis has defined five major clades, each from a distinct geographic region. The current study aimed to examine the genome of the C. auris 20-1498 strain, which is the first isolate of this fungus identified in Mexico. Based on whole-genome sequencing, the draft genome was found to contain 70 contigs. It had a total genome size of 12.86 Mbp, an N50 value of 1.6 Mbp, and an average guanine-cytosine (GC) content of 45.5%. Genome annotation revealed a total of 5432 genes encoding 5515 proteins. According to the genomic analysis, the C. auris 20-1498 strain belongs to clade IV (containing strains endemic to South America). Of the two genes (ERG11 and FKS1) associated with drug resistance in C. auris, a mutation (K143R) was detected in a mutation hotspot of ERG11 (lanosterol 14-α-demethylase), an antifungal drug target. The focus on whole-genome sequencing and the identification of mutations linked to the drug resistance of fungi could lead to the discovery of new therapeutic targets and new antifungal compounds.
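
The assembly statistics quoted in this abstract (N50 and GC content) can be illustrated with a minimal sketch. It is not taken from the paper: the function names and toy inputs are hypothetical, and it assumes the usual definitions, namely that N50 is the contig length at which the cumulative length of contigs sorted longest-first reaches half of the assembly, and that GC content is the percentage of G and C among unambiguous bases.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


def gc_content(sequence):
    """Return the GC percentage, ignoring ambiguous bases such as N."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
    print(gc_content("ATGCNN"))
```

In practice these values are usually reported by assembly quality tools; the sketch only shows what the numbers mean.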

RevDate: 2024-06-26

Ardalani O, Phaneuf PV, Mohite OS, et al (2024)

Pangenome reconstruction of Lactobacillaceae metabolism predicts species-specific metabolic traits.

mSystems [Epub ahead of print].

Strains across the Lactobacillaceae family form the basis for a trillion-dollar industry. Our understanding of the genomic basis for their key traits is fragmented, however, including the metabolism that is foundational to their industrial uses. Pangenome analysis of publicly available Lactobacillaceae genomes allowed us to generate genome-scale metabolic network reconstructions for 26 species of industrial importance. Their manual curation led to more than 75,000 gene-protein-reaction associations that were deployed to generate 2,446 genome-scale metabolic models. Cross-referencing genomes and known metabolic traits allowed for manual metabolic network curation and validation of the metabolic models. As a result, we provide the first pangenomic basis for metabolism in the Lactobacillaceae family and a collection of predictive computational metabolic models that enable a variety of practical uses.

IMPORTANCE: Lactobacillaceae, a bacterial family foundational to a trillion-dollar industry, is increasingly relevant to biosustainability initiatives. Our study, leveraging approximately 2,400 genome sequences, provides a pangenomic analysis of Lactobacillaceae metabolism, creating over 2,400 curated and validated genome-scale models (GEMs). These GEMs successfully predict (i) unique, species-specific metabolic reactions; (ii) niche-enriched reactions that increase organism fitness; (iii) essential media components, offering insights into the global amino acid essentiality of Lactobacillaceae; and (iv) fermentation capabilities across the family, shedding light on the metabolic basis of Lactobacillaceae-based commercial products. This quantitative understanding of Lactobacillaceae metabolic properties and their genomic basis will have profound implications for the food industry and biosustainability, offering new insights and tools for strain selection and manipulation.

RevDate: 2024-06-26

Zambounis A, Boutsika A, Gray N, et al (2024)

Pan-genome survey of Septoria pistaciarum, causal agent of Septoria leaf spot of pistachios, across three Aegean sub-regions of Greece.

Frontiers in microbiology, 15:1396760.

Septoria pistaciarum, a causal agent of Septoria leaf spot disease of pistachio, is a fungal pathogen that causes substantial losses in cultivation worldwide. This study describes the first pan-genome-based survey of this phytopathogen, comprising a total of 27 isolates, with 9 isolates each from 3 regional units of Greece (Pieria, Larissa and Fthiotida). The reference isolate (SPF8) assembled into a total of 43.1 Mb, with 38.6% contained within AT-rich regions of approximately 37.5% G:C. The genomes of the 27 isolates exhibited on average 42% gene-coding and 20% repetitive regions. The genomes of isolates from the southern Fthiotida region appeared to be more diverged from each other than those of the other regions based on SNP-derived trees, and also contained isolates similar to both the Pieria and Larissa regions. In contrast, isolates from Pieria and Larissa were less diverse and distinct from one another. Asexual reproduction appeared to be typical, with no MAT1-2 locus detected in any isolate. Genome-based prediction of infection mode indicated hemibiotrophic and saprotrophic adaptations, consistent with its long latent phase. Gene prediction and orthology clustering generated a pan-genome-wide gene set of 21,174 loci. A total of 59 ortholog groups were predicted to contain candidate effector proteins, with 36 (61%) of these either having homologs to known effectors from other species or being assignable to predicted functions from matches to conserved domains. Overall, effector prediction suggests that S. pistaciarum employs a combination of defensive effectors with roles in suppression of host defenses, and offensive effectors with a range of cytotoxic activities. Some effector-like ortholog groups presented as divergent versions of the same protein, suggesting region-specific adaptations may have occurred. These findings provide insights and future research directions in uncovering the pathogenesis and population dynamics of S. pistaciarum toward the efficient management of Septoria leaf spot of pistachio.

RevDate: 2024-06-25
CmpDate: 2024-06-25

Hämälä T, Moore C, Cowan L, et al (2024)

Impact of whole-genome duplications on structural variant evolution in Cochlearia.

Nature communications, 15(1):5377.

Polyploidy, the result of whole-genome duplication (WGD), is a major driver of eukaryote evolution. Yet WGDs are hugely disruptive mutations, and we still lack a clear understanding of their fitness consequences. Here, we study whether WGDs result in greater diversity of genomic structural variants (SVs) and how they influence evolutionary dynamics in a plant genus, Cochlearia (Brassicaceae). By using long-read sequencing and a graph-based pangenome, we find both negative and positive interactions between WGDs and SVs. Masking of recessive mutations due to WGDs leads to a progressive accumulation of deleterious SVs across four ploidal levels (from diploids to octoploids), likely reducing the adaptive potential of polyploid populations. However, we also discover putative benefits arising from SV accumulation, as more ploidy-specific SVs harbor signals of local adaptation in polyploids than in diploids. Together, our results suggest that SVs play diverse and contrasting roles in the evolutionary trajectories of young polyploids.

RevDate: 2024-06-25

Lypaczewski P, Chac D, Dunmire CN, et al (2024)

Vibrio cholerae O1 experiences mild bottlenecks through the gastrointestinal tract in some but not all cholera patients.

Microbiology spectrum [Epub ahead of print].

UNLABELLED: Vibrio cholerae O1 causes the diarrheal disease cholera, and the small intestine is the site of active infection. During cholera, cholera toxin is secreted from V. cholerae and induces a massive fluid influx into the small intestine, which causes vomiting and diarrhea. Typically, V. cholerae genomes are sequenced from bacteria passed in stool, but rarely from vomit, a fluid that may more closely represent the site of active infection. We hypothesized that V. cholerae O1 population bottlenecks along the gastrointestinal tract would result in reduced genetic variation in stool compared to vomit. To test this, we sequenced V. cholerae genomes from 10 cholera patients with paired vomit and stool samples. Genetic diversity was low in both vomit and stool, consistent with a single infecting population rather than coinfection with divergent V. cholerae O1 lineages. The amount of single-nucleotide variation decreased from vomit to stool in four patients, increased in two, and remained unchanged in four. The variation in gene presence/absence decreased between vomit and stool in eight patients and increased in two. Pangenome analysis of assembled short-read sequencing demonstrated that the toxin-coregulated pilus operon more frequently contained deletions in genomes from vomit compared to stool. However, these deletions were not detected by PCR or long-read sequencing, indicating that interpreting gene presence or absence patterns from short-read data alone may be incomplete. Overall, we found that V. cholerae O1 isolated from stool is genetically similar to V. cholerae recovered from the upper intestinal tract.

IMPORTANCE: Vibrio cholerae O1, the bacterium that causes cholera, is ingested in contaminated food or water and then colonizes the upper small intestine and is excreted in stool. Shed V. cholerae genomes from stool are usually studied, but V. cholerae isolated from vomit may be more representative of where V. cholerae colonizes in the upper intestinal epithelium. V. cholerae may experience bottlenecks, or large reductions in bacterial population sizes and genetic diversity, as it passes through the gut. Passage through the gut may select for distinct V. cholerae mutants that are adapted for survival and gut colonization. We did not find strong evidence for such adaptive mutations, and instead observed that passage through the gut results in modest reductions in V. cholerae genetic diversity, and only in some patients. These results fill a gap in our understanding of the V. cholerae life cycle, transmission, and evolution.

RevDate: 2024-06-25

Bhalla N, RK Nanda (2024)

Pangenome-wide association study reveals the selective absence of CRISPR genes (Rv2816c-19c) in drug-resistant Mycobacterium tuberculosis.

Microbiology spectrum [Epub ahead of print].

The presence of intermittently dispersed insertion sequences and transposases in the Mycobacterium tuberculosis (Mtb) genome makes intra-genome recombination events inevitable. Understanding their effect on the gene repertoires (GR), which may contribute to the development of drug-resistant Mtb, is critical. In this study, publicly available WGS data of clinical Mtb isolates (endemic region n = 2,601; non-endemic region n = 1,130) were de novo assembled, filtered, scaffolded into assemblies, and functionally annotated. Out of 2,601 Mtb WGS data sets from endemic regions, 2,184 (drug resistant/sensitive: 1,386/798) qualified as high quality. We identified 3,784 core genes, 123 softcore genes, 224 shell genes, and 762 cloud genes in the pangenome of Mtb clinical isolates from endemic regions. Sets of 33 and 39 genes showed positive and negative associations (P < 0.01) with drug resistance status, respectively. Gene ontology clustering showed compromised immunity to phages and impaired DNA repair in drug-resistant Mtb clinical isolates compared to the sensitive ones. Multidrug efflux pump repressor genes (Rv3830c and Rv3855c) and CRISPR genes (Rv2816c-19c) were absent in the drug-resistant Mtb. A separate WGS data analysis of drug-resistant Mtb clinical isolates from the Netherlands (n = 1130) also showed the absence of CRISPR genes (Rv2816c-17c). This study highlights the role of CRISPR genes in drug resistance development in Mtb clinical isolates and helps in understanding its evolutionary trajectory and as useful targets for diagnostics development.

IMPORTANCE: The results from the present Pan-GWAS study comparing gene sets in drug-resistant and drug-sensitive Mtb clinical isolates revealed intricate presence-absence patterns of genes encoding DNA-binding proteins having gene regulatory as well as DNA modification and DNA repair roles. Apart from the genes with known functions, some uncharacterized and hypothetical genes that seem to have a potential role in drug resistance development in Mtb were identified. We have been able to extrapolate many findings of the present study with the existing literature on the molecular aspects of drug-resistant Mtb, further strengthening the relevance of the results presented in this study.
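
The core, softcore, shell, and cloud categories named in this abstract are frequency classes: each gene is binned by the fraction of genomes that carry it. The abstract does not state the cut-offs used, so the sketch below assumes the thresholds commonly used as defaults by the Roary pipeline (core at 99% or more, softcore from 95% up to 99%, shell from 15% up to 95%, cloud below 15%). The function names and the toy counts are hypothetical.

```python
from collections import Counter


def classify_gene(n_present, n_genomes):
    """Bin a gene by the share of genomes carrying it (assumed thresholds)."""
    if n_genomes <= 0 or not 0 <= n_present <= n_genomes:
        raise ValueError("counts out of range")
    # Integer arithmetic avoids floating-point edge cases at the boundaries.
    if n_present * 100 >= 99 * n_genomes:
        return "core"
    if n_present * 100 >= 95 * n_genomes:
        return "softcore"
    if n_present * 100 >= 15 * n_genomes:
        return "shell"
    return "cloud"


def summarize(gene_counts, n_genomes):
    """Count genes per category from a mapping of gene name to genome count."""
    return Counter(classify_gene(n, n_genomes) for n in gene_counts.values())


if __name__ == "__main__":
    # Hypothetical gene names and counts, for illustration only.
    toy = {"geneA": 100, "geneB": 96, "geneC": 50, "geneD": 3}
    print(summarize(toy, 100))
```

Different tools and studies draw these boundaries differently, so category sizes are only comparable between analyses that share the same thresholds.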

RevDate: 2024-06-25

Mahmoud FM, Pritsch K, Siani R, et al (2024)

Comparative genomic analysis of strain Priestia megaterium B1 reveals conserved potential for adaptation to endophytism and plant growth promotion.

Microbiology spectrum [Epub ahead of print].

In our study, we aimed to explore the genomic and phenotypic traits of Priestia megaterium strain B1, which was isolated from root material of healthy apple plants, to adapt to the endophytic lifestyle and promote plant growth. We identified putative genes encoding proteins involved in chemotaxis, flagella biosynthesis, biofilm formation, secretory systems, detoxification, transporters, and transcription regulation. Furthermore, B1 exhibited both swarming and swimming motilities, along with biofilm formation. Both genomic and physiological analyses revealed the potential of B1 to promote plant growth through the production of indole-3-acetic acid and siderophores, as well as the solubilization of phosphate and zinc. To deduce potential genomic features associated with endophytism across members of P. megaterium strains, we conducted a comparative genomic analysis involving 27 and 31 genomes of strains recovered from plant and soil habitats, respectively, in addition to our strain B1. Our results indicated a closed pan genome and comparable genome size of strains from both habitats, suggesting a facultative host association and adaptive lifestyle to both habitats. Additionally, we performed a sparse Partial Least Squares Discriminant Analysis to infer the most discriminative functional features of the two habitats based on Pfam annotation. Despite the distinctive clustering of both groups, functional enrichment analysis revealed no significant enrichment of any Pfam domain in both habitats. Furthermore, when assessing genetic elements related to adaptation to endophytism in each individual strain, we observed their widespread presence among strains from both habitats. Moreover, all members displayed potential genetic elements for promoting plant growth.

IMPORTANCE: Both genomic and phenotypic analyses yielded valuable insights into the capacity of P. megaterium B1 to adapt to the plant niche and enhance its growth. The comparative genomic analysis revealed that P. megaterium members, whether derived from soil or plant sources, possess the essential genetic machinery for interacting with plants and enhancing their growth. The conservation of these traits across various strains of this species extends its potential application as a bio-stimulant in diverse environments. This significance also applies to strain B1, particularly regarding its application to enhance the growth of plants facing apple replant disease conditions.

RevDate: 2024-06-25

Parmigiani L, Garrison E, Stoye J, et al (2024)

Panacus: fast and exact pangenome growth and core size estimation.

bioRxiv : the preprint server for biology pii:2024.06.11.598418.

MOTIVATION: Using a single linear reference genome poses a limitation to exploring the full genomic diversity of a species. The release of a draft human pangenome underscores the increasing relevance of pangenomics to overcome these limitations. Pangenomes are commonly represented as graphs, which can represent billions of base pairs of sequence. Presently, there is a lack of scalable software able to perform key tasks on pangenomes, such as quantifying universally shared sequence across genomes (the core genome) and measuring the extent of genomic variability as a function of sample size (pangenome growth).

RESULTS: We introduce Panacus (pangenome-abacus), a tool designed to rapidly perform these tasks and visualize the results in interactive plots. Panacus can process GFA files, the accepted standard for pangenome graphs, and is able to analyze a human pangenome graph with 110 million nodes in less than one hour.

AVAILABILITY: Panacus is implemented in Rust and is published as Open Source software under the MIT license. The source code and documentation are available at https://github.com/marschall-lab/panacus . Panacus can be installed via Bioconda at https://bioconda.github.io/recipes/panacus/README.html .

CONTACT: Luca Parmigiani (luca.parmigiani@uni-bielefeld.de), Daniel Doerr (daniel.doerr@hhu.de).
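
The two quantities this abstract describes, pangenome growth and core size as a function of sample size, can be illustrated on a toy scale. The sketch below is not Panacus and does not reproduce its algorithm or its GFA handling. It assumes each genome has already been reduced to a set of feature identifiers (for example graph node IDs or gene names), and it reports the curves for one fixed genome order only. The function name and inputs are hypothetical.

```python
def growth_and_core(genomes):
    """Return (growth, core) curves for genomes taken in the given order.

    growth[i] is the number of distinct features seen in the first i+1 genomes;
    core[i] is the number of features shared by all of the first i+1 genomes.
    """
    seen = set()
    shared = None
    growth, core = [], []
    for features in genomes:
        features = set(features)
        seen |= features
        # The core starts as the first genome and can only shrink afterwards.
        shared = features if shared is None else shared & features
        growth.append(len(seen))
        core.append(len(shared))
    return growth, core


if __name__ == "__main__":
    toy = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "e"}]
    print(growth_and_core(toy))
```

Because the curves depend on the order in which genomes are added, practical tools summarize over many or all orderings rather than a single one; this sketch leaves that step out.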

RevDate: 2024-06-25

Trouche B, Schrieke H, Duron O, et al (2024)

Wolbachia populations across organs of individual Culex pipiens: highly conserved intra-individual core pangenome with inter-individual polymorphisms.

ISME communications, 4(1):ycae078.

Wolbachia is a maternally inherited intracellular bacterium that infects a wide range of arthropods including mosquitoes. The endosymbiont is widely used in biocontrol strategies due to its capacity to modulate arthropod reproduction and limit pathogen transmission. Wolbachia infections in Culex spp. are generally assumed to be monoclonal but the potential presence of genetically distinct Wolbachia subpopulations within and between individual organs has not been investigated using whole genome sequencing. Here we reconstructed Wolbachia genomes from ovary and midgut metagenomes of single naturally infected Culex pipiens mosquitoes from Southern France to investigate patterns of intra- and inter-individual differences across mosquito organs. Our analyses revealed a remarkable degree of intra-individual conservancy among Wolbachia genomes from distinct organs of the same mosquito both at the level of gene presence-absence signal and single-nucleotide polymorphisms (SNPs). Yet, we identified several synonymous and non-synonymous substitutions between individuals, demonstrating the presence of some level of genomic heterogeneity among Wolbachia that infect the same C. pipiens field population. Overall, the absence of genetic heterogeneity within Wolbachia populations in a single individual confirms the presence of a dominant Wolbachia that is maintained under strong purifying forces of evolution.

RevDate: 2024-06-21
CmpDate: 2024-06-21

Buschi E, Dell'Anno A, Tangherlini M, et al (2024)

Resistance to freezing conditions of endemic Antarctic polychaetes is enhanced by cryoprotective proteins produced by their microbiome.

Science advances, 10(25):eadk9117.

The microbiome plays a key role in the health of all metazoans. Whether and how the microbiome favors the adaptation processes of organisms to extreme conditions, such as those of Antarctica, which are incompatible with most metazoans, is still unknown. We investigated the microbiome of three endemic and widespread species of Antarctic polychaetes: Leitoscoloplos geminus, Aphelochaeta palmeri, and Aglaophamus trissophyllus. We report here that these invertebrates contain a stable bacterial core dominated by Meiothermus and Anoxybacillus, equipped with a versatile genetic makeup and a unique portfolio of proteins useful for coping with extremely cold conditions as revealed by pangenomic and metaproteomic analyses. The close phylosymbiosis between Meiothermus and Anoxybacillus and these Antarctic polychaetes indicates a connection with their hosts that started in the past to support holobiont adaptation to the Antarctic Ocean. The wide suite of bacterial cryoprotective proteins found in Antarctic polychaetes may be useful for the development of nature-based biotechnological applications.

RevDate: 2024-06-21
CmpDate: 2024-06-21

Kaur J, Verma H, Kaur J, et al (2024)

In Silico Analysis of the Phylogenetic and Physiological Characteristics of Sphingobium indicum B90A: A Hexachlorocyclohexane-Degrading Bacterium.

Current microbiology, 81(8):233.

The study focuses on the in silico genomic characterization of Sphingobium indicum B90A, revealing a wealth of genes involved in stress response, carbon monoxide oxidation, β-carotene biosynthesis, heavy metal resistance, and aromatic compound degradation, suggesting its potential as a bioremediation agent. Furthermore, genomic adaptations among nine Sphingomonad strains were explored, highlighting shared core genes via pangenome analysis, including those related to the shikimate pathway and heavy metal resistance. The majority of genes associated with aromatic compound degradation, heavy metal resistance, and stress response were found within genomic islands across all strains. Sphingobium indicum UT26S exhibited the highest number of genomic islands, while Sphingopyxis alaskensis RB2256 had the maximum fraction of its genome covered by genomic islands. The distribution of lin genes varied among the strains, indicating diverse genetic responses to environmental pressures. Additionally, in silico evidence of horizontal gene transfer (HGT) between plasmids pSRL3 and pISP3 of the Sphingobium and Sphingomonas genera, respectively, has been provided. The manuscript offers novel insights into strain B90A, highlighting its role in horizontal gene transfer and refining evolutionary relationships among Sphingomonad strains. The discovery of stress response genes and the czcABCD operon emphasizes the potential of Sphingomonads in consortia development, supported by genomic island analysis.

RevDate: 2024-06-20
CmpDate: 2024-06-20

Liang Y, Dikow RB, Su X, et al (2024)

Comparative genomics of the primary endosymbiont Buchnera aphidicola in aphid hosts and their coevolutionary relationships.

BMC biology, 22(1):137.

BACKGROUND: Coevolution between modern aphids and their primary obligate, bacterial endosymbiont, Buchnera aphidicola, has been previously reported at different classification levels based on molecular phylogenetic analyses. However, the Buchnera genome remains poorly understood within the Rhus gall aphids.

RESULTS: We assembled the complete genome of the endosymbiont Buchnera in 16 aphid samples, representing 13 species in all six genera of Rhus gall aphids by shotgun genome skimming method. We compared the newly assembled genomes with those from GenBank to comprehensively investigate patterns of coevolution between the bacteria Buchnera and their aphid hosts. Buchnera genomes were mostly collinear, and the pan-genome contained 684 genes, in which the core genome contained 256 genes with some lineages having large numbers of tandem gene duplications. There has been substantial gene-loss in each Buchnera lineage. We also reconstructed the phylogeny for Buchnera and their host aphids, respectively, using 72 complete genomes of Buchnera, along with the complete mitochondrial genomes and three nuclear genes of 31 corresponding host aphid accessions. The cophylogenetic test demonstrated significant coevolution between these two partner groups at individual, species, generic, and tribal levels.

CONCLUSIONS: Buchnera exhibits very high levels of genomic sequence divergence but relative stability in gene order. The relationship between the symbionts Buchnera and its aphid hosts shows a significant coevolutionary pattern and supports complexity of the obligate symbiotic relationship.

RevDate: 2024-06-19

Yang ZD, Kuo HY, Hsieh PW, et al (2024)

Efficient Construction and Utilization of k-Ordered FM-indexes with kISS for Ultra-Fast Read Mapping in Large Genomes.

Bioinformatics (Oxford, England) pii:7696319 [Epub ahead of print].

MOTIVATION: The Full-text index in Minute space (FM-index) is a memory-efficient data structure widely used in bioinformatics for solving the fundamental pattern-matching task of searching for short patterns within a long reference. With the demand for short query patterns, the k-ordered concept has been proposed for FM-indexes. However, few construction algorithms in the state of the art fully exploit this idea to achieve significant speedups in the pan-genome era.

RESULTS: We introduce the k-ordered Induced Suffix Sorting (kISS) for efficient construction and utilization of k-ordered FM-indexes. We present an algorithmic workflow for building k-ordered suffix arrays, incorporating two novel strategies to improve time and memory efficiency. We also demonstrate the compatibility of integrating k-ordered FM-indexes with locate operations in FMtree. Experiments show that kISS can improve the construction time, and the generated k-ordered suffix array can also be applied to FMtree without any additional computation or memory usage.

AVAILABILITY: https://github.com/jhhung/kISS.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
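
The k-ordered idea mentioned in this abstract can be sketched as follows: suffixes are sorted by their first k characters only, which is enough to locate any pattern of length at most k. This is an illustration of the concept, not the kISS algorithm. It uses a naive comparison sort rather than induced suffix sorting, it searches the suffix array directly rather than through an FM-index, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def k_ordered_suffix_array(text, k):
    """Sort suffix start positions by the first k characters of each suffix."""
    # Suffixes that agree on their first k characters keep their text order,
    # because Python's sort is stable.
    return sorted(range(len(text)), key=lambda i: text[i:i + k])


def locate(text, suffix_array, k, pattern):
    """Return all start positions of pattern, which must have length 1..k."""
    m = len(pattern)
    if m == 0 or m > k:
        raise ValueError("pattern length must be between 1 and k")
    # Truncating k-ordered suffixes to m <= k characters keeps them sorted,
    # so binary search over the truncated prefixes is valid.
    prefixes = [text[i:i + m] for i in suffix_array]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    return sorted(suffix_array[lo:hi])


if __name__ == "__main__":
    text = "ACGTACGTAC"
    sa = k_ordered_suffix_array(text, 3)
    print(locate(text, sa, 3, "ACG"))
```

Materializing the prefix list on every query is wasteful and is done here only for clarity; a real index would compare against the text in place.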

RevDate: 2024-06-19

Sreya PK, Hari Naga Papa Rao A, Suresh G, et al (2024)

Genomic and functional insights of a mucin foraging Rhodopirellula halodulae sp. nov.

Systematic and applied microbiology, 47(4):126523 pii:S0723-2020(24)00037-7 [Epub ahead of print].

Nine novel strains were obtained from various algal and seagrass samples. The analysis of the 16S rRNA gene-based phylogenetic tree revealed monophyletic placement of all novel strains within the Rhodopirellula genus. The type strain was identified as JC737[T], which shared 99.1 % 16S rRNA gene sequence identity with Rhodopirellula baltica SH1[T], while strain JC740 was designated as an additional strain. The genome sizes of strains JC737[T] and JC740 were 6.6 and 6.7 Mb, respectively, and the G + C content was 56.2 %. The strains cladded distinctly in the phylogenomic tree, and the ANI and dDDH values of the strain JC737[T] were 75.8-76.1 % and 20.8-21.3 %, respectively, in comparison to other Rhodopirellula members. The strain demonstrated a versatile degradation capability, exhibiting a diverse array of complex polysaccharides, including mucin which had not been previously identified within the members of the phylum Planctomycetota. The phylogenomic, pan-genomic, morphological, physiological, and genomic characterization of the strain led to the proposal to describe the strain as Rhodopirellula halodulae sp. nov.

RevDate: 2024-06-19

Sung K, Nawaz M, Park M, et al (2024)

Whole-Genome Sequence Analysis of Antibiotic Resistance, Virulence, and Plasmid Dynamics in Multidrug-Resistant E. coli Isolates from Imported Shrimp.

Foods (Basel, Switzerland), 13(11):.

We analyzed antimicrobial resistance and virulence traits in multidrug-resistant (MDR) E. coli isolates obtained from imported shrimp using whole-genome sequences (WGSs). Antibiotic resistance profiles were determined phenotypically. WGSs identified key characteristics, including their multilocus sequence type (MLST), serotype, virulence factors, antibiotic resistance genes, and mobile elements. Most of the isolates exhibited resistance to gentamicin, streptomycin, ampicillin, chloramphenicol, nalidixic acid, ciprofloxacin, tetracycline, and trimethoprim/sulfamethoxazole. Multilocus sequence type (MLST), serotype, average nucleotide identity (ANI), and pangenome analysis showed high genomic similarity among isolates, except for EC15 and ECV01. The EC119 plasmid contained a variety of efflux pump genes, including those encoding the acid resistance transcriptional activators (gadE, gadW, and gadX), resistance-nodulation-division-type efflux pumps (mdtE and mdtF), and a metabolite:H+ symporter (MHS) family major facilitator superfamily transporter (MNZ41_23075). Virulence genes displayed diversity, particularly EC15, whose plasmids carried genes for adherence (faeA and faeC-I), invasion (ipaH and virB), and capsule (caf1A and caf1M). This comprehensive analysis illuminates antimicrobial resistance, virulence, and plasmid dynamics in E. coli from imported shrimp and has profound implications for public health, emphasizing the need for continued surveillance and research into the evolution of these important bacterial pathogens.

RevDate: 2024-06-17

Jackson TK, C Rhode (2024)

Comparative genomics of dusky kob (Argyrosomus japonicus, Sciaenidae) conspecifics: Evidence for speciation and the genetic mechanisms underlying traits.

Journal of fish biology [Epub ahead of print].

Dusky kob (Argyrosomus japonicus) is a commercially important finfish, indigenous to South Africa, Australia, and China. Previous studies highlighted differences in genetic composition, life history, and morphology of the species across geographic regions. A draft genome sequence of 0.742 Gb (N50 = 5.49 Mb; BUSCO completeness = 97.8%) and 22,438 predicted protein-coding genes was generated for the South African (SA) conspecific. A comparison with the Chinese (CN) conspecific revealed a core set of 32,068 orthologous protein clusters across both genomes. The SA genome exhibited 440 unique clusters compared to 1928 unique clusters in the CN genome. Transportation and immune response processes were overrepresented among the SA accessory genome, whereas the CN accessory genome was enriched for immune response, DNA transposition, and sensory detection (FDR-adjusted p < 0.01). These unique clusters may represent an adaptive component of the species' pangenome that could explain population divergence due to differential environmental specialisation. Furthermore, 700 single-copy orthologues (SCOs) displayed evidence of positive selection between the SA and CN genomes, and globally these genomes shared only 92% similarity, suggesting they might be distinct species. These genes primarily play roles in metabolism and digestion, illustrating the evolutionary pathways that differentiate the species. Understanding these genomic mechanisms underlying adaptation and evolution within and between species provides valuable insights into growth and maturation of kob, traits that are particularly relevant to commercial aquaculture.

RevDate: 2024-06-17

Lin X, Hu T, Wu Z, et al (2024)

Isolation of potentially novel species expands the genomic and functional diversity of Lachnospiraceae.

iMeta, 3(2):e174.

The Lachnospiraceae family holds promise as a source of next-generation probiotics, yet a comprehensive delineation of its diversity is lacking, hampering the identification of suitable strains for future applications. To address this knowledge gap, we conducted an in-depth genomic and functional analysis of 1868 high-quality genomes, combining data from public databases with our new isolates. This data set represented 387 colonization-selective species-level clusters, of which eight genera represented multilineage clusters. Pan-genome analysis, single-nucleotide polymorphism (SNP) identification, and probiotic functional predictions revealed that species taxonomy, habitats, and geography together shape the functional diversity of Lachnospiraceae. Moreover, analyses of associations with atherosclerotic cardiovascular disease (ACVD) and inflammatory bowel disease (IBD) indicated that several strains of potentially novel Lachnospiraceae species possess the capacity to reduce the abundance of opportunistic pathogens, thereby imparting potential health benefits. Our findings shed light on the untapped potential of novel species enabling knowledge-based selection of strains for the development of next-generation probiotics holding promise for improving human health and disease management.

RevDate: 2024-06-14

Chanket W, Pipatthana M, Sangphukieo A, et al (2024)

The complete catalog of antimicrobial resistance secondary active transporters in Clostridioides difficile: evolution and drug resistance perspective.

Computational and structural biotechnology journal, 23:2358-2374 pii:S2001-0370(24)00176-4.

Secondary active transporters shuttle substrates across eukaryotic and prokaryotic membranes, utilizing different electrochemical gradients. They are recognized as one of the antimicrobial efflux pumps among pathogens. While primary active transporters within the genome of C. difficile 630 have been completely cataloged, the systematic study of secondary active transporters remains incomplete. Here, we not only identify secondary active transporters but also disclose their evolution and role in drug resistance in C. difficile 630. Our analysis reveals that C. difficile 630 carries 147 secondary active transporters belonging to 27 (super)families. Notably, 50 (34%) of them potentially contribute to antimicrobial resistance (AMR). AMR-secondary active transporters are structurally classified into five (super)families: the p-aminobenzoyl-glutamate transporter (AbgT), drug/metabolite transporter (DMT) superfamily, major facilitator (MFS) superfamily, multidrug and toxic compound extrusion (MATE) family, and resistance-nodulation-division (RND) family. Surprisingly, complete RND genes found in C. difficile 630 are likely an evolutionary leftover from the common ancestor with the diderm. Through protein structure comparisons, we have potentially identified six novel AMR-secondary active transporters from DMT, MATE, and MFS (super)families. Pangenome analysis revealed that half of the AMR-secondary transporters are accessory genes, which indicates an important role in adaptive AMR function rather than innate physiological homeostasis. Gene expression profile firmly supports their ability to respond to a wide spectrum of antibiotics. Our findings highlight the evolution of AMR-secondary active transporters and their integral role in antibiotic responses. This marks AMR-secondary active transporters as interesting therapeutic targets to synergize with other antibiotic activity.

RevDate: 2024-06-14

Li W, Lin X, Liang H, et al (2024)

Genomic and functional diversity of the human-derived isolates of Faecalibacterium.

Frontiers in microbiology, 15:1379500.

INTRODUCTION: Faecalibacterium is one of the most abundant bacteria in the gut microbiota of healthy adults, highly regarded as a next-generation probiotic. However, the functions of Faecalibacterium genomes from cultured strains and the distribution of different species in populations may differ among different sources.

METHODS: We here performed an extensive analysis of pan-genomes, functions, and safety evaluation of 136 Faecalibacterium genomes collected from 10 countries.

RESULTS: The genomes are clustered into 11 clusters, of which only five have been characterized and validly named. Over 80% of the accessory genes and unique genes of Faecalibacterium have unknown function, which reflects the importance of expanding the collection of Faecalibacterium strains. All the genomes have the potential to produce acetic acid and butyric acid. Nine clusters of Faecalibacterium are significantly enriched in healthy individuals compared with patients with type II diabetes.

DISCUSSION: This study provides a comprehensive view of the genomic characteristics and functions of culturable Faecalibacterium from the human gut, and enables clinical advances in the future.
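
The core, accessory, and unique gene categories that recur in the abstracts above can be illustrated with a minimal sketch. It assumes each genome has already been reduced to a set of gene-cluster identifiers; the function name, strain names, and cluster IDs are hypothetical and do not come from any of the cited studies or tools.

```python
from collections import Counter


def partition_pangenome(genomes):
    """Split gene clusters into core, accessory, and unique sets.

    genomes: dict mapping a genome name to the set of gene-cluster IDs it carries.
    core      = clusters present in every genome
    unique    = clusters present in exactly one genome
    accessory = clusters present in some, but not all, genomes (excluding unique)
    """
    n = len(genomes)
    # Count in how many genomes each cluster appears.
    counts = Counter(c for clusters in genomes.values() for c in clusters)
    core = {c for c, k in counts.items() if k == n}
    # With a single genome every cluster is core, so nothing is called unique.
    unique = {c for c, k in counts.items() if k == 1 and n > 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Hypothetical toy input: three strains, five gene clusters.
toy = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g3", "g5"},
}
core, accessory, unique = partition_pangenome(toy)
print(sorted(core), sorted(accessory), sorted(unique))
```

Real pipelines first cluster proteins into orthologous groups and often relax the core threshold (for example, presence in 95% or more of genomes); the strict definition here is only for illustration.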

RevDate: 2024-06-14

Logachev A, Kanapin A, Rozhmina T, et al (2024)

Pangenomics of flax fungal parasite Fusarium oxysporum f. sp. lini.

Frontiers in plant science, 15:1383914.

To assess the genomic diversity of Fusarium oxysporum f. sp. lini strains and compile a comprehensive gene repertoire, we constructed a pangenome using 13 isolates from four different clonal lineages, each exhibiting distinct levels of virulence. Syntenic analyses of two selected genomes revealed significant chromosomal rearrangements unique to each genome. A comprehensive examination of both core and accessory pangenome content and diversity points at an open genome state. Additionally, Gene Ontology (GO) enrichment analysis indicated that non-core pangenome genes are associated with pathogen recognition and immune signaling. Furthermore, the Folini pansecretome, encompassing secreted proteins critical for fungal pathogenicity, primarily consists of three functional classes: effector proteins, CAZYmes, and proteases. These three classes account for approximately 3.5% of the pangenome. Each functional class within the pansecretome was meticulously annotated and characterized with respect to pangenome category distribution, PFAM domain frequency, and strain virulence assessment. This analysis revealed that highly virulent isolates have specific types of PFAM domains that are exclusive to them. Upon examining the repertoire of SIX genes known for virulence in other formae speciales, it was found that all isolates had a similar gene content except for two, which lacked SIX genes entirely.

RevDate: 2024-06-14

Tariq A, Meng M, Jiang X, et al (2024)

In-depth exploration of the genomic diversity in tea varieties based on a newly constructed pangenome of Camellia sinensis.

The Plant journal : for cell and molecular biology [Epub ahead of print].

Tea, one of the most widely consumed beverages globally, exhibits remarkable genomic diversity in its underlying flavour and health-related compounds. In this study, we present the construction and analysis of a tea pangenome comprising a total of 11 genomes, with a focus on three newly sequenced genomes comprising the purple-leaved assamica cultivar "Zijuan", the temperature-sensitive sinensis cultivar "Anjibaicha" and the wild accession "L618" whose assemblies exhibited excellent quality scores as they profited from latest sequencing technologies. Our analysis incorporates a detailed investigation of transposon complement across the tea pangenome, revealing shared patterns of transposon distribution among the studied genomes and improved transposon resolution with long read technologies, as shown by long terminal repeat (LTR) Assembly Index analysis. Furthermore, our study encompasses a gene-centric exploration of the pangenome, examining the genomic landscape of the catechin pathway and providing insights on copy number alterations and gene-centric variants, especially for anthocyanidin synthases. We constructed a gene-centric pangenome by structurally and functionally annotating all available genomes using an identical pipeline, which both increased gene completeness and allowed for a high functional annotation rate. This improved and consistently annotated gene set will allow for a better comparison between tea genomes. We used this improved pangenome to capture the core and dispensable gene repertoire, elucidating the functional diversity present within the tea species. This pangenome might serve as a valuable resource for understanding the fundamental genetic basis of traits such as flavour, stress tolerance, and disease resistance, with implications for tea breeding programmes.

RevDate: 2024-06-13

Doukbi E, Ancel P, Dutour A, et al (2024)

Human epicardial adipose tissue contains innate and adaptive lymphoid cells and a higher proportion of innate type 2 lymphoid cells compared to other adipose tissues.

Annales d'endocrinologie pii:S0003-4266(24)00064-7 [Epub ahead of print].

IMPORTANCE: Epicardial adipose tissue (EAT) is a biologically active organ surrounding myocardium and coronary arteries that has been associated with coronary artery disease (CAD) and atrial fibrillation. Previous work has shown that EAT exhibits beige features.

OBJECTIVE: Our objective was to determine whether the stromal vascular fraction of the human EAT contains innate or adaptive lymphoid cells compared to thoracic subcutaneous (thSAT), visceral abdominal (VAT) and subcutaneous abdominal (abSAT) adipose tissues.

PARTICIPANTS: New pangenomic microarray analysis was performed on previous transcriptomic dataset using significance analysis of microarray and ingenuity pathway analysis (n=41) to identify specific immune signature and its link with browning genes. EAT, thSAT, VAT and abSAT samples from explanted patients with severe cardiomyopathies and multi-organ donor patients (n=17) were used for flow cytometry (FC) immunophenotyping assay. Patients were on average 55±16 years-old; 47% had hypertension and 6% CAD. Phenotypic adaptive and innate immune profiles were performed using a TBNK panel and a specific ILC1-2-3 panel including CD127, CD117, CRTH2 (CD294) and activation markers such as CD25 and CD69.

RESULTS: Transcriptomic analysis showed a significant positive correlation between the TH2 immune pathway (IL-4, IL-5, IL-13, IL-25, IL-33) and browning genes (UCP-1, PRDM16, TMEM26, CITED1, TBX1) in EAT versus thSAT (R=0.82, P<0.0001). Regarding adaptive immune cells, a preponderance of CD8T cells, a contingent of CD4T cells, and a few B cells were observed in all ATs (P<0.0001). In innate lymphoid cells (ILCs), an increase was observed in visceral ATs (i.e. EAT; VAT 35±8 ILCs/g of tissue) compared to their subcutaneous counterpart (i.e. thSAT+abSAT: 8±3 ILCs/g of AT, P=0.002), with a difference in the proportion of the 3 subtypes of ILCs (ILC1>ILC3>ILC2). In addition, we observed an increase in EAT-ILC2 compared to other ATs and almost all these EAT-ILC2 expressed CD69 and/or CD25 activation markers (99.75±0.16%; P<0.0001). We also observed more NKs in EAT and VAT (1520±71 cells/g of AT) than in SATs (562±17 cells/g of AT); P=0.01.

CONCLUSION: This is the first study to provide a comparison between innate and adaptive lymphoid cells in human epicardial versus abdominal or thoracic adipose tissues. Further studies are ongoing to decipher whether these cells could be involved in EAT beiging.

TRIAL REGISTRATION: CODECOH No. DC-2021-4518 The French agency of biomedicine PFS21-005.

RevDate: 2024-06-13

Wang K, Hua G, Li J, et al (2024)

Duck pan-genome reveals two transposon insertions caused bodyweight enlarging and white plumage phenotype formation during evolution.

iMeta, 3(1):e154 pii:IMT2154.

Structural variations (SVs) are a major source of domestication and improvement traits. We present the first duck pan-genome constructed using five genome assemblies capturing ∼40.98 Mb new sequences. This pan-genome together with high-depth sequencing data (∼46.5×) identified 101,041 SVs, of which substantial proportions were derived from transposable element (TE) activity. Many TE-derived SVs anchoring in a gene body or regulatory region are linked to duck's domestication and improvement. By combining quantitative genetics with molecular experiments, we, for the first time, unraveled a 6945 bp Gypsy insertion as a functional mutation of the major gene IGF2BP1 associated with duck bodyweight. This Gypsy insertion, to our knowledge, explains the largest effect on bodyweight among avian species (27.61% of phenotypic variation). In addition, we also examined another 6634 bp Gypsy insertion in MITF intron, which triggers a novel transcript of MITF, thereby contributing to the development of white plumage. Our findings highlight the importance of using a pan-genome as a reference in genomics studies and illuminate the impact of transposons in trait formation and livestock breeding.

RevDate: 2024-06-13

Liu D, Zhang Y, Fan G, et al (2022)

IPGA: A handy integrated prokaryotes genome and pan-genome analysis web service.

iMeta, 1(4):e55 pii:IMT255.

Pan-genomics is one of the most powerful means to study genomic variation and obtain a sketch of genes within a defined clade of species. Though there are a lot of computational tools to achieve this, an integrated framework to evaluate their performance and offer the best choice to users has never been achieved. To ease the process of large-scale prokaryotic genome analysis, we introduce Integrated Prokaryotes Genome and pan-genome Analysis (IPGA), a one-stop web service to analyze, compare, and visualize pan-genome as well as individual genomes, that rids users of installing any specific tools. IPGA features a scoring system that helps users to evaluate the reliability of pan-genome profiles generated by different packages. Thus, IPGA can help users ascertain the profiling method that is most suitable for their data set for the following analysis. In addition, IPGA integrates several downstream comparative analysis and genome analysis modules to make users achieve diverse targets.

RevDate: 2024-06-13

Hu H, Tan Y, Li C, et al (2022)

StrainPanDA: Linked reconstruction of strain composition and gene content profiles via pangenome-based decomposition of metagenomic data.

iMeta, 1(3):e41 pii:IMT241.

Microbial strains of variable functional capacities coexist in microbiomes. Current bioinformatics methods of strain analysis cannot provide the direct linkage between strain composition and their gene contents from metagenomic data. Here we present Strain-level Pangenome Decomposition Analysis (StrainPanDA), a novel method that uses the pangenome coverage profile of multiple metagenomic samples to simultaneously reconstruct the composition and gene content variation of coexisting strains in microbial communities. We systematically validate the accuracy and robustness of StrainPanDA using synthetic data sets. To demonstrate the power of gene-centric strain profiling, we then apply StrainPanDA to analyze the gut microbiome samples of infants, as well as patients treated with fecal microbiota transplantation. We show that the linked reconstruction of strain composition and gene content profiles is critical for understanding the relationship between microbial adaptation and strain-specific functions (e.g., nutrient utilization and pathogenicity). Finally, StrainPanDA has minimal requirements for computing resources and can be scaled to process multiple species in a community in parallel. In short, StrainPanDA can be applied to metagenomic data sets to detect the association between molecular functions and microbial/host phenotypes to formulate testable hypotheses and gain novel biological insights at the strain or subspecies level.

RevDate: 2024-06-12

Jia M, Zhu S, Xue MY, et al (2024)

Single-cell transcriptomics across 2,534 microbial species reveals functional heterogeneity in the rumen microbiome.

Nature microbiology [Epub ahead of print].

Deciphering the activity of individual microbes within complex communities and environments remains a challenge. Here we describe the development of microbiome single-cell transcriptomics using droplet-based single-cell RNA sequencing and pangenome-based computational analysis to characterize the functional heterogeneity of the rumen microbiome. We generated a microbial genome database (the Bovine Gastro Microbial Genome Map) as a functional reference map for the construction of a single-cell transcriptomic atlas of the rumen microbiome. The atlas includes 174,531 microbial cells and 2,534 species, of which 172 are core active species grouped into 12 functional clusters. We detected single-cell-level functional roles, including a key role for Basfia succiniciproducens in the carbohydrate metabolic niche of the rumen microbiome. Furthermore, we explored functional heterogeneity and reveal metabolic niche trajectories driven by biofilm formation pathway genes within B. succiniciproducens. Our results provide a resource for studying the rumen microbiome and illustrate the diverse functions of individual microbial cells that drive their ecological niche stability or adaptation within the ecosystem.

RevDate: 2024-06-12
CmpDate: 2024-06-12

Yu D, Stothard P, NF Neumann (2024)

Emergence of potentially disinfection-resistant, naturalized Escherichia coli populations across food- and water-associated engineered environments.

Scientific reports, 14(1):13478.

The Escherichia coli species is comprised of several 'ecotypes' inhabiting a wide range of host and natural environmental niches. Recent studies have suggested that novel naturalized ecotypes have emerged across wastewater treatment plants and meat processing facilities. Phylogenetic and multilocus sequence typing analyses clustered naturalized wastewater and meat plant E. coli strains into two main monophyletic clusters corresponding to the ST635 and ST399 sequence types, with several serotypes identified by serotyping, potentially representing distinct lineages that have naturalized across wastewater treatment plants and meat processing facilities. This evidence, taken alongside ecotype prediction analyses that distinguished the naturalized strains from their host-associated counterparts, suggests these strains may collectively represent a novel ecotype that has recently emerged across food- and water-associated engineered environments. Interestingly, pan-genomic analyses revealed that the naturalized strains exhibited an abundance of biofilm formation, defense, and disinfection-related stress resistance genes, but lacked various virulence and colonization genes, indicating that their naturalization has come at the cost of fitness in the original host environment.

RevDate: 2024-06-12

Deschner D, Voordouw MJ, Fernando C, et al (2024)

Identification of genetic markers of resistance to macrolide class antibiotics in Mannheimia haemolytica isolates from a Saskatchewan feedlot.

Applied and environmental microbiology [Epub ahead of print].

Mannheimia haemolytica is a major contributor to bovine respiratory disease (BRD), which causes substantial economic losses to the beef industry, and there is an urgent need for rapid and accurate diagnostic tests to provide evidence for treatment decisions and support antimicrobial stewardship. Diagnostic sequencing can provide information about antimicrobial resistance genes in M. haemolytica more rapidly than conventional diagnostics. Realizing the full potential of diagnostic sequencing requires a comprehensive understanding of the genetic markers of antimicrobial resistance. We identified genetic markers of resistance in M. haemolytica to macrolide class antibiotics commonly used for control of BRD. Genome sequences were determined for 99 M. haemolytica isolates with six different susceptibility phenotypes collected over 2 years from a feedlot in Saskatchewan, Canada. Known macrolide resistance genes estT, msr(E), and mph(E) were identified in most resistant isolates within predicted integrative and conjugative elements (ICEs). ICE sequences lacking antibiotic resistance genes were detected in 10 of 47 susceptible isolates. No resistance-associated polymorphisms were detected in ribosomal RNA genes, although previously unreported mutations in the L22 and L23 ribosomal proteins were identified in 12 and 27 resistant isolates, respectively. Pangenome analysis led to the identification of 79 genes associated with resistance to gamithromycin, of which 95% (75 of 79) had no functional annotation. Most of the observed phenotypic resistance was explained by previously identified antibiotic resistance genes, although resistance to the macrolides gamithromycin and tulathromycin was not explained in 39 of 47 isolates, demonstrating the need for continued surveillance for novel determinants of macrolide resistance.

IMPORTANCE: Bovine respiratory disease is the costliest disease of beef cattle in North America and the most common reason for injectable antibiotic use in beef cattle. Metagenomic sequencing offers the potential to make economically significant reductions in turnaround time for diagnostic information for evidence-based selection of antibiotics for use in the feedlot. The success of diagnostic sequencing depends on a comprehensive catalog of antimicrobial resistance genes and other genome features associated with reduced susceptibility. We analyzed the genome sequences of isolates of Mannheimia haemolytica, a major bovine respiratory disease pathogen, and identified both previously known and novel genes associated with reduced susceptibility to macrolide class antimicrobials. These findings reinforce the need for ongoing surveillance for markers of antimicrobial resistance to support improved diagnostics and antimicrobial stewardship.

RevDate: 2024-06-11

Li Q, Qiao X, Li L, et al (2024)

Haplotype-resolved T2T genome assemblies and pangenome graph of pear reveal diverse patterns of allele-specific expression and genomic basis of fruit quality traits.

Plant communications pii:S2590-3462(24)00317-1 [Epub ahead of print].

Hybrid crops often exhibit increased yield and greater resilience, yet the genomic mechanism(s) underlying hybrid vigor or heterosis remain unclear, hindering our ability to predict the expression of phenotypic traits in hybrid breeding. Here, we generated haplotype-resolved T2T genome assemblies of two pear hybrid varieties 'Yuluxiangli' (YLX) and 'Hongxiangsu' (HXS) that share the same maternal parent, but differ in their paternal parents. We then used these assemblies to explore genome-scale landscape of allele-specific expression and create a pangenome graph for pear. Allele specific expression (ASE) was observed for close to 6000 genes in both hybrid cultivars. A subset of ASEGs related to fruit quality including sugar, organic acid and cuticular wax were identified, suggesting their important contributions to heterosis. Specifically, Ma1, a gene regulating fruit acidity, was absent in the paternal haplotypes of HXS and YLX. Further, a pangenome graph was built based on our assemblies and eight published pear genomes. Resequencing data for 139 cultivated pear genotypes (including 97 genotypes sequenced here) were subsequently aligned to the pangenome graph, revealing numerous SV hotspots and selective sweeps during pear diversification. As predicted, the Ma1 allele was found to be absent in varieties with low organic acid content, an association that was functionally validated by Ma1 over-expression in pear fruit and calli. Overall, the results unraveled contributions of allele-specific expression to heterosis involving fruit quality and provided a robust pangenome reference for high resolution allele discovery and association mapping.

RevDate: 2024-06-10

Roy A, Swetha RG, Basu S, et al (2024)

Integrating pan-genome and reverse vaccinology to design multi-epitope vaccine against Herpes simplex virus type-1.

3 Biotech, 14(7):176.

Herpes simplex virus type-1 (HSV-1), the etiological agent of sporadic encephalitis and recurring oral (sometimes genital) infections in humans, affects millions each year. The evolving viral genome reduces susceptibility to existing antivirals and, thus, necessitates new therapeutic strategies. Immunoinformatics strategies have shown promise in designing novel vaccine candidates in the absence of a clinically licensed vaccine to prevent HSV-1. However, to encourage clinical translation, the HSV-1 pan-genome was integrated with the reverse-vaccinology pipeline for rigorous screening of universal vaccine candidates. Viral targets were screened from 104 available complete genomes. Among 364 proteins, envelope glycoprotein D, being an outer membrane protein with a high antigenicity score (> 0.4) and solubility (> 0.6), was selected for epitope screening. A total of 17 T-cell and 4 B-cell epitopes with highly antigenic, immunogenic, non-toxic properties and high global population coverage were identified. Furthermore, 8 vaccine constructs were designed using different combinations of epitopes and suitable linkers. VC-8 was identified as the most promising vaccine candidate regarding chemical and structural stability. Molecular docking revealed high interactive affinity (low binding energy: -56.25 kcal/mol) of VC-8 with the target elicited by firm intermolecular H-bonds, salt-bridges, and hydrophobic interactions, which was validated with simulations. Compatibility of the vaccine candidate to be expressed in pET-29a(+) plasmid was established by in silico cloning studies. Immune simulations confirmed the potential of VC-8 to trigger robust B-cell, T-cell, cytokine, and antibody-mediated responses, thereby suggesting a promising candidate for the future of HSV-1 prevention.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s13205-024-04022-6.

RevDate: 2024-06-10

Depuydt L, Renders L, Van de Vyver S, et al (2024)

b-move: faster bidirectional character extensions in a run-length compressed index.

bioRxiv : the preprint server for biology pii:2024.05.30.596587.

Due to the increasing availability of high-quality genome sequences, pan-genomes are gradually replacing single consensus reference genomes in many bioinformatics pipelines to better capture genetic diversity. Traditional bioinformatics tools using the FM-index face memory limitations with such large genome collections. Recent advancements in run-length compressed indices, like Gagie et al.'s r-index and Nishimoto and Tabei's move structure, alleviate memory constraints but focus primarily on backward search for MEM-finding. Arakawa et al.'s br-index initiates complete approximate pattern matching using bidirectional search in run-length compressed space, but with significant computational overhead due to complex memory access patterns. We introduce b-move, a novel bidirectional extension of the move structure, enabling fast, cache-efficient bidirectional character extensions in run-length compressed space. It achieves bidirectional character extensions up to 8 times faster than the br-index, closing the performance gap with FM-index-based alternatives, while maintaining the br-index's favorable memory characteristics. For example, all available complete E. coli genomes on NCBI's RefSeq collection can be compiled into a b-move index that fits into the RAM of a typical laptop. Thus, b-move proves practical and scalable for pan-genome indexing and querying. We provide a C++ implementation of b-move, supporting efficient lossless approximate pattern matching including locate functionality, available at https://github.com/biointec/b-move under the AGPL-3.0 license.

FUNDING: Lore Depuydt : PhD Fellowship FR (1117322N), Research Foundation - Flanders (FWO) Luca Renders : PhD Fellowship SB (1SE7822N), Research Foundation - Flanders (FWO) Travis Gagie : NSERC Discovery Grant RGPIN-07185-2020 to Travis Gagie and NIH grant R01HG011392 to Ben Langmead.
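
For readers unfamiliar with the backward search that the b-move abstract builds on, the following is a minimal sketch of a plain, uncompressed FM-index that counts exact pattern occurrences. It is not b-move, the r-index, or the br-index: there is no run-length compression and no bidirectional extension, and the suffix array is built naively. The function names are illustrative, and the text is assumed not to contain the `$` sentinel.

```python
def build_fm_index(text):
    """Build a toy FM-index (C table and occurrence table) for text."""
    text += "$"  # sentinel, assumed smaller than every character in text
    # Naive suffix array construction; fine for short illustrative strings only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c]: number of characters in text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ, len(bwt)


def count_occurrences(index, pattern):
    """Count exact matches of pattern by extending it one character backward at a time."""
    C, occ, n = index
    lo, hi = 0, n  # half-open suffix-array interval matching the current suffix of pattern
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("ACGTACGTAC")
print(count_occurrences(index, "AC"), count_occurrences(index, "CGT"))
```

Each loop iteration is one backward character extension. The full occurrence table used here costs memory proportional to text length times alphabet size, which is the kind of overhead that run-length compressed indices are designed to avoid on highly repetitive pan-genome collections.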

RevDate: 2024-06-10

Park A, D Koslicki (2024)

Prokrustean Graph: A substring index supporting rapid enumeration across a range of k-mer sizes.

bioRxiv : the preprint server for biology pii:2023.11.21.568151.

Despite the widespread adoption of k-mer-based methods in bioinformatics, a fundamental question persists: How can we quantify the influence of k sizes in applications? With no universal answer available, choosing an optimal k size or employing multiple k sizes remains application-specific, arbitrary, and computationally expensive. The assessment of the primary parameter k is typically empirical, based on the end products of applications which pass complex processes of genome analysis, comparison, assembly, alignment, and error correction. The elusiveness of the problem stems from a limited understanding of the transitions of k-mers with respect to k sizes. Indeed, there is considerable room for improving both practice and theory by exploring k-mer-specific quantities across multiple k sizes. This paper introduces an algorithmic framework built upon a novel substring representation: the Prokrustean graph. The primary functionality of this framework is to extract various k-mer-based quantities across a range of k sizes, but its computational complexity depends only on maximal repeats, not on the k range. For example, counting maximal unitigs of de Bruijn graphs for k = 10, …, 100 takes just a few seconds with a Prokrustean graph built on a read set of gigabases in size. This efficiency sets the graph apart from other substring indices, such as the FM-index, which are normally optimized for string pattern searching rather than for depicting the substring structure across varying lengths. However, the Prokrustean graph is expected to close this gap, as it can be built using the extended Burrows-Wheeler Transform (eBWT) in a space-efficient manner. The framework is particularly useful in pangenome and metagenome analyses, where the demand for precise multi-k approaches is increasing due to the complex and diverse nature of the information being managed. We introduce four applications implemented with the framework that extract key quantities actively utilized in modern pangenomics and metagenomics. Code implementing our data structure and algorithms (along with correctness tests) is available at https://github.com/KoslickiLab/prokrustean.

ACM SUBJECT CLASSIFICATION: 2012 Applied computing → Computational biology.


SUPPLEMENTARY MATERIAL: https://github.com/KoslickiLab/prokrustean.
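
To make concrete what a "k-mer-based quantity across a range of k sizes" means in the Prokrustean abstract, here is a naive baseline sketch that counts distinct k-mers in a read set for every k in a range. It is not the Prokrustean graph and does not use the authors' code: it rescans all reads once per k, which is the per-k cost the abstract says the graph avoids. The function name and the toy reads are hypothetical.

```python
def distinct_kmer_counts(reads, k_min, k_max):
    """Return {k: number of distinct k-mers across all reads} for k_min <= k <= k_max.

    Naive approach: one full pass over the reads for every value of k.
    """
    counts = {}
    for k in range(k_min, k_max + 1):
        kmers = set()
        for read in reads:
            # Reads shorter than k contribute no k-mers (the range is empty).
            for i in range(len(read) - k + 1):
                kmers.add(read[i:i + k])
        counts[k] = len(kmers)
    return counts


# Hypothetical toy reads.
reads = ["ACGTACGT", "CGTACGTT"]
result = distinct_kmer_counts(reads, 2, 4)
print(result)
```

The running time of this baseline grows with the width of the k range, so it is useful mainly as a correctness reference on small inputs.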

RevDate: 2024-06-08

Tkalec KI, Hayes AJ, Lim KS, et al (2024)

Glycan-Tailored Glycoproteomic Analysis Reveals Serine is the Sole Residue Subjected to O-Linked Glycosylation in Acinetobacter baumannii.

Journal of proteome research [Epub ahead of print].

Protein glycosylation is a ubiquitous process observed across all domains of life. Within the human pathogen Acinetobacter baumannii, O-linked glycosylation is required for virulence; however, the targets and conservation of glycosylation events remain poorly defined. In this work, we expand our understanding of the breadth and site specificity of glycosylation within A. baumannii by demonstrating the value of strain specific glycan electron-transfer/higher-energy collision dissociation (EThcD) triggering for bacterial glycoproteomics. By coupling tailored EThcD-triggering regimes to complementary glycopeptide enrichment approaches, we assessed the observable glycoproteome of three A. baumannii strains (ATCC19606, BAL062, and D1279779). Combining glycopeptide enrichment techniques including ion mobility (FAIMS), metal oxide affinity chromatography (titanium dioxide), and hydrophilic interaction liquid chromatography (ZIC-HILIC), as well as the use of multiple proteases (trypsin, GluC, pepsin, and thermolysin), we expand the known A. baumannii glycoproteome to 33 unique glycoproteins containing 42 glycosylation sites. We demonstrate that serine is the sole residue subjected to glycosylation with the substitution of serine for threonine abolishing glycosylation in model glycoproteins. An A. baumannii pan-genome built from 576 reference genomes identified that serine glycosylation sites are highly conserved. Combined, this work expands our knowledge of the conservation and site specificity of A. baumannii O-linked glycosylation.

RevDate: 2024-06-06

Bouzid N, Bugada M, Pissaloux D, et al (2024)

An orbital perivascular epithelioid cell tumor (PEComa) in a 9-year-old boy: Case report and review of the literature.

Journal francais d'ophtalmologie, 47(7):104215 pii:S0181-5512(24)00160-8 [Epub ahead of print].

Perivascular epithelioid cell tumors (PEComas) are a family of benign neoplasms characterized by smooth muscle and melanocytic differentiation. Orbital cases are rare. A 9-year-old male presented with a slowly growing orbital mass. Magnetic resonance imaging (MRI) revealed a well-defined orbital mass without intracranial extension. The microscopic appearance of the complete resection specimen showed large nests of epithelioid cells with wide cytoplasm containing melanin pigment and round to oval nuclei with mild cytonuclear atypia and low mitotic activity. Immunohistochemistry was positive for HMB45 and negative for melanA, smooth muscle actin, desmin and S-100 protein. Pangenomic RNA-sequencing identified an in-frame NONO-TFE3 rearrangement, and clustering data showed that the tumor's gene expression profile was grouped with other previously studied PEComas. A diagnosis of orbital pigmented PEComa with uncertain malignant potential associated with a NONO-TFE3 rearrangement was made. There was no recurrence after 1 year of follow-up.

RevDate: 2024-06-06

Klepa MS, diCenzo GC, M Hungria (2024)

Comparative genomic analysis of Bradyrhizobium strains with natural variability in the efficiency of nitrogen fixation, competitiveness, and adaptation to stressful edaphoclimatic conditions.

Microbiology spectrum [Epub ahead of print].

Bradyrhizobium is known for fixing atmospheric nitrogen in symbiosis with agronomically important crops. This study focused on two groups of strains, each containing eight natural variants of the parental strains, Bradyrhizobium japonicum SEMIA 586 (=CNPSo 17) or Bradyrhizobium diazoefficiens SEMIA 566 (=CNPSo 10). CNPSo 17 and CNPSo 10 were used as commercial inoculants for soybean crops in Brazil at the beginning of the crop expansion in the southern region in the 1960s-1970s. Variants derived from these parental strains were obtained in the late 1980s through a strain selection program aimed at identifying elite strains adapted to a new cropping frontier in the central-western Cerrado region, with a higher capacity of biological nitrogen fixation (BNF) and competitiveness. Here, we aimed to detect genetic variations possibly related to BNF, competitiveness for nodule occupancy, and adaptation to the stressful conditions of the Brazilian Cerrado soils. High-quality genome assemblies were produced for all strains. The core genome phylogeny revealed that strains of each group are closely related, as confirmed by high average nucleotide identity values. However, variants accumulated divergences resulting from horizontal gene transfer, genomic rearrangements, and nucleotide polymorphisms. The B. japonicum group presented a larger pangenome and a higher number of nucleotide polymorphisms than the B. diazoefficiens group, possibly due to its longer adaptation time to the Cerrado soil. Interestingly, five strains of the B. japonicum group carry two plasmids. The genetic variability found in both groups is discussed considering the observed differences in their BNF capacity, competitiveness for nodule occupancy, and environmental adaptation.

IMPORTANCE: Today, Brazil is a global leader in the study and use of biological nitrogen fixation with soybean crops. As Brazilian soils are naturally void of soybean-compatible bradyrhizobia, strain selection programs were established, starting with foreign isolates. Selection searched for adaptation to the local edaphoclimatic conditions, higher efficiency of nitrogen fixation, and strong competitiveness for nodule occupancy. We analyzed the genomes of two parental strains of Bradyrhizobium japonicum and Bradyrhizobium diazoefficiens and eight variant strains derived from each parental strain. We detected two plasmids in five strains and several genetic differences that might be related to adaptation to the stressful conditions of the soils of the Brazilian Cerrado biome. We also detected genetic variations in specific regions that may impact symbiotic nitrogen fixation. Our analysis contributes to new insights into the evolution of Bradyrhizobium, and some of the identified differences may be applied as genetic markers to assist strain selection programs.

RevDate: 2024-06-06

Cui Y, Lin Y, Wei H, et al (2024)

Identification of salt tolerance-associated presence-absence variations in the OsMADS56 gene through the integration of DEGs dataset and eQTL analysis.

RevDate: 2024-06-05
CmpDate: 2024-06-05

Mukhopadhyay S, Singh M, Ghosh MM, et al (2024)

Comparative Genomics and Characterization of Shigella flexneri Isolated from Urban Wastewater.

Microbes and environments, 39(2):.

Shigella species are a group of highly transmissible Gram-negative pathogens. Increasing reports of infection with extensively drug-resistant varieties of this stomach bug have convinced the World Health Organization to prioritize Shigella for novel therapeutic interventions. We herein coupled the whole-genome sequencing of a natural isolate of Shigella flexneri with a pangenome analysis to characterize pathogen genomics within this species, which will provide us with an insight into its existing genomic diversity and highlight the root causes behind the emergence of quick vaccine escape variants. The isolated novel strain of S. flexneri contained ~4,500 protein-coding genes, 57 of which imparted resistance to antibiotics. A comparative pan-genomic analysis revealed genomic variability of ~64%, the shared conservation of core genes in central metabolic processes, and the enrichment of unique/accessory genes in virulence and defense mechanisms that contributed to much of the observed antimicrobial resistance (AMR). A pathway analysis of the core genome mapped 22 genes to 2 antimicrobial resistance pathways, with the bulk coding for multidrug efflux pumps and two-component regulatory systems that are considered to work synergistically towards the development of resistance phenotypes. The prospective evolvability of Shigella species as witnessed by the marked difference in genomic content, the strain-specific essentiality of unique/accessory genes, and the inclusion of a potent resistance mechanism within the core genome, strengthens the possibility of novel serotypes emerging in the near future and emphasizes the importance of tracking down genomic diversity in drug/vaccine design and AMR governance.

RevDate: 2024-06-05
CmpDate: 2024-06-05

Islam MM, Kolling GL, Glass EM, et al (2024)

Model-driven characterization of functional diversity of Pseudomonas aeruginosa clinical isolates with broadly representative phenotypes.

Microbial genomics, 10(6):.

Pseudomonas aeruginosa is a leading cause of infections in immunocompromised individuals and in healthcare settings. This study aims to understand the relationships between phenotypic diversity and the functional metabolic landscape of P. aeruginosa clinical isolates. To better understand the metabolic repertoire of P. aeruginosa in infection, we deeply profiled a representative set from a library of 971 clinical P. aeruginosa isolates with corresponding patient metadata and bacterial phenotypes. The genotypic clustering based on whole-genome sequencing of the isolates, multilocus sequence types, and the phenotypic clustering generated from a multi-parametric analysis were compared to each other to assess the genotype-phenotype correlation. Genome-scale metabolic network reconstructions were developed for each isolate through amendments to an existing PA14 network reconstruction. These network reconstructions show diverse metabolic functionalities and enhance the collective P. aeruginosa pangenome metabolic repertoire. Characterizing this rich set of clinical P. aeruginosa isolates allows for a deeper understanding of the genotypic and metabolic diversity of the pathogen in a clinical setting and lays a foundation for further investigation of the metabolic landscape of this pathogen and host-associated metabolic differences during infection.

RevDate: 2024-06-05
CmpDate: 2024-06-05

Xue Z, Zhou A, Zhu X, et al (2024)

NIPT-PG: empowering non-invasive prenatal testing to learn from population genomics through an incremental pan-genomic approach.

Briefings in bioinformatics, 25(4):.

Non-invasive prenatal testing (NIPT) is a quite popular approach for detecting fetal genomic aneuploidies. However, due to the limitations on sequencing read length and coverage, NIPT suffers a bottleneck on further improving performance and conducting earlier detection. The errors mainly come from reference biases and population polymorphism. To break this bottleneck, we proposed NIPT-PG, which enables the NIPT algorithm to learn from population data. A pan-genome model is introduced to incorporate variant and polymorphic loci information from tested population. Subsequently, we proposed a sequence-to-graph alignment method, which considers the read mis-match rates during the mapping process, and an indexing method using hash indexing and adjacency lists to accelerate the read alignment process. Finally, by integrating multi-source aligned read and polymorphic sites across the pan-genome, NIPT-PG obtains a more accurate z-score, thereby improving the accuracy of chromosomal aneuploidy detection. We tested NIPT-PG on two simulated datasets and 745 real-world cell-free DNA sequencing data sets from pregnant women. Results demonstrate that NIPT-PG outperforms the standard z-score test. Furthermore, combining experimental and theoretical analyses, we demonstrate the probably approximately correct learnability of NIPT-PG. In summary, NIPT-PG provides a new perspective for fetal chromosomal aneuploidies detection. NIPT-PG may have broad applications in clinical testing, and its detection results can serve as a reference for false positive samples approaching the critical threshold.
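
The abstract above uses the standard z-score test as its baseline without describing it. The following is a minimal sketch of a conventional z-score aneuploidy test, assuming read counts per chromosome are already available for a test sample and for a panel of euploid reference samples. The function names, the toy read counts, and the cutoff of 3 are illustrative assumptions, not values from the paper, and the sketch does not implement NIPT-PG itself.

```python
from statistics import mean, stdev


def chromosome_fraction(read_counts, chrom):
    """Fraction of all mapped reads in a sample that fall on `chrom`."""
    total = sum(read_counts.values())
    return read_counts[chrom] / total


def z_score(test_counts, reference_samples, chrom):
    """Standardize the test sample's read fraction against euploid references."""
    ref_fractions = [chromosome_fraction(s, chrom) for s in reference_samples]
    mu = mean(ref_fractions)
    sigma = stdev(ref_fractions)  # sample standard deviation
    return (chromosome_fraction(test_counts, chrom) - mu) / sigma


# Toy read counts (hypothetical): chr21 versus all other chromosomes pooled.
reference_samples = [
    {"chr21": 1300, "other": 98700},
    {"chr21": 1310, "other": 98690},
    {"chr21": 1290, "other": 98710},
    {"chr21": 1305, "other": 98695},
    {"chr21": 1295, "other": 98705},
]
test_sample = {"chr21": 1400, "other": 98600}

Z_CUTOFF = 3.0  # assumed decision threshold for this sketch
z = z_score(test_sample, reference_samples, "chr21")
is_flagged = z > Z_CUTOFF
print(f"z = {z:.2f}, flagged = {is_flagged}")
```

Because the reference mean and standard deviation come from reads aligned to a single linear reference, any mapping bias shifts every fraction in the same direction, which is the kind of error the abstract attributes to reference bias and population polymorphism.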

RevDate: 2024-06-04
CmpDate: 2024-06-04

Juby S, Soumya P, Jayachandran K, et al (2024)

Morphological, Metabolomic and Genomic Evidences on Drought Stress Protective Functioning of the Endophyte Bacillus safensis Ni7.

Current microbiology, 81(7):209.

The metabolomic and genomic characterization of an endophytic Bacillus safensis Ni7 was carried out in this study. This strain has previously been isolated from the xerophytic plant Nerium indicum L. and reported to enhance the drought tolerance in Capsicum annuum L. seedlings. The effects of drought stress on the morphology, biofilm production, and metabolite production of B. safensis Ni7 are analyzed in the current study. From the results obtained, the organism was found to have multiple strategies such as aggregation and clumping, robust biofilm production, and increased production of surfactin homologues under the drought induced condition when compared to non-stressed condition. Further, the whole genome sequencing (WGS) based analysis has demonstrated B. safensis Ni7 to have a genome size of 3,671,999 bp, N50 value of 3,527,239, and a mean G+C content of 41.58%. Interestingly, the organism was observed to have the presence of various stress-responsive genes (13, 20U, 16U, 160, 39, 17M, 18, 26, and ctc) and genes responsible for surfactin production (srfAA, srfAB, srfAC, and srfAD), biofilm production (epsD, epsE, epsF, epsG, epsH, epsI, epsK, epsL, epsM, epsN, and pel), chemotaxis (cheB_1, cheB_2, cheB_3, cheW_1, cheW_2, cheR, cheD, cheC, cheA, cheY, cheV, and cheB_4), flagella synthesis (flgG_1, flgG_2, flgG_3, flgC, and flgB) as supportive to the drought tolerance. Besides these, the genes responsible for plant growth promotion (PGP), including the genes for nitrogen (nasA, nasB, nasC, nasD, and nasE) and sulfur assimilation (cysL_1&L_2, cysI) and genes for phosphate solubilization (phoA, phoP_1& phoP_2, and phoR) could also be predicted. Along with the same, the genes for catalase, superoxide dismutase, protein homeostasis, cellular fitness, osmoprotectants production, and protein folding could also be predicted from its WGS data. Further pan-genome analysis with plant associated B. safensis strains available in the public databases revealed B. safensis Ni7 to have the presence of a total of 5391 gene clusters. Among these, 3207 genes were identified as core genes, 954 as shell genes and 1230 as cloud genes. This variation in gene content could be taken as an indication of evolution of strains of Bacillus safensis as per specific conditions and hence in the case of B. safensis Ni7 its role in habitat adaptation of plant is well expected. This diversity in endophytic bacterial genes may attribute its role to support the plant system to cope with stress conditions. Overall, the study provides genomic evidence on Bacillus safensis Ni7 as a stress alleviating microbial partner in plants.
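
The core, shell, and cloud counts reported above sum to the stated total of 5391 gene clusters. The abstract does not say which tool or frequency cutoffs produced these categories, so the sketch below is only an illustration of how such a partition is commonly computed from a presence/absence table. The cutoffs (core at 95% of genomes or more, cloud below 15%) follow conventions similar to those used by tools such as Roary, with soft-core folded into core. The cluster names and genome labels are hypothetical.

```python
def classify_clusters(presence, n_genomes, core_min=0.95, shell_min=0.15):
    """Assign each gene cluster to core, shell, or cloud by genome frequency.

    presence: dict mapping cluster name -> set of genomes carrying it.
    core_min and shell_min are assumed cutoffs, not values from the abstract.
    """
    categories = {"core": [], "shell": [], "cloud": []}
    for cluster, genomes in presence.items():
        frequency = len(genomes) / n_genomes
        if frequency >= core_min:
            categories["core"].append(cluster)
        elif frequency >= shell_min:
            categories["shell"].append(cluster)
        else:
            categories["cloud"].append(cluster)
    return categories


# Toy presence/absence data for ten hypothetical genomes.
genomes = [f"genome_{i}" for i in range(10)]
presence = {
    "cluster_A": set(genomes),        # present in 10/10 genomes
    "cluster_B": set(genomes[:6]),    # present in 6/10 genomes
    "cluster_C": set(genomes[:1]),    # present in 1/10 genomes
}
categories = classify_clusters(presence, n_genomes=len(genomes))
print(categories)

# Consistency check on the counts reported in the abstract.
reported_total = 3207 + 954 + 1230
print(reported_total)
```

With only a handful of genomes the cloud category is sensitive to the cutoff, since a single genome can represent more than 15% of the collection.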

RevDate: 2024-06-04
CmpDate: 2024-06-04

Shrestha AMS, Gonzales MEM, Ong PCL, et al (2024)

RicePilaf: a post-GWAS/QTL dashboard to integrate pangenomic, coexpression, regulatory, epigenomic, ontology, pathway, and text-mining information to provide functional insights into rice QTLs and GWAS loci.

GigaScience, 13:.

BACKGROUND: As the number of genome-wide association study (GWAS) and quantitative trait locus (QTL) mappings in rice continues to grow, so does the already long list of genomic loci associated with important agronomic traits. Typically, loci implicated by GWAS/QTL analysis contain tens to hundreds to thousands of single-nucleotide polymorphisms (SNPs)/genes, not all of which are causal and many of which are in noncoding regions. Unraveling the biological mechanisms that tie the GWAS regions and QTLs to the trait of interest is challenging, especially since it requires collating functional genomics information about the loci from multiple, disparate data sources.

RESULTS: We present RicePilaf, a web app for post-GWAS/QTL analysis, that performs a slew of novel bioinformatics analyses to cross-reference GWAS results and QTL mappings with a host of publicly available rice databases. In particular, it integrates (i) pangenomic information from high-quality genome builds of multiple rice varieties, (ii) coexpression information from genome-scale coexpression networks, (iii) ontology and pathway information, (iv) regulatory information from rice transcription factor databases, (v) epigenomic information from multiple high-throughput epigenetic experiments, and (vi) text-mining information extracted from scientific abstracts linking genes and traits. We demonstrate the utility of RicePilaf by applying it to analyze GWAS peaks of preharvest sprouting and genes underlying yield-under-drought QTLs.

CONCLUSIONS: RicePilaf enables rice scientists and breeders to shed functional light on their GWAS regions and QTLs, and it provides them with a means to prioritize SNPs/genes for further experiments. The source code, a Docker image, and a demo version of RicePilaf are publicly available at https://github.com/bioinfodlsu/rice-pilaf.

RevDate: 2024-06-03

Hwang S, Brown NK, Ahmed OY, et al (2024)

MEM-based pangenome indexing for k-mer queries.

bioRxiv : the preprint server for biology pii:2024.05.20.595044.

Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limitations. Methods based on graph pangenomes require a computationally expensive multiple-alignment step, which can leave out some variation. Indexes based on k-mers and de Bruijn graphs are limited to answering questions at a specific substring length k. We present Maximal Exact Match Ordered (MEMO), a pangenome indexing method based on maximal exact matches (MEMs) between sequences. A single MEMO index can handle arbitrary-length queries over pangenomic windows. MEMO enables both queries that test k-mer presence/absence (membership queries) and that count the number of genomes containing k-mers in a window (conservation queries). MEMO's index for a pangenome of 89 human autosomal haplotypes fits in 2.04 GB, 8.8 × smaller than a comparable KMC3 index and 11.4 × smaller than a PanKmer index. MEMO indexes can be made smaller by sacrificing some counting resolution, with our decile-resolution HPRC index reaching 0.67 GB. MEMO can conduct a conservation query for 31-mers over the human leukocyte antigen locus in 13.89 seconds, 2.5 × faster than other approaches. MEMO's small index size, lack of k-mer length dependence, and efficient queries make it a flexible tool for studying and visualizing substring conservation in pangenomes.
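
To make the two query types named above concrete, here is a naive baseline that answers membership and conservation queries from a plain dictionary of k-mers. It is deliberately not the MEM-based method of the paper: it is the kind of fixed-k index the abstract contrasts MEMO with, and it has to be rebuilt for every value of k. The sequences, haplotype names, and function names are hypothetical.

```python
def build_kmer_index(genomes, k):
    """Map every k-mer to the set of genomes that contain it (fixed k)."""
    index = {}
    for name, sequence in genomes.items():
        for i in range(len(sequence) - k + 1):
            index.setdefault(sequence[i:i + k], set()).add(name)
    return index


def membership_query(index, window, k):
    """For each k-mer in the window: is it present in at least one genome?"""
    return [window[i:i + k] in index for i in range(len(window) - k + 1)]


def conservation_query(index, window, k):
    """For each k-mer in the window: how many genomes contain it?"""
    return [
        len(index.get(window[i:i + k], ()))
        for i in range(len(window) - k + 1)
    ]


# Toy pangenome of three hypothetical haplotypes.
genomes = {
    "hapA": "ACGTACGT",
    "hapB": "ACGTTCGT",
    "hapC": "TTGTACGA",
}
K = 4
index = build_kmer_index(genomes, K)

window = "ACGTAC"  # contains the 4-mers ACGT, CGTA, GTAC
present = membership_query(index, window, K)
counts = conservation_query(index, window, K)
print(present)
print(counts)
```

Changing K requires calling `build_kmer_index` again, which illustrates the k-mer length dependence that the abstract says a single MEMO index avoids.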

RevDate: 2024-06-03

Barcia-Cruz R, Balboa S, Lema A, et al (2024)

Comparative genomics of Vibrio toranzoniae strains.

Research square pii:rs.3.rs-4360386.

Vibrio toranzoniae is a marine bacterium belonging to the Splendidus clade, originally isolated from healthy clams in Galicia (NW Spain). Its isolation from different hosts and seawater indicated two lifestyles and wide geographical distribution. The aim of the present study was to determine the differences at genome level among strains, as well as to determine their phylogeny. For this purpose, whole genomes were sequenced by different technologies and the resulting sequences corrected. Genomes were annotated and compared with different online tools. Furthermore, the core and pan genome were examined, and the phylogeny was inferred. The content of the core genome ranged from 2,953 to 2,766 genes and that of the pangenome from 6,278 to 6,132, depending on the tool used. The comparison revealed that although the strains shared certain homology, with DDH values ranging from 77.10 to 82.30 and values of OrthoANI higher than 97%, notable differences were found related to motility, capsule synthesis, iron acquisition system or mobile genetic elements. The phylogenetic analysis of the core genome did not reveal a differentiation of the strains according to their lifestyle, but that of the pangenome pointed out certain geographical isolation in the same growing area. The study led to a reclassification of some isolates formerly described as V. toranzoniae and manifested the importance of cured deposited sequences to proper phylogenetic assignment.
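
The core and pangenome sizes quoted above can be read as set operations over per-strain gene content: the core genome is the intersection of all strains' gene families and the pangenome is their union. The sketch below shows that definition on toy data. The strain names and gene family labels are hypothetical, and real tools first cluster genes into families by sequence similarity, which is one reason the reported counts differ by tool.

```python
def core_genome(gene_sets):
    """Gene families present in every strain (set intersection)."""
    return set.intersection(*gene_sets.values())


def pan_genome(gene_sets):
    """Gene families present in at least one strain (set union)."""
    return set.union(*gene_sets.values())


# Toy gene family content for three hypothetical strains.
gene_sets = {
    "strain_1": {"fam_01", "fam_02", "fam_03", "fam_04"},
    "strain_2": {"fam_01", "fam_02", "fam_03"},
    "strain_3": {"fam_01", "fam_02", "fam_05"},
}

core = core_genome(gene_sets)
pan = pan_genome(gene_sets)
accessory = pan - core  # families missing from at least one strain

print(sorted(core))
print(len(pan), len(accessory))
```

Both functions expect at least one strain; adding a strain can only shrink the core and grow the pangenome, which matches the behavior described in the introduction to this bibliography.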

RevDate: 2024-06-03
CmpDate: 2024-06-03

Qiu L, Chirman D, Clark JR, et al (2024)

Vaccines against extraintestinal pathogenic Escherichia coli (ExPEC): progress and challenges.

Gut microbes, 16(1):2359691.

The emergence of antimicrobial resistance (AMR) is a principal global health crisis projected to cause 10 million deaths annually worldwide by 2050. While the Gram-negative bacteria Escherichia coli is commonly found as a commensal microbe in the human gut, some strains are dangerously pathogenic, contributing to the highest AMR-associated mortality. Strains of E. coli that can translocate from the gastrointestinal tract to distal sites, called extraintestinal E. coli (ExPEC), are particularly problematic and predominantly afflict women, the elderly, and immunocompromised populations. Despite nearly 40 years of clinical trials, there is still no vaccine against ExPEC. One reason for this is the remarkable diversity in the ExPEC pangenome across pathotypes, clades, and strains, with hundreds of genes associated with pathogenesis including toxins, adhesins, and nutrient acquisition systems. Further, ExPEC is intimately associated with human mucosal surfaces and has evolved creative strategies to avoid the immune system. This review summarizes previous and ongoing preclinical and clinical ExPEC vaccine research efforts to help identify key gaps in knowledge and remaining challenges.

RevDate: 2024-06-02

Ramos B, MV Cunha (2024)

The mobilome of Staphylococcus aureus from wild ungulates reveals epidemiological links at the animal-human interface.

Environmental pollution (Barking, Essex : 1987) pii:S0269-7491(24)00955-2 [Epub ahead of print].

Staphylococcus aureus thrives at animal-human-environment interfaces. A large-scale work from our group indicated that antimicrobial resistance (AMR) in commensal S. aureus strains from wild ungulates is associated with agricultural land cover and livestock farming, raising the hypothesis that AMR genes in wildlife strains may originate from different hosts, namely via exchange of mobile genetic elements (MGE). In this work, we generate the largest available dataset of S. aureus draft genomes from wild ungulates in Portugal and explore their mobilome, which can determine important traits such as AMR, virulence, and host specificity, to understand MGE exchange. Core genome multi-locus sequence typing based on 98 newly generated draft genomes and 101 publicly available genomes from Portugal demonstrated that the genomic relatedness of S. aureus from wild ungulates assigned to livestock-associated sequence types (ST) is greater compared to wild ungulate isolates assigned to human-associated STs. Screening of host specificity determinants disclosed the unexpected presence in wildlife of the immune evasion cluster encoded in φSa3 prophage, described as a human-specific virulence determinant. Additionally, two plasmids, pAVX and pETB, previously associated with avian species and humans, respectively, and the Tn553 transposon were detected. Both pETB and Tn553 encode penicillin resistance through blaZ. Pangenome analysis of wild ungulate isolates shows a core genome fraction of 2133 genes, with isolates assigned to ST72 and ST3224 being distinguished from the remaining by MGEs, although there is no reported role of these in adaptation to wildlife. AMR related gene clusters found in the shell genome are directly linked to resistance against penicillin, macrolides, fosfomycin, and aminoglycosides, and they represent mobile ARGs. Altogether, our findings support epidemiological interactions of human and non-human hosts at interfaces, with MGE exchange, including AMR determinants, associated with putative indirect movements of S. aureus among human and wildlife hosts that might be bridged by livestock.

RevDate: 2024-05-31

de Almeida OGG, Bertozzi BG, de Oliveira Rocha L, et al (2024)

Genomic-wide analysis of Salmonella enterica strains isolated from peanuts in Brazil.

International journal of food microbiology, 420:110767 pii:S0168-1605(24)00211-3 [Epub ahead of print].

Peanut-based products have been associated with Salmonella foodborne outbreaks and/or recalls worldwide. The ability of Salmonella to persist for a long time in a low moisture environment can contribute to this kind of contamination. The objective of this study was to analyse the genome of five S. enterica enterica strains isolated from the peanut supply chain in Brazil, as well as to identify genetic determinants for survival under desiccation and validate these findings by phenotypic test of desiccation stress. The strains were in silico serotyped using the platform SeqSero2 as Miami (M2851), Javiana (M2973), Oranienburg (M2976), Muenster (M624), and Glostrup/Chomedey (M7864); with phylogenomic analysis support. Based on Multilocus Sequence Typing (MLST) the strains were assigned to STs 140, 1674, 321, 174, and 2519. In addition, eight pathogenicity islands were found in all the genomes using the SPIFinder 2.0 (SPI-1, SPI-2, SPI-3, SPI-5, SPI-9, SPI-13, SPI-14). The absence of a SPI-4 may indicate a loss of this island in the surveyed genomes. For the pangenomic analysis, 49 S. enterica genomes were input into the Roary pipeline. The majority of the stress related genes were considered as soft-core genes and were located on the chromosome. A desiccation stress phenotypic test was performed in trypticase soy broth (TSB) with four different water activity (aw) values. M2976 and M7864, both isolated from the peanut samples with the lowest aw, showed the highest OD570nm in TSB aw 0.964 and were statistically different (p < 0.05) from the strain isolated from the peanut sample with the highest aw (0.997). In conclusion, genome analyses have revealed signatures of desiccation adaptation in Salmonella strains, but phenotypic analyses suggested the environment influences the adaptive ability of Salmonella to overcome desiccation stress.

RevDate: 2024-05-31
CmpDate: 2024-05-31

Carhuaricra-Huaman D, JC Setubal (2024)

Step-by-Step Bacterial Genome Comparison.

Methods in molecular biology (Clifton, N.J.), 2802:107-134.

Thanks to advancements in genome sequencing and bioinformatics, thousands of bacterial genome sequences are available in public databases. This presents an opportunity to study bacterial diversity in unprecedented detail. This chapter describes a complete bioinformatics workflow for comparative genomics of bacterial genomes, including genome annotation, pangenome reconstruction and visualization, phylogenetic analysis, and identification of sequences of interest such as antimicrobial-resistance genes, virulence factors, and phage sequences. The workflow uses state-of-the-art, open-source tools. The workflow is presented by means of a comparative analysis of Salmonella enterica serovar Typhimurium genomes. The workflow is based on Linux commands and scripts, and result visualization relies on the R environment. The chapter provides a step-by-step protocol that researchers with basic expertise in bioinformatics can easily follow to conduct investigations on their own genome datasets.



