QUERY RUN: 25 Sep 2025 at 01:33
HITS: 4593

Bibliography on: Pangenome


Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology.

RJR: Recommended Bibliography 25 Sep 2025 at 01:33 Created: 

Pangenome

Although the enforced stability of genomic content is ubiquitous among multicellular eukaryotes (MCEs), the opposite is proving to be the case among prokaryotes, which exhibit remarkable and adaptive plasticity of genomic content. Early bacterial whole-genome sequencing efforts discovered that whenever a particular "species" was re-sequenced, new genes were found that had not been detected earlier: entirely new genes, not merely new alleles. This led to the concepts of the bacterial core-genome, the set of genes found in all members of a particular "species", and the flex-genome, the set of genes found in some, but not all, members of the "species". Together these make up the species' pan-genome.
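The core-genome, flex-genome and pan-genome definitions above map directly onto set operations. The following is a minimal sketch, assuming each strain has already been reduced to a set of gene-family identifiers; the strain names, gene names and the function `partition_pangenome` are hypothetical and purely illustrative.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (core, flex, pan) gene sets for a collection of strains.

    core: genes present in every strain
    flex: genes present in some, but not all, strains
    pan:  union of all genes (core plus flex)
    """
    if not genomes:
        return set(), set(), set()
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    flex = pan - core
    return core, flex, pan


# Hypothetical toy input: three strains of one "species".
strains = {
    "strain_A": {"dnaA", "gyrB", "recA", "blaTEM"},
    "strain_B": {"dnaA", "gyrB", "recA", "tetM"},
    "strain_C": {"dnaA", "gyrB", "recA"},
}

core, flex, pan = partition_pangenome(strains)
print("core:", sorted(core))
print("flex:", sorted(flex))
print("pan size:", len(pan))
```

In practice, deciding which genes in different strains count as "the same gene" is the hard part; dedicated pan-genome tools handle that clustering step, and this sketch only covers the bookkeeping that follows it.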

Created with PubMed® Query: ( pangenome OR "pan-genome" OR "pan genome" ) NOT pmcbook NOT ispreviousversion

Citations: The Papers (from PubMed®)


RevDate: 2025-09-24
CmpDate: 2025-09-24

Lee GY, Cho HY, Sayem SAJ, et al (2025)

Complete genome sequence and genomic characterization of the probiotic Limosilactobacillus reuteri PSC102.

Open veterinary journal, 15(6):2427-2438.

BACKGROUND: Gut microbiota are potential sources of probiotics and play an essential role in maintaining intestinal health. Limosilactobacillus reuteri PSC102 (L. reuteri PSC102), which was isolated from the feces of healthy pigs, exhibited health-beneficial properties.

AIM: We aimed to conduct a whole-genome sequencing analysis of L. reuteri PSC102 to determine its molecular characteristics as a probiotic strain.

METHODS: Limosilactobacillus reuteri PSC102 cells were cultured in De Man-Rogosa-Sharpe medium, followed by DNA extraction for genomic analysis using the PacBio-Illumina sequencing platform. The EzBioCloud software was used to perform gene assembly, and the genes were interpreted by the National Center for Biotechnology Information (NCBI) and the Glimmer program. Core and pan-genomic analyses were performed to assess the extent of functional conservation in the genomic sequence. Moreover, the NCBI database and the Basic Local Alignment Search Tool software were used to identify antimicrobial resistance genes and virulence factors.

RESULTS: Limosilactobacillus reuteri PSC102 consists of a single circular chromosome with 2,048,626 bp, a guanine-cytosine content of 38.9%, 18 rRNA genes, and 69 tRNA genes. Among the 1,846 protein-coding sequences, genes associated with probiotic characteristics were identified, including genes involved in host-microbe interactions, stress tolerance, biogenesis, and defense mechanisms. Furthermore, the genome of L. reuteri PSC102 comprises 2,446 pan-genome and 1,222 core-genome orthologous gene clusters. A total of 74 unique genes were identified in the L. reuteri PSC102 genome. These genes mostly encode proteins potentially involved in the transport and metabolism of amino acids and carbohydrates. Moreover, antibacterial resistance genes and virulence factors were absent in L. reuteri PSC102.

CONCLUSION: The molecular insights into L. reuteri PSC102 corroborate its use as a probiotic in humans and other animals.

RevDate: 2025-09-23
CmpDate: 2025-09-23

Basu U, SK Parida (2025)

Next-generation translational genomics for developing future crops.

Functional & integrative genomics, 25(1):196.

Advancements in translational genomics have revolutionized crop breeding, driving us from traditional breeding methods towards next-generation strategies that integrate genomic, transcriptomic, and phenotypic data to expedite crop improvement. There has been a shift from single genomes to pan-genomes, which better capture intraspecific diversity, and from bulk transcriptome analyses to single-cell transcriptomics, enabling cell-specific insights into gene regulation and functional genomics. Both high-throughput genotyping and phenotyping approaches are now possible due to rapid technological advancement in the field of translational genomics. Large-scale phenotyping data from multi-environment field trials is now possible due to AI-enabled digital and drone-based scanning. In the era of artificial intelligence and machine learning we have developed flexible models to handle complex genetic architecture of trait regulation using various tools and approaches. These genetic and genomic resources are the foundation for generating novel, adaptable, and high-yielding varieties, accelerating trait discovery and mapping. This review explores the comprehensive landscape of modern translational genomics, highlighting key shifts and innovations that enhance our capacity to address agricultural challenges. Integrative pipelines that unify these next-generation approaches could facilitate faster, more precise, and sustainable crop improvement, ultimately meeting the growing demands for future-ready crops.

RevDate: 2025-09-23
CmpDate: 2025-09-23

Nguyen CTK, Lam PGH, Dang NTT, et al (2025)

Phenotypic, secondary metabolite, and genomic characterization of Bacillus siamensis B03 from brackish water with Anti‑Vibrio potential.

Archives of microbiology, 207(11):277.

This study aimed to provide the first genomic, functional, and metabolic characterization of Bacillus siamensis B03 from Lang Co Bay, Vietnam, and to evaluate the hypothesis that it harbors unique metabolic traits and antimicrobial potential against Vibrio spp. B03 showed no hemolytic activity, moderate biofilm formation, tolerance to 10% NaCl, and broad extracellular enzyme secretion. Ethyl acetate extracts from the cell-free supernatant exhibited in vitro antibacterial activity against five Vibrio species, with inhibition zones of 14.7-22.3 mm. Metabolomic analysis tentatively identified 20 compounds, mainly cyclic dipeptides, some previously reported to possess antimicrobial properties. The draft, plasmid-free genome (3,745,205 bp; 46.4% GC) encodes 3,561 proteins and 72 tRNAs, with genes mainly for amino acid and carbohydrate metabolism, and contains gene clusters for bacillaene, fengycin, difficidin, bacillibactin (100% similarity), and surfactin (78%). Pan-genome analysis of 24 B. siamensis genomes revealed 2,254 core genes, with B03 contributing 31 unique genes, including those encoding NAD-dependent malic enzyme and adenylation domain proteins. These findings suggest B. siamensis B03 as a potential control agent against Vibrio pathogens.

RevDate: 2025-09-23

Recuerda M, Kraemer S, Rosoni JRR, et al (2025)

A pangenomic approach reveals the sources of genetic variation fueling the rapid radiation of Capuchino Seedeaters.

Evolution; international journal of organic evolution pii:8262272 [Epub ahead of print].

The search for the genetic basis of phenotypes has primarily focused on single nucleotide polymorphisms, often overlooking structural variants (SVs). SVs can significantly affect gene function, but detecting and characterizing them is challenging, even with long-read sequencing. Moreover, traditional single-reference methods can fail to capture many genetic variants. Using long reads, we generated a Capuchino Seedeater (Sporophila) pangenome, including 16 individuals from seven species, to investigate how SVs contribute to species and coloration differences. Leveraging this pangenome, we mapped short-read data from 127 individuals, genotyped variants identified in the pangenome graph, and subsequently performed FST scans and genome-wide association studies. Species divergence primarily arises from SNPs and indels (< 50 bp) in non-coding regions of melanin-related genes, as larger SVs rarely overlap with divergence peaks. One exception was a 55 bp deletion near the OCA2 and HERC2 genes, associated with feather pheomelanin content. These findings support the hypothesis that the reshuffling of small regulatory alleles, rather than larger species-specific mutations, accelerated plumage evolution leading to prezygotic isolation in Capuchinos.

RevDate: 2025-09-23

Eladawy M, Heslop N, Negus D, et al (2025)

Phenotype-genotype discordance in antimicrobial resistance profiles of Gram-negative uropathogens recovered from catheter-associated urinary tract infections in Egypt.

The Journal of antimicrobial chemotherapy pii:8262031 [Epub ahead of print].

OBJECTIVES: Catheter-associated urinary tract infections (CAUTIs) are among the most common healthcare-associated infections in low- and middle-income countries (LMICs), but there are few resistome data available for relevant uropathogens. The goal of this study was to characterize the antimicrobial resistance (AMR) phenotypes and genotypes of a large collection of Gram-negative bacteria recovered from CAUTIs in a hospital in Mansoura, Egypt.

METHODS: Phenotypic AMR profiles and whole-genome sequence data were generated for 132 isolates. Resistomes were predicted using ResFinder, CARD and AMRFinder. Similarity of uropathogen genomic data was determined using sourmash (kmer signatures). Escherichia coli genomic data were subject to a pangenome analysis using Panaroo.

RESULTS: Sixty-seven E. coli (Phylogroup B2; 53.7%, 36/67), 14 Pseudomonas aeruginosa, 11 Klebsiella pneumoniae, 9 Proteus mirabilis, 8 Providencia spp., 5 Enterobacter hormaechei and 18 rare CAUTI-associated isolates were identified. Several (22/132) isolates were multidrug-resistant, while almost half (62/132) were extensively drug-resistant. Phenotype-genotype discordance was found to be an important consideration in resistome studies in Egypt, with a total concordance of 91% (1115/1225), 85.7% (1273/1485) and 80.5% (1196/1485) for ResFinder, CARD and AMRFinder, respectively. Pseudomonas, at the species level, exhibited the greatest discordance. At the antimicrobial level, meropenem was subject to greatest discordance. New AMR variants were found for Egypt for Pseudomonas (blaOXA-486, blaOXA-488, blaOXA-905, blaIMP-43, blaPDC-35, blaPDC-45, blaPDC-201) and E. coli (blaTEM-176, blaTEM-190).

CONCLUSIONS: This study shows that there is phenotype-genotype discordance in AMR profiling among CAUTI isolates, highlighting the need for comprehensive approaches in resistome studies. We also show the genomic diversity of Gram-negative uropathogens contributing to disease burden in a little-studied LMIC setting.
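The concordance percentages quoted in the RESULTS above follow directly from the reported counts. The short sketch below reproduces that arithmetic; it assumes each count pair is (concordant comparisons, total comparisons), which is an interpretation of the abstract rather than something it states, and the helper name `concordance_percent` is hypothetical.

```python
def concordance_percent(concordant: int, total: int) -> float:
    """Percentage of comparisons in which genotype and phenotype agree."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * concordant / total


# Counts as reported in the abstract: (concordant, total) per tool.
reported_counts = {
    "ResFinder": (1115, 1225),
    "CARD": (1273, 1485),
    "AMRFinder": (1196, 1485),
}

for tool, (concordant, total) in reported_counts.items():
    pct = concordance_percent(concordant, total)
    print(f"{tool}: {pct:.1f}% concordant, {100.0 - pct:.1f}% discordant")
```

Note that the ResFinder denominator (1225) differs from the other two (1485), so the three percentages are not computed over the same set of comparisons.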

RevDate: 2025-09-22

Dong S, Wang S, Li L, et al (2025)

Bryophytes hold a larger gene family space than vascular plants.

Nature genetics [Epub ahead of print].

After 500 million years of evolution, extant land plants compose the following two sister groups: the bryophytes and the vascular plants. Despite their small size and simple structure, bryophytes thrive in a wide variety of habitats, including extreme conditions. However, the genetic basis for their ecological adaptability and long-term survival is not well understood. A comprehensive super-pangenome analysis, incorporating 123 newly sequenced bryophyte genomes, reveals that bryophytes possess a substantially greater diversity of gene families than vascular plants. This includes a higher number of unique and lineage-specific gene families, originating from extensive new gene formation and continuous horizontal transfer of microbial genes over their long evolutionary history. The evolution of bryophytes' rich and diverse genetic toolkit, which includes new physiological innovations like unique immune receptors, likely facilitated their spread across different biomes. These newly sequenced bryophyte genomes offer a valuable resource for exploring alternative evolutionary strategies for terrestrial success.

RevDate: 2025-09-22
CmpDate: 2025-09-22

Suresh R, Jayachandiran S, Balu P, et al (2025)

Comparative genome analysis of human pathogen Parvimonas micra revealed strain JM503A as potential novel species in the genus Parvimonas and high intra-species functional diversity.

Microbial genomics, 11(9):.

Parvimonas micra is a Gram-positive, anaerobic bacterium commonly found in the oral cavity, skin and gastrointestinal tract. While typically a harmless organism, it can cause infections in individuals with weakened immune systems, leading to conditions like periodontitis and deep-tissue abscesses. This study focuses on the comparative genomic analysis of P. micra to explore its evolutionary relationships, antimicrobial resistance profiles and functional diversity by assessing phylogenetic analyses, resistance genes, virulence factors, mobile genetic elements, carbohydrate-active enzymes and pan-genome analysis. Comparative genomic analysis of 11 P. micra strains reveals significant functional variations among the strains, indicating notable intraspecies diversity. Phylogenetic and comparative genome analysis revealed that strain JM503A is taxonomically distinct from the P. micra species, with genome similarity ranging from 54% to 61%. The 16S rRNA sequence similarity of strain JM503A is 98.28%, indicating a distinct phylogenetic position. The average nucleotide identity values, ranging from 91.32% to 91.7%, and digital DNA-DNA hybridization values, ranging from 43.00% to 44.00%, of JM503A with other strains are below the species cutoff values of 95% and 70%, respectively, which confirms JM503A as a novel species. Based on its evolutionary relationships, strain JM503A is identified as a potential new species of Parvimonas, providing important evidence for its reclassification as a new species within the genus Parvimonas.
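The species-delineation reasoning in this abstract is a threshold test: two genomes are treated as the same species only if their average nucleotide identity (ANI) and digital DNA-DNA hybridization (dDDH) values both reach the cutoffs named above (95% and 70%). A minimal sketch follows, assuming both values have already been computed by external tools; the function name `same_species` is hypothetical.

```python
ANI_CUTOFF = 95.0    # percent, species cutoff cited in the abstract
DDDH_CUTOFF = 70.0   # percent, species cutoff cited in the abstract


def same_species(ani: float, dddh: float) -> bool:
    """Return True only if both ANI and dDDH reach the species cutoffs."""
    return ani >= ANI_CUTOFF and dddh >= DDDH_CUTOFF


# Highest values reported for JM503A against the other P. micra strains.
print(same_species(ani=91.7, dddh=44.0))
```

Even the highest reported ANI and dDDH values for JM503A fall below both cutoffs, which is the basis for the novel-species proposal.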

RevDate: 2025-09-22

Fouts DE, Clarke TH, Severin GB, et al (2025)

Integrating genomic and Tn-Seq data to identify common in vivo fitness mechanisms across multiple bacterial species.

mBio [Epub ahead of print].

Sepsis, a life-threatening organ dysfunction, is due to an unregulated immune response to infection. Bacteremia is a leading cause of sepsis, and members of the Enterobacterales cause nearly half of bacteremia cases annually. Although previous Tn-Seq studies identified novel bacteremia-fitness genes, evidence for common pathways across species is lacking. To identify common fitness pathways in five bacteremia-causing Enterobacterales species, we utilized our pan-genome pipeline to integrate Tn-Seq fitness data with multiple available functional data types. Core genes from species pan-genomes were used to construct a multi-species core pan-genome, producing 2,850 core gene clusters found in four of five species. Integration of Tn-Seq fitness data identified 373 protein clusters conserved in all five species and a fitness gene in at least one of them. A scoring rubric was applied to these clusters, which incorporated Tn-Seq fitness defects, operon localization, and antibiotic susceptibility data, which reduced the number of bacteremia-fitness genes and identified seven common fitness mechanisms. Independent mutational validation of one prioritized fitness gene, tatC, showed reduced fitness in vivo across all species tested and increased susceptibility to β-lactams that was restored following tatC complementation in trans. By integrating known operon structures and antibiotic susceptibility with Tn-Seq fitness data, common genes within the core pan-genome emerged and revealed mechanisms essential for survival in the mammalian bloodstream. Our prediction and validation of tatC as a common bacteremia fitness factor supports the utility of this bioinformatic approach. This study represents a major step forward in prioritizing novel targets for therapy against sepsis infections.

IMPORTANCE: Bacteremia is a leading cause of sepsis, a life-threatening condition where an unregulated immune response to infection causes systemic organ failure. Nearly half of bacteremia cases are caused by members of the Gram-negative bacterial taxonomic order Enterobacterales. Given the public health impact of bacteremia and the reduction of existing antibiotic treatment options, novel strategies are needed to combat these infections. In this study, pan-genome software was used to predict seven shared fitness pathways in these bacteria that may serve as novel targets for the treatment of bacteremia. Briefly, a scoring rubric was applied to shared pan-genome clusters, which incorporated multiple data types, including Tn-Seq fitness defects, operon localization, and antibiotic susceptibility data to rank and prioritize fitness genes. To validate one of our predictions, mutations were constructed in tatC, which showed both reduced fitness in mice in all species tested and increased susceptibility to β-lactam antibiotics; complementation restored fitness and antibiotic susceptibility to wild-type levels. This study takes a novel bioinformatics approach to build a core pan-genome across multiple distantly related bacteria to integrate computational and experimental data to predict important shared fitness genes and represents a major step forward toward identifying novel targets of therapy against these deadly, widespread, life-threatening infections.

RevDate: 2025-09-22
CmpDate: 2025-09-22

Roberts MD, EB Josephs (2025)

k-mer-based diversity scales with population size proxies more than nucleotide diversity in a meta-analysis of 98 plant species.

Evolution letters, 9(4):434-445.

A key prediction of neutral theory is that the level of genetic diversity in a population should scale with population size. However, as was noted by Richard Lewontin in 1974 and reaffirmed by later studies, the slope of the population size-diversity relationship in nature is much weaker than expected under neutral theory. We hypothesize that one contributor to this paradox is that current methods relying on single nucleotide polymorphisms (SNPs) called from aligning short reads to a reference genome underestimate levels of genetic diversity in many species. As a first step to testing this idea, we calculated nucleotide diversity (π) and k-mer-based metrics of genetic diversity across 112 plant species, amounting to over 205 terabases of DNA sequencing data from 27,488 individuals. After excluding 14 species with low coverage or no variant sites called, we compared how different diversity metrics correlated with proxies of population size that account for both range size and population density variation across species. We found that our population size proxies scaled anywhere from about 3 to over 20 times faster with k-mer diversity than nucleotide diversity after adjusting for evolutionary history, mating system, life cycle habit, cultivation status, and invasiveness. The relationship between k-mer diversity and population size proxies also remains significant after correcting for genome size, whereas the analogous relationship for nucleotide diversity does not. These results are consistent with the possibility that variation not captured by common SNP-based analyses explains part of Lewontin's paradox in plants, but larger scale pangenomic studies are needed to definitively address this question.

RevDate: 2025-09-21

Lu HB, Kong LY, Chen L, et al (2025)

Isolation of diverse Undibacterium-related strains from alpine lakes and re-examining the taxonomic status of this genus.

Systematic and applied microbiology, 48(6):126661 pii:S0723-2020(25)00083-9 [Epub ahead of print].

The genus Undibacterium is an important member of Oxalobacteraceae, and most species of this genus were isolated from freshwater environments. A recent study based on genomic analyses revised the taxonomic status of 23 Undibacterium species and proposed that these species should be assigned to four genera (Undibacterium, Neoundibacterium, Affinundibacterium and Paraundibacterium). During an investigation of microbial resources inhabiting alpine lakes in southwestern China in 2023, 25 strains showing the highest 16S rRNA gene sequence similarities with Undibacterium species were isolated. Utilizing the genomes of these 25 strains and 26 Undibacterium species, the phylogenies among these strains were reconstructed based on the core and pan-genome, respectively. The phylogenomic trees show that the 26 Undibacterium species should be divided into six clades and each clade should represent an independent genus. As clades 2, 3, 4 and 5 proposed in this study have been revised in another study, the genera Cognatundibacterium and Pseudundibacterium are proposed in this study to accommodate clades 1 and 6, respectively. The detailed genomic annotations reveal that all 25 isolated Undibacterium-related strains harbor complete amino acid metabolism pathways and genes encoding DNA replication and repair, homologous recombination proteins, two-component and phosphate transport systems in response to the oligotrophic, high-UV-radiation and phosphorus-limited environments of alpine lakes. This study clarifies the role of Undibacterium-related strains in alpine lakes and demonstrates that isolating more strains is of great benefit to bacterial taxonomy.

RevDate: 2025-09-19

Wang C, Wang T, Zhang T, et al (2025)

Domestication-Selected Promoter Insertion in WRKY17 Increases Cadmium Sensitivity in Apple.

Plant biotechnology journal [Epub ahead of print].

With increasing industrialisation and human activities, heavy metal pollution has become a serious environmental concern, particularly cadmium (Cd) contamination. This study reveals significant differences in Cd tolerance between wild apple (Malus spp.) and cultivated apple (Malus domestica). Through pan-genome analysis, we identified the transcription factor WRKY17 as a key regulator of Cd stress response, with a 3355-bp insertion (P-INS) in its promoter region being the primary genetic basis for this differential tolerance. In cultivated apples, P-INS suppresses WRKY17 expression, leading to reduced Cd tolerance. In contrast, wild apples lacking P-INS exhibit activated WRKY17 expression. Further investigation demonstrated that WRKY17 enhances Cd tolerance by inducing the expression of long non-coding RNA lncRNA400. Mechanistically, lncRNA400 forms an R-loop structure that recruits the histone demethylase JMJD5 to remove H3K27me3 marks from the promoter of the Plant Cadmium Resistance gene PCR2, thereby activating PCR2 expression. Notably, WRKY17 activation also accelerates leaf senescence, explaining why P-INS was retained during apple domestication: its suppression of WRKY17 maintains better agronomic traits despite reduced Cd tolerance. In apple cultivation, grafting wild apple rootstocks with cultivated scions effectively combines the Cd-tolerant traits of wild varieties with the delayed leaf senescence characteristics of cultivated cultivars, providing a practical solution for the apple industry to address Cd contamination.

RevDate: 2025-09-18
CmpDate: 2025-09-18

Sun X, Xu P, Shi Y, et al (2025)

Drug selection based on pan-genomics genetic features of Mycobacterium tuberculosis.

Frontiers in microbiology, 16:1663069.

Tuberculosis, caused by Mycobacterium tuberculosis, is a severe and persistent global public health issue, particularly exacerbated by the emergence of multidrug-resistant and extensively drug-resistant strains. This study employed pan-genomic approaches to analyze different strains with various resistance profiles, examining the diversity of bacterial genetic evolution in relation to mutations in resistance-related genes. The findings indicate that resistance-related genes are mostly core genes (94%), with a preference for base mutations closely associated with nonsynonymous mutations at resistance sites. Interestingly, while the majority of drugs induce positive selection in target genes, the tlyA gene under the influence of amikacin (AMI) undergoes passive selection. Cluster analysis of target genes suggests consistency between SNP clusters and drug-resistant clusters, revealing a strong correlation between bacterial evolutionary branches and resistance profiles. Consequently, based on pan-genome evolutionary characteristics, we identified the drug-resistant mutation pattern (DRMP) that can serve as a molecular fingerprint and indicator for drug sensitivity, aiding in the assessment and guidance of drug selection for treating different strains and the formulation of individualized treatment plans. This research not only enhances our understanding of the mechanisms of drug resistance in M. tuberculosis but also offers new perspectives for the development of new drugs, which is crucial for global tuberculosis control.

RevDate: 2025-09-17
CmpDate: 2025-09-17

Nguyen CTK, HD Nguyen (2025)

Genomic and Functional Characterization of Bacillus siamensis B01 with Antimicrobial Activity Against Vibrio spp.

Current microbiology, 82(11):509.

This study reports the first investigation of the biological, genomic, and anti-Vibrio properties of Bacillus siamensis B01 (= VTCC 910229) isolated from a mangrove forest in Vietnam. Comparative genomic analysis revealed a significant similarity between the genomes of strain B01 and B. siamensis KCTC 13613[T], as indicated by average nucleotide identity and digital DNA-DNA hybridization values of 98.8% and 97.5%, respectively. The draft genome comprised 3,823,156 bp, showed a GC composition of 46.2%, and encompassed 112 tRNAs and 3,738 coding sequences (CDSs), with key functional groups related to biological processes, amino acid transport, and metabolism. Within the genome, an 80,406 bp plasmid with a GC content of 38.5% was identified, carrying 105 CDSs, one tRNA, and seven mobile genetic elements without virulence and antibiotic resistance genes. Gene clusters for antibacterial compounds, including bacillaene, fengycin, difficidin, and bacillibactin, showed 100% similarity to known biosynthetic pathways. Pan-genome analysis revealed that strain B01 shares 2,433 core genes with 23 other B. siamensis genomes, whereas this isolate possesses 160 unique genes, accounting for 4.28% of its genome. The isolate grew optimally in neutral environments, tolerated alkaline pH and 10% NaCl, formed biofilms, showed no hemolysis, and secreted proteases, lipases, cellulases, and amylases. The ethyl acetate extract demonstrated antibacterial activity against Vibrio azureus, Vibrio sinaloensis, Vibrio neocaledonicus, Vibrio alginolyticus, and Vibrio parahaemolyticus with inhibition zones measuring 15.7-24.3 mm in diameter. These results underscore the promise of this Bacillus strain as a probiotic against pathogenic Vibrio in aquaculture.

RevDate: 2025-09-17
CmpDate: 2025-09-17

Wang T, Zheng Y, Sun L, et al (2025)

Analysis of maize PAL pan gene family and expression pattern under lepidopteran insect stress.

Frontiers in plant science, 16:1651563.

INTRODUCTION: Phenylalanine ammonia-lyase (PAL), as the rate-limiting enzyme in plant phenylpropanoid metabolism, catalyzes the conversion of L-phenylalanine to trans-cinnamic acid and plays a pivotal role in plant-insect resistance mechanisms.

METHODS: Utilizing a maize pangenome constructed from 26 high-quality genomes, we systematically identified the ZmPAL gene family members. Evolutionary pressure and structural variation (SV) analyses were conducted, alongside reanalysis of publicly available RNA-seq datasets under lepidopteran stress conditions. Temporal expression patterns were further validated via qRT-PCR.

RESULTS: This investigation identified 29 ZmPAL genes, comprising 7 core, 2 near-core, 12 dispensable, and 8 private genes, revealing substantial limitations of single-reference genome-based studies. Evolutionary analysis indicated positive selection of ZmPAL8 in specific germplasms, while SV-affected ZmPAL5 exhibited significantly divergent expression patterns. Conserved expression profiles were observed among ZmPAL members under diverse lepidopteran stresses. Temporal-specific regulation was established: ZmPAL7, ZmPAL10, and ZmPAL23 dominated early defense responses, whereas ZmPAL10 and ZmPAL23 maintained predominance during mid-late phases.

DISCUSSION: This pangenome-based study provides novel insights into PAL-mediated phytoprotective mechanisms against lepidopteran pests and establishes a theoretical framework for understanding maize's molecular adaptation to biotic stressors.
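The core / near-core / dispensable / private breakdown reported in the RESULTS above is a classification by how many of the 26 genomes carry each gene. The abstract does not state the cutoffs used, so the sketch below assumes a common convention (core: all genomes; near-core: missing from at most two genomes; private: exactly one genome; dispensable: everything else). Both the thresholds and the function name `classify_gene` are assumptions for illustration only.

```python
def classify_gene(n_present: int, n_genomes: int = 26, near_core_slack: int = 2) -> str:
    """Classify a gene by the number of genomes in which it is present.

    The near-core slack (genomes allowed to lack the gene) is an assumed
    convention, not a value taken from the abstract.
    """
    if not 1 <= n_present <= n_genomes:
        raise ValueError("n_present must be between 1 and n_genomes")
    if n_present == n_genomes:
        return "core"
    if n_present == 1:
        return "private"
    if n_present >= n_genomes - near_core_slack:
        return "near-core"
    return "dispensable"


# Hypothetical presence counts for four illustrative genes.
for count in (26, 25, 12, 1):
    print(count, classify_gene(count))
```

A gene found in only a few lines would be invisible to a study built on a single reference genome that lacks it, which is the limitation the abstract highlights.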

RevDate: 2025-09-17
CmpDate: 2025-09-17

Mudipalli Elavarasu S, S K (2025)

Rational design of an epitope-centric vaccine against Pseudomonas aeruginosa using pangenomic insights and immunoinformatics approach.

Frontiers in immunology, 16:1617251.

INTRODUCTION: As a highly adaptable opportunistic pathogen, Pseudomonas aeruginosa presents a significant threat to people with weakened immune systems. This is because it naturally resists antibiotics and can form biofilms. These factors complicate treatment and underscore the urgent need for innovative therapeutic strategies, such as vaccines, to combat this pathogen.

METHODS: A pangenome analysis of P. aeruginosa genomes was performed to identify conserved core genes critical for bacterial survival and virulence. LptF, an outer membrane protein, was prioritized as a target for vaccine development. B-cell and T-cell epitopes were predicted from LptF using immunoinformatics tools, and a multi-epitope peptide vaccine was designed. The interaction between the vaccine candidate and Toll-like receptors (TLRs) was investigated through molecular docking and molecular dynamics simulations. Codon optimization and in-silico cloning were carried out to validate the vaccine's expression potential in E. coli. Immune response simulations evaluated the vaccine's immunogenicity.

RESULTS: Our pangenome analysis identified highly conserved core genes, including LptF, which proved crucial for bacterial virulence. A multi-epitope peptide vaccine was designed using the most immunogenic B-cell and T-cell epitopes derived from LptF. Studies using molecular docking and dynamic simulation have shown stable interactions between the vaccine and TLRs, with the POA_V_RS09 construct exhibiting the highest stability. Codon optimization indicated high expression efficiency in E. coli. Immune simulations revealed robust adaptive immune responses, including sustained IgG production, the formation of memory B cells, and the activation of T-cell responses.

DISCUSSION: The POA_V_RS09 vaccine candidate exhibited excellent stability, immunogenic potential, and expression efficiency, making it a promising candidate for combating P. aeruginosa infections. This study provides a strong foundation for developing effective therapeutic strategies to address the growing issue of antimicrobial resistance in P. aeruginosa. More experimental validation is needed to verify its effectiveness in preclinical and clinical environments.

RevDate: 2025-09-15
CmpDate: 2025-09-15

Fu X, Bi X, Lv T, et al (2025)

Characterization of High-Linezolid-MIC Clostridioides difficile Isolated from a Chinese Hospital: First Genomic Evidence of Cfr(B) Transmission and Tn6218 Association.

Infection and drug resistance, 18:4789-4798.

BACKGROUND: Clostridioides difficile (C. difficile) exhibiting high linezolid minimum inhibitory concentration (>4 µg/mL) remains infrequently reported in clinical settings. Notably, the prevalence of linezolid-resistant C. difficile is exceptionally low (<3% in Chinese isolates), and the underlying genetic determinants are poorly characterized.

METHODS: We conducted a genomic study to investigate the genetic characteristics of C. difficile with high linezolid MIC. To determine the MIC of linezolid and delineate antimicrobial resistance profiles, these isolates were systematically subjected to antimicrobial susceptibility testing. Multilocus sequence typing, antimicrobial resistance genes, and the characteristics of the cfr gene in linezolid-resistant C. difficile strains were analyzed following whole-genome sequencing. Roary was used to construct a pangenome phylogenetic tree, and a Bayesian evolutionary analysis was performed using BEAST.4.

RESULTS: Among 421 screened C. difficile isolates, nine isolates (2.1%) exhibited high-linezolid MICs (≥16 μg/mL), including six ST37 (A-B+) and three ST3 strains (two A-B-). All harbored cfr(B) on Tn6218, sharing homology with E. faecium (NG_050395.1).

CONCLUSION: This study underscores the risk of cfr(B) dissemination via mobile genetic elements in clinical settings, urging surveillance of co-occurrence in Enterococcus and C. difficile to curb resistance spread.

RevDate: 2025-09-14

Tariq A, Ahmed A, Khatoon A, et al (2025)

Identification of drug targets in pan-drug resistant Acinetobacter baumannii via whole genome sequencing and subtractive genomics.

Computers in biology and medicine, 197(Pt B):111058 pii:S0010-4825(25)01410-6 [Epub ahead of print].

In this study, we report, to the best of our knowledge, the first complete genome sequence of a pan drug resistant (PDR) Acinetobacter baumannii strain (JRCGR-AK-AB-01) from Karachi, Pakistan. Strain JRCGR-AK-AB-01 exhibited a pan drug resistant phenotype, showing susceptibility only to polymyxin B and intermediate susceptibility to colistin. Hybrid genome sequencing using MinION long-reads and DNBSEQ short-reads revealed that the genome size of strain JRCGR-AK-AB-01 is 4.03 Mb. We identified that JRCGR-AK-AB-01 is closely related to other A. baumannii strains based on Average Amino Acid Identity (AAI), Genome-to-Genome Distance Calculator (GGDC), and Average Nucleotide Identity (ANI) analyses. Furthermore, pan-genome analysis revealed an open pan-genome, indicating frequent gene exchange. Subsequently, a subtractive genomics approach was employed to identify potential drug targets within the core genes that are essential, druggable, and non-homologous to both human proteins and gut microbiota. Finally, the selected genes were screened against the JRCGR-AK-AB-01 proteome to eliminate redundancies. Among these, NADP-dependent isocitrate dehydrogenase (IDH) was used for downstream analysis. Its structure was predicted via homology modeling and validated using different bioinformatics tools. Molecular docking and molecular dynamics (MD) simulations revealed that neomycin and paromomycin were the potent drugs against Acinetobacter spp. In vitro studies confirmed that neomycin (2.25 mg/mL) exhibited antimicrobial activity against the PDR strain of A. baumannii. Overall, this study defines genomic features and identifies potential therapeutic targets in PDR A. baumannii, thereby providing a foundation for future experimental validation and novel treatment strategies.

RevDate: 2025-09-13

Wang S, Zhong X, Cheng Y, et al (2025)

Pan-Genome Analysis of Cannabis sativa: Insights on Genomic Diversity, Evolution, and Environment Adaption.

International journal of molecular sciences, 26(17): pii:ijms26178354.

Cannabis sativa is a crop which has been cultivated since ancient times, with important cultural and industrial value. Due to its substantial economic impact, cannabis has attracted widespread scientific attention. A pan-genome is a significant tool for breeding, because it provides a comprehensive representation of genetic diversity. To provide a valuable tool for Cannabis breeding, we constructed a Cannabis pan-genome based on 113 accessions. A total of 24,679,380 bp of non-reference-genome sequences were assembled, identifying 1313 protein-coding genes. Using pan-genome analyses, a total of 32,428 gene presence/absence variations (PAVs) were obtained, and gene loss was recovered during the domestication of Cannabis. By partitioning the pan-genome using PAVs, a total of 23,309 core genes were identified, accounting for 71.88% of all genes in the pan-genome. In particular, there were 7148 flexible genes, making up 22.05% of the pan-genome. The flexible genes were associated with adaptive traits, including stress resistance and disease resistance in Cannabis. Population genetic analysis presented gene distribution, gene flow, and gene specificity on a pan-genome level. These results provide important genetic basis, functional genes, and guidance for Cannabis breeding.
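
As a rough illustration of the PAV-based partitioning mentioned in the abstract above, the sketch below classifies genes as core (present in every accession) or flexible (present in some but not all). The function name `partition_pangenome`, the toy matrix, and the strict "all accessions" threshold are assumptions made for illustration; the authors' pipeline may apply different thresholds or categories.

```python
from typing import Dict, List


def partition_pangenome(pav: Dict[str, List[int]]) -> Dict[str, List[str]]:
    """Split genes into core, flexible, and absent sets.

    pav maps a gene ID to a list of 0/1 presence flags, one per accession.
    This is a hypothetical helper, not the method used in the paper.
    """
    groups: Dict[str, List[str]] = {"core": [], "flexible": [], "absent": []}
    for gene, flags in pav.items():
        n_present = sum(flags)
        if n_present == len(flags):
            groups["core"].append(gene)      # present in every accession
        elif n_present > 0:
            groups["flexible"].append(gene)  # present in some accessions only
        else:
            groups["absent"].append(gene)    # never observed
    return groups


# Toy presence/absence matrix: 4 genes across 3 accessions (made-up data).
toy_pav = {
    "geneA": [1, 1, 1],
    "geneB": [1, 0, 1],
    "geneC": [0, 0, 1],
    "geneD": [0, 0, 0],
}

result = partition_pangenome(toy_pav)
core_fraction = len(result["core"]) / len(toy_pav)
print(result, core_fraction)
```

Real analyses often relax the core definition (for example, "soft-core" genes present in most accessions), so the threshold here should be treated as a tunable choice.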

RevDate: 2025-09-13

Vega-Muñoz MA, López-Hernández F, Cortés AJ, et al (2025)

Pangenomic and Phenotypic Characterization of Colombian Capsicum Germplasm Reveals the Genetic Basis of Fruit Quality Traits.

International journal of molecular sciences, 26(17): pii:ijms26178205.

Capsicum is one of the most economically significant vegetable crops worldwide, owing to its high content of bioactive compounds with nutritional, pharmacological, and industrial relevance. However, research has focused on C. annuum, often disregarding local diversity and secondary gene pools, which may contain hidden variation for quality traits. Therefore, this study evaluated the genetic and phenotypic diversity of 283 accessions from the Colombian germplasm collection in the agrobiodiversity hotspot of northwest South America, representing all five domesticated species of the genus. A total of 18 morphological, physicochemical, and biochemical fruit traits were assessed, including texture, color, capsaicinoid, and carotenoid content. The phenotypic data were integrated with genomic information obtained through genotyping-by-sequencing (GBS) using the C. annuum reference genome and a multispecies pangenome. Fixed-and-Random-Model-Circulating-Probability-Unification (FarmCPU) and Bayesian-information-and-Linkage-disequilibrium-Iteratively-Nested-Keyway (BLINK) genome-wide association studies (GWAS) were performed on both alignments, respectively, leading to the identification of complex polygenic architectures with 144 and 150 single nucleotide polymorphisms (SNPs) significantly associated with key fruit quality traits. Candidate genes involved in capsaicinoid biosynthesis were identified within associated genomic regions, terpenoid and sterol pathways, and cell wall modifiers. These findings highlight the potential of integrating pangenomic resources with multi-omics approaches to accelerate Capsicum improvement programs and facilitate the development of cultivars with enhanced quality traits and increased agro-industrial value.

RevDate: 2025-09-10

Minich JJ, Allsing N, Din MO, et al (2025)

Culture-independent meta-pangenomics enabled by long-read metagenomics reveals associations with pediatric undernutrition.

Cell pii:S0092-8674(25)00975-4 [Epub ahead of print].

The human gut microbiome is linked to child malnutrition, yet traditional microbiome approaches lack resolution. We hypothesized that complete metagenome-assembled genomes (cMAGs), recovered through long-read (LR) DNA sequencing, would enable pangenome and microbial genome-wide association study (GWAS) analyses to identify microbial genetic associations with child linear growth. LR methods produced 44-64× more cMAGs per gigabase pair (Gbp) than short-read methods, with PacBio (PB) yielding the most accurate and cost-effective assemblies. In a Malawian longitudinal pediatric cohort, we generated 986 cMAGs (839 circular) from 47 samples and applied this database to an expanded set of 210 samples. Machine learning identified species predictive of linear growth. Pangenome analyses revealed microbial genetic associations with linear growth, while genome instability correlated with declining length-for-age Z score (LAZ). This resource demonstrates the power of comparing cMAGs with health trajectories and establishes a new standard for microbiome association studies.

RevDate: 2025-09-09
CmpDate: 2025-09-09

Huang Y, Liu Y, Liu C, et al (2025)

Distinct evolutionary trajectories of subgenomic centromeres in polyploid wheat.

Genome biology, 26(1):271.

BACKGROUND: Centromeres are crucial for precise chromosome segregation and maintaining genome stability during cell division. However, their evolutionary dynamics, particularly in polyploid organisms with complex genomic architectures, remain largely enigmatic. Allopolyploid wheat, with its well-defined hierarchical ploidy series and recent polyploidization history, serves as an excellent model to explore centromere evolution.

RESULTS: In this study, we perform a systematic comparative analysis of centromeres in common wheat and its corresponding ancestral species, utilizing the latest comprehensive reference genome assembly available. Our findings reveal that wheat centromeres predominantly consist of five types of centromeric-specific retrotransposon elements (CRWs), with CRW1 and CRW2 being the most prevalent. We identify distinct evolutionary trajectories in the functional centromeres of each subgenome, characterized by variations in copy number, insertion age, and CRW composition. By utilizing CENH3-ChIP data across various ploidy levels, we uncover a series of CRW invasion events that have shaped the evolution of AA subgenome centromeres. Conversely, the evolutionary process of the DD subgenome centromeres involves their expansion from diploid to hexaploid wheat, facilitating adaptation to a larger genomic context. Integration of complete einkorn centromere assemblies and Aegilops tauschii pan-genomes further revealed subgenome-specific centromere evolutionary trajectories. By inclusion of synthetic hexaploid from S2-S3 generations, alongside 2×/6× natural accessions, we demonstrate that DD subgenome centromere expansion represents a gradual evolutionary process rather than an immediate response to polyploidization.

CONCLUSIONS: Our study provides a comprehensive landscape of centromere adaptation, evolution, and maturation, along with insights into how retrotransposon invasions drive centromere evolution in polyploid wheat.

RevDate: 2025-09-09

Gaurav A, Singh H, Dige M, et al (2025)

Comparative Genome Analysis and Characterization of Lacticaseibacillus Paracasei NKN344 Strain Isolated from Curd of Buffalo Milk Reared on Brackish Water Lagoons of the Eastern Indian Coast.

Probiotics and antimicrobial proteins [Epub ahead of print].

Ethnic fermented foods represent a significant repository for discovering novel probiotic entities. These fermented foods, entrenched in indigenous practices, have conserved a distinct microbiota through generations. Exploration of these fermented foods could yield microbial consortia capable of transforming human health. However, comprehensive research into the probiotic attributes and quality analysis is necessary before its usage as biotherapeutics. In the current study, Chilika curd - an ethnic fermented curd originating from Odisha was explored to isolate novel probiotic strains. A detailed phenotypic and genomic characterization of a novel Lacticaseibacillus paracasei strain was conducted. Host-probiotic interactions were assessed using the Caenorhabditis elegans model. Lacticaseibacillus paracasei NKN344 exhibited robust survival under various physiochemical stresses, such as in vitro simulated gut environment and in vivo Caenorhabditis elegans intestinal model. Additionally, an in-depth bioinformatic analysis revealed the metabolic prowess of Lacticaseibacillus paracasei NKN344, including a few bacteriocin-encoding operons. Lastly, the production of active bacteriocin by Lacticaseibacillus paracasei NKN344 was validated, showing inhibitory activity against Bacillus cereus, a major food spoilage bacterium. Results of the current study proved that Lacticaseibacillus paracasei NKN344 isolated from Chilika curd has promising probiotic properties and seems favorable for its use in functional fermented foods.

RevDate: 2025-09-09

Thallman RM, Borgert JE, Engle BN, et al (2025)

A Vision of How Low-Coverage Sequence Data Should Contribute to Genetic Evaluation in the Future.

Journal of animal science pii:8249815 [Epub ahead of print].

Low-coverage sequencing refers to sequencing DNA of individuals to a low depth of coverage (e.g., 0.5X) and imputing that sequence to genomic sequence based on reference haplotypes from individuals sequenced to high depth of coverage (e.g., ≥ 10X). It has been proposed as an alternative to genotyping by SNP arrays. At least one commercial product based on it is available for agricultural species. Concerns limiting adoption in its current form are: 1) the cost of storing the huge volume of data it generates and 2) whether that additional data will result in improved accuracy of genetic evaluation. This work envisions future implementation of low-coverage sequencing to reduce storage costs and enhance genetic evaluations by leveraging the additional information in the full sequence of the pangenome to account for more genetic variation. We propose addressing the storage issue by representing genomic sequence of an individual in a pair of haplotype arrays with each element pointing to an enumerated haplotype of the sequence within one of approximately 50,000 defined genome segments. Assuming 60 million genomic variants, the infrastructure required to translate the identifier of any enumerated haplotype into its genomic sequence would require less than 10 gigabytes of binary storage. Each haplotype array element would require 2 bytes, so the marginal binary storage required to represent the genomic sequence of an individual would be about 200 kilobytes (KB), similar to the genotypes from a SNP array with 200,000 markers. This assumes no pedigree and no ambiguity of the imputation, though the latter is unrealistic. Strategies to minimize, and when necessary, to manage and efficiently represent ambiguity are proposed. The genomic sequence of an individual could be stored in about 1 KB (binary) if both parents have unambiguous sequence stored as described above. 
The proposed system for representing the pangenome includes algorithms for read mapping and imputation intended to leverage all known genetic variation in the target population. It is also designed to use sequencing reads generated for imputing genomic sequence of new individuals to identify unrecognized mutations, crossovers, and structural variants, thus continuously improving the genome representation, especially if widespread use of low-coverage sequencing in livestock industries is realized. This could make improved genetic merit and management of livestock feasible without computational burden.
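
The storage estimate in the abstract above can be reproduced with simple arithmetic. The sketch below uses only the figures stated there (about 50,000 genome segments, 2 bytes per haplotype array element, a pair of haplotype arrays per individual); the variable names are illustrative, and decimal kilobytes (1 KB = 1,000 bytes) are assumed.

```python
# Figures taken from the abstract; names are illustrative only.
SEGMENTS = 50_000        # approximate number of defined genome segments
BYTES_PER_ELEMENT = 2    # size of one haplotype array element
HAPLOTYPE_ARRAYS = 2     # one array per parental haplotype

# Marginal storage to represent one individual's genomic sequence.
bytes_per_individual = SEGMENTS * BYTES_PER_ELEMENT * HAPLOTYPE_ARRAYS
kilobytes_per_individual = bytes_per_individual / 1000  # assumes decimal KB

# A 2-byte element can index at most this many enumerated haplotypes
# per segment (assuming an unsigned integer identifier).
max_haplotypes_per_segment = 2 ** (8 * BYTES_PER_ELEMENT)

print(bytes_per_individual, kilobytes_per_individual, max_haplotypes_per_segment)
```

Under these assumptions the result is 200,000 bytes, which matches the "about 200 kilobytes" quoted in the abstract.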

RevDate: 2025-09-08

Leite VLM, Faria AR, Guerra CF, et al (2025)

Hidden diversity in Enterococcus faecalis revealed by CRISPR2 screening: eco-evolutionary insights into a novel subspecies.

Microbiology spectrum [Epub ahead of print].

Enterococcus faecalis is a commensal bacterium that colonizes the gut of humans and animals and is a major opportunistic pathogen, known for causing multidrug-resistant healthcare-associated infections (HAIs). Its ability to thrive in diverse environments and disseminate antimicrobial resistance genes (ARGs) across ecological niches highlights the importance of understanding its ecological, evolutionary, and epidemiological dynamics. The CRISPR2 locus has been used as a valuable marker for assessing clonality and phylogenetic relationships in E. faecalis. In this study, we identified a group of E. faecalis strains lacking CRISPR2, forming a distinct, well-supported clade. We demonstrate that this clade meets the genomic criteria for classification as a novel subspecies, here referred to as "subspecies B." Through a comprehensive pangenome analysis and comparative genomics, we explored the adaptive ecological traits underlying this diversification process, identifying clade-specific features and their predicted functional roles. Our findings suggest that the frequent isolation of subspecies B from meat products and processing facilities may reflect dissemination routes involving environmental contamination (e.g., water, plants, soil) from avian species. The absence of key virulence traits required for pathogenicity in mammals, particularly humans, and the lack of clinically relevant resistance determinants indicate that subspecies B currently poses minimal threat to public health compared with the broadly disseminated "subspecies A." Nevertheless, the unclear potential for genetic exchange between these subspecies and the frequent association of subspecies B with food sources calls for continued genomic surveillance of E. faecalis from a One Health perspective to detect and mitigate the emergence of high-risk variants in advance.

IMPORTANCE: Exploring intraspecific genetic variability in generalist bacteria with pathogenic potential, such as Enterococcus faecalis, is a key to uncovering stable evolutionary trends. By screening the CRISPR2 locus across a representative set of genomes from diverse sources, this study reveals a previously unrecognized lineage within the population structure of E. faecalis, associated with underexplored nonhuman and nonhospital reservoirs. These findings broaden our knowledge of the species' genetic landscape and shed light on its adaptive strategies and patterns of ecological dissemination. By bridging phylogenetic patterns with variation in genetic defense systems and accessory traits, the study generates testable hypotheses about the genomic determinants and corresponding selective pressures that shape the species' behavior and long-term dissemination. This work offers new perspectives on the eco-evolutionary dynamics of E. faecalis and highlights the value of genomic surveillance beyond clinical settings, in alignment with One Health principles.

RevDate: 2025-09-08
CmpDate: 2025-09-08

Khasapane NG, Nkhebenyane SJ, Thekisoe O, et al (2025)

Population structure, resistome, and virulome of Staphylococcus chromogenes strains from milk of subclinical bovine mastitis in South Africa.

Frontiers in cellular and infection microbiology, 15:1654546.

INTRODUCTION: Staphylococcus chromogenes are commonly found in intramammary infections associated with bovine subclinical mastitis in dairy cattle, yet their genomic diversity and antimicrobial resistance dynamics remain poorly characterized, particularly in African settings.

METHODS: This study presents a comparative genomic analysis of 17 S. chromogenes isolates from South Africa, including five newly sequenced bovine mastitis strains and twelve porcine-derived genomes retrieved from GenBank. In-silico analysis using multilocus sequence typing (MLST), virulence genes, antibiotic resistance genes and plasmid replicon types was used to characterise these isolates.

RESULTS AND DISCUSSION: Pairwise average nucleotide identity (ANI) analysis revealed that bovine isolates SC21, SC28, and SC33 are closely related and likely clonal members of the bovine-adapted ST138 lineage (ANI >99.7%), while SC12 and SC14 are more genetically distinct and show closer similarity (ANI >91%) to porcine-derived strains. This was supported by whole-genome SNP (wgSNP) analysis, whereby the ST138 bovine-derived isolates formed a clonal lineage and displayed a diverse population structure compared to porcine strains. Resistome profiling uncovered antimicrobial resistance gene (ARG) content, with bovine isolates carrying only four core ARGs, i.e., dfrC, mgrA, norA, and tet(38), which confer resistance to trimethoprim, fluoroquinolones, and tetracyclines. In contrast, the compared porcine strains harboured a diverse set of resistance determinants, including blaZ, ermC, tet(K), and vgaALC, which confer resistance to beta-lactams, macrolides, tetracycline, and lincosamides, respectively. The five S. chromogenes isolates grouped into two sequence types, namely ST138 and ST62. Pangenome reconstruction of 177 global genomes confirmed that S. chromogenes possesses an open pangenome, with only ~17.5% of genes conserved as core or soft-core elements. Notably, unique strain-specific genes of the ST138 were determined to be associated with trehalose metabolism identified in bovine isolates, potentially reflecting niche-specific adaptation to the mammary environment in the Free State Province of South Africa.

CONCLUSION: These findings advance our understanding of S. chromogenes population structure and resistance ecology. They underscore the importance of continued genomic surveillance of livestock pathogens to inform targeted intervention strategies and improve animal health in diverse production settings, and further clarify the implications for future antibiotic therapy and prevention of infections associated with this species.

RevDate: 2025-09-08
CmpDate: 2025-09-08

Kamaraj V, Gupta A, Raman K, et al (2025)

GViNC: an innovative framework for genome graph comparison reveals hidden patterns in the genetic diversity of human populations.

NAR genomics and bioinformatics, 7(3):lqaf121.

Genome graphs provide a powerful reference structure for representing genetic diversity. Their structure emphasizes the polymorphic regions in a collection of genomes, enabling network-based comparisons of population-level variation. However, current tools are limited in their ability to quantify and compare structural features across large genome graphs. We introduce GViNC, Genome graph Visualization, Navigation, and Comparison, a novel framework that enables partitioning genome graphs into interpretable subgraphs, mapping linear coordinates to graph nodes, and summarizing both local and global structural variation using new metrics for variability, hypervariability, and graph distances. We applied GViNC to multiple pan-genomic and population-specific genome graphs constructed with over 85M variants in 2504 individuals from the 1000 Genomes Project. We found that genomic complexity varied by ancestry and across chromosomes, with rare variants increasing variability by 10-fold and hypervariability by 50-fold. GViNC highlighted key regions of the human genome, such as Human Leukocyte Antigen and DEFB loci, and many previously unreported high-diversity regions, some with population-specific signatures in protein-coding and regulatory genes. By bridging sequence-level variation and graph-level topology, GViNC enables scalable, quantitative exploration of genome structure across populations. GViNC's versatility can aid researchers in extensively investigating the genetic diversity of different cohorts, populations, or species of interest.

RevDate: 2025-09-08
CmpDate: 2025-09-08

Ficoseco CMA, Chieffi D, Montemurro M, et al (2025)

Genomic Characterisation of Limosilactobacillus fermentum CRL2085 Unveiling Probiotic Traits for Application in Cattle Feed.

Environmental microbiology reports, 17(5):e70176.

Limosilactobacillus fermentum CRL2085, isolated from feedlot cattle rations, displayed high efficiency as a probiotic when administered to animals. A comprehensive genomic analysis was performed to elucidate the genetic basis underlying its probiotic potential. Fifteen genomic islands and CRISPR-Cas elements were identified in its genome. Pan-genomic analysis highlighted the dynamic evolution of this species, and clustering based on the nucleotide genomic similarity only partially correlated with the source of isolation or the geographic origin of the strains. Several genes known to confer probiotic properties were identified, including those related to adhesion, resistance to acidic pH and bile salts, tolerance to oxidative stress, metabolism/transport of sugars and other compounds, and genes for exopolysaccharide biosynthesis. In silico analysis of antimicrobial resistance genes and virulence determinants confirmed the safety of this strain. Moreover, genes related to B-group vitamins biosynthesis and feruloyl esterase hydrolase were also found, showing the nutritional contribution of the strain, which also showed moderate adhesion capability, exopolysaccharide production when grown with sucrose, and the capacity to metabolise 42 out of 95 carbon substrates tested. This data provides the genetic basis for deciphering the mechanisms beyond the benefits demonstrated by its use during cattle intensive raising and confirms its promising role as a probiotic.

RevDate: 2025-09-06

Soleymani F, Correa SM, Arend M, et al (2025)

Constraint-based metabolic modeling reveals metabolic properties underpinning the unprecedented growth of Chlorella ohadii.

The New phytologist [Epub ahead of print].

Comparative molecular and physiological analyses of organisms from one taxonomic group grown under similar conditions offer a strategy to identify gene targets for trait improvement. While this strategy can also be performed in silico using genome-scale metabolic models for the compared organisms, we continue to lack solutions for the de novo generation of such models, particularly for eukaryotes. To facilitate model-driven identification of gene targets for growth improvement in green algae, here we present a semiautomated platform for de novo generation of genome-scale algal metabolic models. We deployed this platform to reconstruct an enzyme-constrained, genome-scale metabolic model of Chlorella ohadii, the fastest growing green alga reported to date, and validated the growth predictions in experiments under three growth conditions. We also proposed a computational strategy to identify targets for growth improvement based on flux analyses. Extensive flux-based comparative analyses using all existing models of green algae resulted in the identification of potential targets for growth improvement not only in standard but also in extreme light conditions, where C. ohadii still exhibits exceptional growth. Our findings indicate that the developed platform provides the basis for the generation of pan-genome-scale metabolic models of algae.

RevDate: 2025-09-05

Chandola U, Manirakiza E, Maillard M, et al (2025)

A Bradyrhizobium isolate from a marine diatom induces nitrogen-fixing nodules in a terrestrial legume.

Nature microbiology [Epub ahead of print].

Biological nitrogen fixation converts atmospheric nitrogen into ammonia, essential to the global nitrogen cycle. While cyanobacterial diazotrophs are well characterized, recent studies have revealed a broad distribution of non-cyanobacterial diazotrophs (NCDs) in marine environments, although their study is limited by poor cultivability. Here we report a previously uncharacterized Bradyrhizobium isolated from the marine diatom Phaeodactylum tricornutum. Phylogenomic analysis places the strain within photosynthetic Bradyrhizobium, suggesting evolutionary adaptations to marine and terrestrial niches. Average nucleotide identity supports its classification as a previously undescribed species. Remarkably, inoculation experiments showed that the isolate induced nitrogen-fixing nodules in the Aeschynomene indica legume, pointing to symbiotic capabilities across ecological boundaries. Pangenome analysis and metabolic predictions indicate that this isolate shares more features with terrestrial photosynthetic Bradyrhizobium than with marine NCDs. Overall, these findings suggest that symbiotic interactions could evolve across different ecological niches, and raise questions about the evolution of nitrogen fixation and microbe-host interactions.

RevDate: 2025-09-05
CmpDate: 2025-09-05

Behruznia M, Marin M, Whiley DJ, et al (2025)

The Mycobacterium tuberculosis complex pangenome is small and shaped by sub-lineage-specific regions of difference.

eLife, 13:.

The Mycobacterium tuberculosis complex (MTBC) is a group of bacteria causing tuberculosis (TB) in humans and animals. Understanding MTBC genetic diversity is crucial for insights into its adaptation and traits related to survival, virulence, and antibiotic resistance. While it is known that within-MTBC diversity is characterised by large deletions found only in certain lineages (regions of difference [RDs]), a comprehensive pangenomic analysis incorporating both coding and non-coding regions remains unexplored. We utilised a curated dataset representing various MTBC genomes, including under-represented lineages, to quantify the full diversity of the MTBC pangenome. The MTBC was found to have a small, closed pangenome with distinct genomic features and RDs both between lineages (as previously known) and between sub-lineages. The accessory genome was identified to be a product of genome reduction, showing both divergent and convergent deletions. This variation has implications for traits like virulence, drug resistance, and metabolism. The study provides a comprehensive understanding of the MTBC pangenome, highlighting the importance of genome reduction in its evolution, and underlines the significance of genomic variations in determining the pathogenic traits of different MTBC lineages.

RevDate: 2025-09-05

Stepanauskas R, Brown JM, Arasti S, et al (2025)

Net rate of lateral gene transfer in marine prokaryoplankton.

The ISME journal pii:8248340 [Epub ahead of print].

Lateral gene transfer is a major evolutionary process in Bacteria and Archaea. Despite its importance, lateral gene transfer quantification in nature using traditional phylogenetic methods has been hampered by the rarity of most genes within the enormous microbial pangenomes. Here, we estimated lateral gene transfer rates within the epipelagic tropical and subtropical ocean using a global, randomized collection of single amplified genomes and a non-phylogenetic computational approach. By comparing the fraction of shared genes between pairs of genomes against a lateral gene transfer-free model, we show that an average cell line laterally acquires and retains ~13% of its genes every 1 million years. This translates to a net lateral gene transfer rate of ~250 genes L⁻¹ seawater day⁻¹ and involves both "flexible" and "core" genes. Our study indicates that whereas most genes are exchanged among closely related cells, the range of lateral gene transfer exceeds the contemporary definition of bacterial species, thus providing prokaryoplankton with extensive genetic resources for lateral gene transfer-based adaptation to environmental stressors. This offers an important starting point for the quantitative analysis of lateral gene transfer in natural settings and its incorporation into evolutionary and ecosystem studies and modeling.

RevDate: 2025-09-04

Long GS, Singh N, Patel S, et al (2025)

Integrated genomic approaches improve Treponema pallidum phylogenetics and lineage classification.

Canadian journal of microbiology [Epub ahead of print].

Syphilis cases have been consistently rising since its near elimination in the late 1990s. This resurgence, along with increasing rates of macrolide resistance and congenital syphilis, has triggered renewed efforts to better understand and control the disease. We analyzed 827 T. pallidum genomes and created a new genome-based hierarchical lineage framework, recapitulating the major T. pallidum lineages and characterizing sub-lineages. An updated pangenome was constructed, revealing that T. pallidum subsp. pallidum lineages are determined by a single hypothetical major outer sheath C-terminal domain-containing gene while no significant genetic difference was observed between T. pallidum subsp. pertenue and T. pallidum subsp. endemicum. This study introduces an integrated genomic approach to characterize T. pallidum and highlights the significance of pangenomes in supporting public health.

RevDate: 2025-09-04
CmpDate: 2025-09-04

Li H (2025)

Finding easy regions for short-read variant calling from pangenome data.

GigaScience, 14:.

BACKGROUND: While benchmarks on short-read variant calling suggest a low error rate below 0.5%, they are only applicable to predefined confident regions. For a human sample without such regions, the error rate could be 10 times higher. Although multiple sets of easy regions have been identified to alleviate the issue, they fail to consider nonreference samples or are biased toward existing short-read data or aligners.

RESULTS: Here, using hundreds of high-quality human assemblies, we derived a set of sample-agnostic easy regions where short-read variant calling reaches high accuracy. These regions cover 88.2% of GRCh38, 92.2% of coding regions, and 96.3% of ClinVar pathogenic variants. They achieve a good balance between coverage and easiness and can be generated for other human assemblies or species with multiple well-assembled genomes.

CONCLUSIONS: This resource provides a convenient and powerful way to filter spurious variant calls for clinical or research human samples.

RevDate: 2025-09-04
CmpDate: 2025-09-04

Kupczok A, Gavriilidou A, Paulitz E, et al (2025)

Gene co-occurrence and its association with phage infectivity in bacterial pangenomes.

Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 380(1934):20240070.

Phages infect bacteria and have recently re-emerged as a promising strategy to combat bacterial infections. However, there is a lack of methods to predict whether and why a particular phage can or cannot infect a bacterial strain based on their genome sequences. Understanding the complex interactions between phages and their bacterial hosts is thus of considerable interest. We recently developed Goldfinder, a phylogenetic method to discover gene co-occurrences across bacterial pangenomes. Here, we expand Goldfinder to infer which gene presences or absences influence bacterial sensitivity to phages. By integrating a bacterial pangenome with an experimentally determined host range matrix, we infer associations between phage infectivity and the presence of accessory genes in bacterial pangenomes. The presented approach can be applied to predict bacterial genes that potentially enable phage infection, bacterial genes that prevent phage infection, and potential interactions between particular bacterial and phage accessory genes. Finally, the predicted interactions are clustered and visualized with the software Cytoscape. Here, we present a method to identify candidate genes within the pool of mobile accessory genes that may contribute to phage-host interactions. This approach will help to set up follow-up experiments and to understand the complex interactions between phages and bacteria. This article is part of the discussion meeting issue 'The ecology and evolution of bacterial immune systems'.
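
The association step described in this abstract can be illustrated with a deliberately naive sketch. It is not Goldfinder: it ignores phylogeny, which the abstract says Goldfinder accounts for, and simply tests one gene against one phage with a two-sided Fisher's exact test. The strain names, the `presence` and `infected` dictionaries, and both function names are hypothetical.

```python
from math import comb


def contingency(presence, infected):
    """Build a 2x2 table (a, b, c, d) from two strain -> bool mappings.

    a: gene present, infected      b: gene present, not infected
    c: gene absent, infected       d: gene absent, not infected
    Only strains found in both mappings are counted.
    """
    a = b = c = d = 0
    for strain in presence.keys() & infected.keys():
        if presence[strain] and infected[strain]:
            a += 1
        elif presence[strain]:
            b += 1
        elif infected[strain]:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for a 2x2 table."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x strains in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical data: does accessory gene X track infectivity by phage P?
presence = {"s1": True, "s2": True, "s3": True, "s4": True,
            "s5": False, "s6": False, "s7": False, "s8": False}
infected = {"s1": True, "s2": True, "s3": True, "s4": True,
            "s5": False, "s6": False, "s7": False, "s8": False}
table = contingency(presence, infected)
print(table, fisher_exact_two_sided(*table))
```

Because related strains share genes by descent, a test like this overstates significance on real data; that is the gap a phylogenetic method is meant to close.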

RevDate: 2025-09-01

Steensma MJ, Ducro BJ, Dibbits B, et al (2025)

High-quality, haplotype-resolved reference genomes of the Dutch warmblood horse and Friesian horse using trio binning.

BMC genomics, 26(1):790.

BACKGROUND: In horses, genetic diversity is predominantly observed between breeds, with little variation within breeds. The studbooks of the two largest horse populations in the Netherlands, the Dutch Warmblood horse and Friesian horse population, have ongoing conservation projects including collecting large-scale genotype and sequence data. The current reference genome, derived from a Thoroughbred horse can lead to bias in genetic analyses of other horse breeds. Therefore, the aim of this study was to create high-quality breed-specific reference genomes of Dutch Warmblood and Friesian horses.

RESULTS: We performed nanopore long-read sequencing (R10.4, Q20+) of an F1 cross between a Dutch Warmblood horse and a Friesian horse to create two breed-specific reference genomes by trio binning. This resulted in high-quality, haplotype-resolved reference genomes with contig N50 of 37 and 35 Mb and single copy gene completeness of 99.2 and 99.3% for the Friesian and Warmblood, respectively. The majority of the chromosomes contained telomeric and/or centromeric sequences. The Ensembl gene annotation resulted in 19,750 and 19,872 protein coding genes for the Friesian and Warmblood, respectively. No large chromosomal rearrangements were observed between the Friesian and Warmblood genomes. However, a total of 722 large structural variations (> 10 kb) were identified, of which 14 affect the coding sequence of protein-coding genes.

CONCLUSION: The novel breed-specific reference genomes provide a valuable resource for future genetic analysis and breed conservation efforts and will contribute to ongoing equine pangenome efforts.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-025-11985-0.
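
The contig N50 reported in this abstract is the length of the contig at which the running total of contig lengths, sorted longest first, reaches half of the assembly size. A minimal sketch of that calculation follows; the contig lengths are made up for illustration and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in Mb.
contigs_mb = [40, 37, 30, 20, 10, 5]
print(n50(contigs_mb))  # prints 37
```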

RevDate: 2025-09-02

Kolesch F, Sohn M, Rempel A, et al (2025)

SANS ambages: phylogenomics with abundance-filter, multi-threading, and bootstrapping on amino-acid or genomic sequences.

BMC bioinformatics, 26(1):227.

BACKGROUND: The increasing amount of available genome sequence data enables large-scale comparative studies. A common task is the inference of phylogenies, a challenging task if close reference sequences are not available, genome sequences are incompletely assembled, or the high number of genomes precludes multiple sequence alignment in reasonable time. SANS is an alignment-free, whole-genome based approach for phylogeny estimation.

RESULTS: Here we present a new implementation SANS ambages with a significantly increased application spectrum. It offers additional types of input data, parallelized processing, and bootstrapping. The source code (C++), documentation, and example data are freely available for download at: https://github.com/gi-bielefeld/sans . SANS can also be launched via the web-interface of the CloWM platform, free of charge, with a standard Life Science account: https://clowm.bi.denbi.de/workflows/0194b78f-9696-7402-a2b8-858508733618/ .

CONCLUSIONS: The new version greatly shortens processing time on large datasets through parallelization. The ability to process amino acid sequences and a filter for low-abundance DNA read segments also enable new application cases. Bootstrapping and integrated visualization ease and enrich the interpretation of the resulting phylogenies.
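
To give a feel for what "alignment-free" means here, the sketch below compares two sequences by the k-mers they share and never aligns them. This is a generic k-mer Jaccard distance and is not the SANS algorithm; the function names, the choice of `k`, and the toy sequences are assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(seq_a, seq_b, k=4):
    """1 minus the fraction of k-mers shared between two sequences."""
    set_a, set_b = kmers(seq_a, k), kmers(seq_b, k)
    union = set_a | set_b
    if not union:
        raise ValueError("sequences are shorter than k")
    return 1 - len(set_a & set_b) / len(union)


# Toy sequences: identical input gives 0.0, disjoint k-mer sets give 1.0.
print(jaccard_distance("ACGTACGTGG", "ACGTACGTGG"))
print(jaccard_distance("AAAAAAAA", "CCCCCCCC"))
```

A pairwise distance matrix built this way could feed a standard tree-building method, but that is a different route from the one the abstract describes.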

RevDate: 2025-09-02

Du M, Zhang F, Wang X, et al (2025)

Structural and deleterious burdens and their effects on yield traits in foxtail millet domestication.

iScience, 28(9):113295 pii:S2589-0042(25)01556-1.

Crop domestication typically accumulates structural and deleterious variants through genetic bottlenecks and selection hitchhiking. However, the structural and deleterious variant burden has not been investigated in the foxtail millet (Setaria italica). Integrating comparative genomics, pangenomics, population genetics, and quantitative genetics, we identified 6,713 gene gains and 2,802 losses during domestication, affecting flowering time and developmental processes. Population genetics of 333 wild and cultivated accessions revealed 25.76% and 40.40% reductions in structural and deleterious variant burdens in cultivars, potentially reflecting a dramatic loss of genetic diversity of the wild progenitor. Quantitative genetics detected genetic association of yield traits, and essential roles of deleterious and structural variants in the formation of yield traits. In general, this study highlights significant impacts of structural and deleterious variants on yield traits and provides valuable guidelines for molecular breeding of foxtail millet.

RevDate: 2025-09-02

Samano A, Musat M, Junaghare M, et al (2025)

Structural variants are enriched in deleterious visible phenotypes in Drosophila.

bioRxiv : the preprint server for biology pii:2025.08.15.670616.

Genome structural variants (SVs) comprise a sizable portion of functionally important genetic variation in all organisms; yet, many SVs evade discovery using short reads. While long-read sequencing can find the hidden SVs, the role of SVs in variation in organismal traits remains largely unclear. To address this gap, we investigate the molecular basis of 50 classical phenotypes in 11 Drosophila melanogaster strains using highly contiguous de novo genome assemblies generated with Oxford Nanopore long reads. These assemblies enabled the creation of a pangenome graph containing comprehensive, nucleotide-resolution maps of SVs, including complex rearrangements such as the interchromosomal inverted duplication Dp(2;4)eyD and large tandem duplications at the Bar locus. We uncovered new candidate causal mutations for 15 phenotypes and new molecular alleles for 2 mutations comprising tandem duplications, transposable element (TE) insertions, and indels. For example, we mapped the tarsal joint defect Ablp [eyD] to an 8 kb Roo retrotransposon insertion into an intergenic enhancer, a finding validated via CRISPR-Cas9. The wing vein phenotype plexus (px [1]) was linked to a 1.5 kb partial tandem gene duplication, and the century-old Curved (c [1]) wing phenotype was linked to a 7.5 kb DM412 retrotransposon inserted into the coding sequence of the muscle protein gene Strn-Mlck. We also unveiled 8 SV alleles of previously identified causal genes, including previously uncharacterized SVs underlying the extensively studied white and yellow phenotypes. Overall, 67.4% of the genes causing phenotypic changes harbored candidate SVs over 100 bp, whereas only 28% is expected based on euchromatic SVs. Our data, based on the 50 Drosophila phenotypes, 44 of which are strongly deleterious, suggest a disproportionately larger contribution of SVs to deleterious changes in visible phenotypes in Drosophila.

RevDate: 2025-09-02

De Santiago A, Barnes S, Pereira TJ, et al (2025)

Pseudoalteromonas is a novel symbiont of marine invertebrates that exhibits broad patterns of phylosymbiosis.

bioRxiv : the preprint server for biology pii:2025.08.22.671635.

Despite growing insights into the composition of marine invertebrate microbiomes, our understanding of their ecological and evolutionary patterns remains poor, owing to limited sampling depth and low-resolution datasets. Previous studies have provided mixed results when evaluating patterns of phylosymbiosis between marine invertebrates and marine bacteria. Here, we investigated potential animal-microbe symbioses in Pseudoalteromonas, an overlooked bacterial genus consistently identified as a core microbiome taxon in diverse invertebrates. Using a pangenomic analysis of 236 free-living and invertebrate-associated bacterial strains (including two new nematode-associated isolates generated in this study), we confirm that Pseudoalteromonas is a novel symbiont with substantial evidence of phylosymbiosis across at least three marine invertebrate phyla (e.g., Nematoda, Mollusca, and Cnidaria). Patterns of symbiosis were consistent irrespective of geography (including in Antarctica), with FISH images from nematodes indicating that bacterial symbionts form biofilms in the mouth and esophagus. The evolutionary history of Pseudoalteromonas is marked by substantial host-switching and lifestyle transitions, and host-associated genomes suggest that these bacteria are facultative symbionts involved in nutritional mutualisms. In marine environments, we hypothesize that horizontally-acquired symbionts may have co-evolved with invertebrates, using host mucus as a physical niche and food source, while providing their animal hosts with Vitamin B, amino acids, and bioavailable carbon compounds in return.

RevDate: 2025-09-01
CmpDate: 2025-09-02

Laidoudi Y, Davoust B, Lepidi H, et al (2025)

Emergence of the zoonotic bacterium Necropsobacter rosorum in nutria Myocastor coypus with implications for wildlife and human health.

Scientific reports, 15(1):32252.

The nutria (Myocastor coypus), a semi-aquatic rodent native to South America, poses significant ecological and agricultural threats as an invasive species in France, where it continues to proliferate despite sustained control efforts. A fatal case of pneumonia in a nutria from Marseille (France) prompted a microbiological investigation that led to the isolation, taxonomic classification, genomic characterization, and phylogenetic analysis of Necropsobacter rosorum. Whole-genome sequencing of the N. rosorum strain RG01 revealed a genome size of 2,505,657 base pairs and 2303 predicted open reading frames, showing high similarity to other publicly available N. rosorum genomes. Comparative pan-genomic analysis indicated a high level of genomic conservation among N. rosorum strains. The presence of putative virulence factors and a CRISPR-Cas system suggests both pathogenic potential and adaptive defense mechanisms against bacteriophage predation. This study also explored the genetic epidemiology of members of the Pasteurellaceae family, highlighting a considerable overlap between species infecting animals and humans. Among the 408,387 sequence records retrieved from GenBank, 62.1% were deemed suitable for genomic epidemiological analysis. Notably, N. rosorum was underrepresented, with only 13 entries spanning nine countries and three host types, revealing critical gaps in current surveillance and research. Collectively, these findings contribute to a better understanding of the microbiology and epidemiology of N. rosorum and Pasteurellaceae-associated infections, and underscore the importance of integrated, genomics-informed approaches for the monitoring, control, and prevention of zoonotic diseases.

RevDate: 2025-09-01
CmpDate: 2025-09-01

Vuong TD, He G, Hu H, et al (2025)

Identification of new genomic loci for seed protein and oil content in the soybean pangenome using genome-wide association and haplotype analyses.

TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik, 138(9):237.

The soybean [Glycine max (L.) Merr.] pangenome has been studied and shown to be an invaluable resource for investigating structural variations (SVs), from which different genomic markers were successfully developed and employed for genome-wide association studies (GWAS). Among the SV markers, gene presence-and-absence variations (PAVs) have been developed in soybean, but have not been widely utilized for association analyses. Here, we report GWAS and haplotype analysis of seed protein and oil content for two diverse panels, comprising over 500 soybean accessions evaluated in multiple field environments using three marker datasets, whole genome sequence (WGS)-single-nucleotide polymorphisms (SNPs), 50 K-SNPs, and PAVs. The analyses identified new quantitative trait loci (QTL) for protein and oil content, along with the validation of previously reported QTL for these traits. This includes a well-studied QTL on chromosome (Chr.) 20 and another one on Chr. 05 for protein and/or oil. Importantly, this study is the first to report a new genomic locus for both protein and oil mapped to Chr. 08. Gene ontology annotations and expression profiles suggested candidate genes. Further analyses using haplotype-based markers led to the identification of multiple haplotype blocks encompassing candidate genes. Among these, Glyma.05G243400 on Chr. 05 and Glyma.08G109900 and Glyma.08G110000 on Chr. 08 were identified as promising targets. These genes can be incorporated into soybean breeding programs to enhance the selection of desirable protein and oil phenotypes through a haplotype-based breeding approach.

RevDate: 2025-08-30

Grujcic V, Mehrshad M, Vigil-Stenman T, et al (2025)

Stepwise genome evolution from a facultative symbiont to an endosymbiont in the N2-fixing diatom-Richelia symbioses.

Current biology : CB pii:S0960-9822(25)01034-6 [Epub ahead of print].

A few genera of diatoms that form stable partnerships with N2-fixing filamentous cyanobacteria Richelia spp. are widespread in the open ocean. A unique feature of the diatom-Richelia symbioses is that the symbiont cellular location spans a continuum of integration (epibiont, periplasmic, and endobiont) that is reflected in the symbiont genome size and content. In this study, we analyzed genomes derived from cultures and environmental metagenome-assembled genomes of Richelia symbionts, focusing on characters indicative of genome evolution. Our results show an enrichment of short-length transposases and pseudogenes in the periplasmic symbiont genomes, suggesting an active and transitionary period in genome evolution. By contrast, genomes of endobionts exhibited fewer transposases and pseudogenes, reflecting advanced stages of genome reduction. Pangenome analyses identified that endobionts streamline their genomes and retain most genes in the core genome, whereas periplasmic symbionts and epibionts maintain larger flexible genomes, indicating higher genomic plasticity compared with the genomes of endobionts. Functional gene comparisons with other N2-fixing cyanobacteria revealed that Richelia endobionts have similar patterns of metabolic loss but are distinguished by the absence of specific pathways (e.g., cytochrome bd ubiquinol oxidase and lipid A) that increase both dependency and direct interactions with their respective hosts. In conclusion, our findings underscore the dynamic nature of genome reduction in N2-fixing cyanobacterial symbionts and demonstrate the diatom-Richelia symbioses as a valuable and rare model to study genome evolution in the transitional stages from a free-living facultative symbiont to a host-dependent endobiont.
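
The core versus flexible distinction used in this abstract follows the definition at the top of this page: core genes occur in every genome of a group, flexible genes in some but not all. A minimal sketch of that partition is shown below, assuming each genome has already been reduced to a set of gene-family identifiers; the genome names and gene labels are hypothetical.

```python
def partition_pangenome(gene_sets):
    """Split a pangenome into core and flexible gene sets.

    gene_sets maps a genome name to the set of gene families it carries.
    Returns (core, flexible): genes in all genomes, and genes in only some.
    """
    if not gene_sets:
        raise ValueError("need at least one genome")
    genomes = [set(genes) for genes in gene_sets.values()]
    pan = set().union(*genomes)          # every gene seen anywhere
    core = set.intersection(*genomes)    # genes shared by all genomes
    return core, pan - core


# Hypothetical gene content for three genomes.
genomes = {
    "genome_A": {"g1", "g2", "g3"},
    "genome_B": {"g1", "g2"},
    "genome_C": {"g1", "g4"},
}
core, flexible = partition_pangenome(genomes)
print(sorted(core), sorted(flexible))
```

Under this definition, a group of streamlined genomes that retains most of its genes in the core would show a small `flexible` set relative to `core`.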

RevDate: 2025-08-30
CmpDate: 2025-08-30

Arshad F, Jayaraman S, Talenti A, et al (2025)

A comprehensive water buffalo pangenome reveals extensive structural variation linked to population-specific signatures of selection.

GigaScience, 14:.

BACKGROUND: Water buffalo is a cornerstone livestock species in many low- and middle-income countries, yet major gaps persist in its genomic characterization, complicated by the divergent karyotypes of its two subspecies (swamp and river). Such genomic complexity makes water buffalo a particularly good candidate for the use of graph genomics, which can capture variation missed by linear reference approaches. However, the utility of this approach to improve water buffalo has been largely unexplored.

RESULTS: We present a comprehensive pangenome that integrates 4 newly generated, highly contiguous assemblies of Pakistani river buffalo with 8 publicly available assemblies from both subspecies. This doubles the number of accessible high-quality river buffalo genomes and provides the most contiguous assemblies for the subspecies to date. Using the pangenome to assay variation across 711 global samples, we uncovered extensive genomic diversity, including thousands of large structural variants absent from the reference genome, spanning over 140 Mb of additional sequence. We demonstrate the utility of these data by identifying putative functional indels and structural variants linked to selective sweeps in key genes involved in productivity and immune response across 26 populations.

CONCLUSIONS: This study represents one of the first successful applications of graph genomics in water buffalo and offers valuable insights into how integrating assemblies can transform analyses of water buffalo and other species with complex evolutionary histories. We anticipate that these assemblies, as well as the pangenome and putative functional structural variants we have released, will accelerate efforts to unlock water buffalo's genetic potential, improving productivity and resilience in this economically important species.

RevDate: 2025-08-30
CmpDate: 2025-08-30

Lindstrand A, J Eisfeldt (2025)

Hybrid Sequencing Characterization of Complex Chromosomal Rearrangements.

Methods in molecular biology (Clifton, N.J.), 2968:151-159.

Complex chromosomal rearrangements (CCRs), defined as structural variants involving more than two chromosomes or multiple breakpoint junctions, are challenging to resolve, and causal mutations often go unnoticed in genome studies. Short-read whole-genome sequencing enables the characterization of rearrangement junctions in unique sequences. However, issues persist within repetitive regions of the genome, which are prone to rearrangements. Therefore, complementary genome sequencing technologies may be required to solve the structures of CCRs. Hybrid sequencing, which combines multiple genome sequencing datasets from the same individual, results in a more complete representation of the genome. This approach enhances the ability to resolve rearrangement structures and map breakpoint junctions more accurately.

RevDate: 2025-08-30

Lian Q, Jiao WB, Y Wang (2025)

Designing Better Crops with Phased Pangenomes.

Molecular plant pii:S1674-2052(25)00299-0 [Epub ahead of print].

RevDate: 2025-08-29

Li W, Liang H, Sun J, et al (2025)

A Near Telomere-To-Telomere Genome Assembly and Graph-Based Pangenome of Tartary Buckwheat (Fagopyrum tataricum).

Plant biotechnology journal [Epub ahead of print].

RevDate: 2025-08-28

Cheng L, Bao Z, Kong Q, et al (2025)

Genome analyses and breeding of polyploid crops.

Nature plants [Epub ahead of print].

Polyploidization is a common and important evolutionary process in the plant kingdom. Compared with diploid plant species, the intricate genome architecture of polyploid plant species presents substantial challenges in applying multi-omics approaches for crop breeding improvement. In this Review, we summarize the current techniques for analysing polyploid genomes, including constructing reference genomes and pan-genomes, and detecting variants. We also assess findings related to polyploid genome architecture, population genetics and breeding programmes, highlighting advanced techniques in the breeding of polyploid crops. Finally, we explore the challenges and demands posed by polyploid genome complexity during analysis with available biotechnological tools. This Review emphasizes the importance of a comprehensive understanding of polyploid genomic features for the further genetic improvement of polyploid crops.

RevDate: 2025-08-28

Guo L, He Z, H Huo (2025)

Panaln: Indexing pangenome for read alignment.

Bioinformatics (Oxford, England) pii:8242760 [Epub ahead of print].

MOTIVATION: Pangenome indexing is a critical supporting technology in biological sequence analysis such as read alignment applications. The need to accurately identify billions of small sequencing fragments carrying sequencing errors and genomic variants drives the development of scalable and efficient pangenome indexing approaches.

RESULTS: We propose a new wavelet tree-based approach, called Panaln, for indexing pangenomes and introduce a batch computation approach for fast count queries over Panaln. We present a simple and effective seeding strategy and develop a pangenome program that uses the seed-and-extend paradigm for read alignment. Experimental results on simulated and real data demonstrate that Panaln uses significantly less space than the compared pangenome methods with generally higher accuracy. We provide a scalable index construction by representing the pangenome with a linear model. Additionally, Panaln brings enhanced accuracy compared to the popular single reference methods.

Package: https://anaconda.org/bioconda/panaln and source code: https://github.com/Lilu-guo/Panaln.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

RevDate: 2025-08-28

Skarlatoudi T, Anagnostou GM, Theodorakis V, et al (2025)

Escherichia coli Strains Originating from Raw Sheep Milk, with Special Reference to Their Genomic Characterization, Such as Virulence Factors (VFs) and Antimicrobial Resistance (AMR) Genes, Using Whole-Genome Sequencing (WGS).

Veterinary sciences, 12(8): pii:vetsci12080744.

The objective of this work was to deliver a comprehensive genetic characterization of a collection of E. coli strains isolated from raw sheep milk. To this end, whole-genome sequencing was performed, coupled with bioinformatics and phenotypic characterization of antimicrobial resistance. These Gram-negative, facultative anaerobic bacteria belong to the family Enterobacteriaceae, together with other intestinal pathogens, such as Shigella spp. and Salmonella spp. Genetic analysis was carried out on all strains (phylogram, sequence types, VFs, AMR genes, and pangenome). The results showed the presence of various genetic traits that are related to virulence factors contributing to their pathogenic potential. In addition, genes conferring resistance to antibiotics were also detected and confirmed using phenotypic tests. Finally, the genome of the E. coli strains was characterized by the presence of several mobile genetic elements, thus facilitating the exchange of various genetic elements, associated with virulence and antimicrobial resistance, within and beyond the species, through horizontal gene transfer. Raw sheep milk contaminated with pathogenic E. coli strains is particularly alarming for cheese production in artisan dairies.

RevDate: 2025-08-28
CmpDate: 2025-08-28

Campos-Godínez JF, Villegas-Campos M, JA Molina-Mora (2025)

Core Perturbomes of Escherichia coli and Staphylococcus aureus Using a Machine Learning Approach.

Pathogens (Basel, Switzerland), 14(8): pii:pathogens14080788.

The core perturbome is defined as a central response to multiple disturbances, functioning as a complex molecular network to overcome the disruption of homeostasis, thereby promoting tolerance and survival under stress conditions. Based on the biological and clinical relevance of Escherichia coli and Staphylococcus aureus, we characterized their molecular responses to multiple perturbations. Gene expression data from E. coli (8815 target genes, based on a pangenome, across 132 samples) and S. aureus (3312 target genes across 156 samples) were used. Accordingly, this study aimed to identify and describe the functionality of the core perturbome of these two prokaryotic models using a machine learning approach. For this purpose, feature selection and classification algorithms (KNN, RF and SVM) were implemented to identify a subset of genes as core molecular signatures, distinguishing control and perturbation conditions. After verifying effective dimensional reduction (with median accuracies of 82.6% and 85.1% for E. coli and S. aureus, respectively), a model of molecular interactions and functional enrichment analyses was performed to characterize the selected genes. The core perturbome was composed of 55 genes (including nine hubs) for E. coli and 46 (eight hubs) for S. aureus. Well-defined interactomes were predicted for each model, which are jointly associated with enriched pathways, including energy and macromolecule metabolism, DNA/RNA and protein synthesis and degradation, transcription regulation, virulence factors, and other signaling processes. Taken together, these results may support the identification of potential therapeutic targets and biomarkers of stress responses in future studies.

RevDate: 2025-08-28

Han X, Qiu C, Gai Z, et al (2025)

Pan-Genome-Based Characterization of the PYL Transcription Factor Family in Populus.

Plants (Basel, Switzerland), 14(16): pii:plants14162541.

Abscisic acid (ABA) is a key phytohormone involved in regulating plant growth and responses to environmental stress. As receptors of ABA, pyrabactin resistance 1 (PYR)/PYR1-like (PYL) proteins play a central role in initiating ABA signal transduction. In this study, a total of 30 PopPYL genes were identified and classified into three sub-families (PYL I-III) in the pan-genome of 17 Populus species, through phylogenetic analysis. Among these subfamilies, the PYL I subfamily was the largest, comprising 21 members, whereas PYL III was the smallest, with only four members. To elucidate the evolutionary dynamics of these genes, we conducted synteny and Ka/Ks analyses. Results indicated that most PopPYL genes had undergone purifying selection (Ka/Ks < 1), while a few were subject to positive selection (Ka/Ks > 1). Promoter analysis revealed 258 cis-regulatory elements in the PYL genes of Populus euphratica (EUP) and Populus pruinosa (PRU), including 127 elements responsive to abiotic stress and 33 ABA-related elements. Furthermore, six structural variations (SVs) were detected in PYL_EUP genes and significantly influenced gene expression levels (p < 0.05). To further explore the functional roles of PYL genes, we analyzed tissue-specific expression profiles of 17 PYL_EUP genes under drought stress conditions. PYL6_EUP was predominantly expressed in roots, PYL17_EUP exhibited leaf-specific expression, and PYL1_EUP showed elevated expression in stems. These findings suggest that the drought response of PYL_EUP genes is tissue-specific. Overall, this study highlights the utility of pan-genomics in elucidating gene family evolution and suggests that PYL_EUP genes contribute to the regulation of drought stress responses in EUP, offering valuable genetic resources for functional characterization of PYL genes.
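
The Ka/Ks thresholds quoted in this abstract translate directly into a small classifier. The sketch below assumes Ka (nonsynonymous substitution rate) and Ks (synonymous substitution rate) have already been estimated by some other tool; the function name, the gene labels, and the rate values are hypothetical and are not taken from the paper.

```python
def classify_selection(ka, ks):
    """Label a gene pair by its Ka/Ks ratio.

    Ka/Ks < 1 suggests purifying selection, > 1 positive selection,
    and exactly 1 is reported here as neutral.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


# Hypothetical (Ka, Ks) estimates for three gene pairs.
pairs = {"pair_1": (0.10, 0.50), "pair_2": (0.60, 0.30), "pair_3": (0.20, 0.20)}
for name, (ka, ks) in pairs.items():
    print(name, round(ka / ks, 2), classify_selection(ka, ks))
```

Real analyses usually treat ratios near 1 with caution, since small Ks values make the ratio unstable.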

RevDate: 2025-08-28

Goche T, Mavindidze P, T Zenda (2025)

Advances in Functional Genomics for Exploring Abiotic Stress Tolerance Mechanisms in Cereals.

Plants (Basel, Switzerland), 14(16): pii:plants14162459.

Climate change, population growth and the increasing demand for food and nutritional security necessitate the development of climate-resilient cereal crops. This requires first gaining mechanistic insights into the molecular mechanisms underpinning plant abiotic and biotic stress tolerance. Although this is challenging, recent conceptual and technological advances in functional genomics, coupled with computational biology, high-throughput plant phenotyping and artificial intelligence, are now aiding our uncovering of the molecular mechanisms underlying plant stress tolerance. Integrating other innovative approaches such as genome editing, modern plant breeding and synthetic biology facilitates the development of climate-smart cereal crops. Here, we discuss major recent advances in plant functional genomic approaches and techniques such as third-generation sequencing, transcriptomics, pangenomes, genome-wide association studies and epigenomics, which have advanced our understanding of the molecular basis of stress tolerance and development of stress-resilient cereals. Further, we highlight how these genomics approaches are successfully integrated into new plant breeding methods for effective development of stress-tolerant crops. Overall, harnessing these advances and improved knowledge of crop stress tolerance could accelerate development of climate-resilient cereals for global food and nutrition security.

RevDate: 2025-08-28
CmpDate: 2025-08-28

Nunes NB, Castro VS, da Cunha-Neto A, et al (2025)

Integrated Whole-Genome Sequencing and In Silico Characterization of Salmonella Cerro and Schwarzengrund from Brazil.

Genes, 16(8): pii:genes16080880.

BACKGROUND: Salmonella is a bacterium that causes foodborne infections. This study characterized two strains isolated from cheese and beef in Brazil using whole-genome sequencing (WGS).

OBJECTIVES: We evaluated their antimicrobial resistance profiles, virulence factors, plasmid content, serotypes and phylogenetic relationships.

METHODS: DNA was extracted and sequenced on the NovaSeq 6000 platform; the pangenome was assembled using the Roary tool; and the phylogenetic tree was constructed via IQ-TREE.

RESULTS AND DISCUSSION: For contextualization and comparison, 3493 Salmonella genomes of Brazilian origin from NCBI were analyzed. In our isolates, both strains carried the aac(6')-Iaa_1 gene, while only Schwarzengrund harbored the qnrB19_1 gene and the Col440I_1 plasmid. Cerro presented the islands SPI-1, SPI-2, SPI-3, SPI-4, SPI-5 and SPI-9, while Schwarzengrund also possessed SPI-13 and SPI-14. Upon comparison with other Brazilian genomes, we observed that Cerro and Schwarzengrund represented only 0.40% and 2.03% of the national database, respectively. Furthermore, the national data revealed that Schwarzengrund presented higher levels of antimicrobial resistance, a finding supported by the higher frequency of plasmids in this serovar. National data also corroborated our finding that SPI-13 and SPI-14 were absent in Cerro. A virulence analysis revealed distinct profiles: the cdtB and pltABC genes were present in the Schwarzengrund isolates, while the sseK and tldE1 family genes were exclusive to Cerro. The results indicated that the sequenced strains have pathogenic potential but exhibit low levels of antimicrobial resistance compared to national data. The greater diversity of SPIs in Schwarzengrund explains their prevalence and higher virulence potential.

CONCLUSIONS: Finally, the serovars exhibit distinct virulence profiles, which results in different clinical outcomes.

RevDate: 2025-08-28

Yinsai O, Yuantrakul S, Srisithan P, et al (2025)

Genomic Insights into Emerging Multidrug-Resistant Chryseobacterium indologenes Strains: First Report from Thailand.

Antibiotics (Basel, Switzerland), 14(8): pii:antibiotics14080746.

Background: Chryseobacterium indologenes, an environmental bacterium, is increasingly recognized as an emerging nosocomial pathogen, particularly in Asia, and is often characterized by multidrug resistance. Objectives: This study aimed to investigate the genomic features of clinical C. indologenes isolates from Maharaj Nakorn Chiang Mai Hospital, Thailand, to understand their mechanisms of multidrug resistance, virulence factors, and mobile genetic elements (MGEs). Methods: Twelve C. indologenes isolates were identified, and their antibiotic susceptibility profiles were determined. Whole genome sequencing (WGS) was performed using a hybrid approach combining Illumina short-reads and Oxford Nanopore long-reads to generate complete bacterial genomes. The hybrid assembled genomes were subsequently analyzed to detect antimicrobial resistance (AMR) genes, virulence factors, and MGEs. Results: C. indologenes isolates were primarily recovered from urine samples of hospitalized elderly male patients with underlying conditions. These isolates generally exhibited extensive drug resistance, which was subsequently explored and correlated with genomic determinants. With one exception, CMCI13 showed a lower resistance profile (Multidrug resistance, MDR). Genomic analysis revealed isolates with genome sizes of 4.83-5.00 Mb and GC content of 37.15-37.35%. Genomic characterization identified conserved resistance genes (blaIND-2, blaCIA-4, adeF, vanT, and qacG) and various virulence factors. Phylogenetic and pangenome analysis showed 11 isolates clustering closely with Chinese strain 3125, while one isolate (CMCI13) formed a distinct branch. Importantly, each isolate, except CMCI13, harbored a large genomic island (approximately 94-100 kb) carrying significant resistance genes (blaOXA-347, tetX, aadS, and ermF). The absence of this genomic island in CMCI13 correlated with its less resistant phenotype. No plasmids, integrons, or CRISPR-Cas systems were detected in any isolate. 
Conclusions: This study highlights the alarming emergence of multidrug-resistant C. indologenes in a hospital setting in Thailand. The genomic insights into specific resistance mechanisms, virulence factors, and potential horizontal gene transfer (HGT) events, particularly the association of a large genomic island with the XDR phenotype, underscore the critical need for continuous genomic surveillance to monitor transmission patterns and develop effective treatment strategies for this emerging pathogen.

RevDate: 2025-08-27
CmpDate: 2025-08-27

Pozzi CM, Gaiti A, A Spada (2025)

Climate change and plant genomic plasticity.

TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik, 138(9):231.

Genome adaptation, driven by mutations, transposable elements, and structural variations, relies on plasticity and instability. This allows populations to evolve, enhance fitness, and adapt to challenges like climate change. Genomes adapt via mutations, transposable elements, DNA structural changes, and epigenetics. Genome plasticity enhances fitness by providing the genetic variation necessary for organisms to adapt their traits and survive, which is especially critical during rapid climate shifts. This plasticity often stems from genome instability, which facilitates significant genomic alterations like duplications or deletions. While potentially harmful initially, these changes increase genetic diversity, aiding adaptation. Major genome reorganizations arise from polyploidization and horizontal gene transfer, both linked to instability. Plasticity and restructuring can modify Quantitative Trait Loci (QTLs), contributing to adaptation. Tools like landscape genomics identify climate-selected regions, resurrection ecology reveals past adaptive responses, and pangenome analysis examines a species' complete gene set. Signatures of past selection include reduced diversity and allele frequency shifts. Gene expression plasticity allows environmental adaptation without genetic change through mechanisms like alternative splicing, tailoring protein function. Co-opted transposable elements also generate genetic and regulatory diversity, contributing to genome evolution. This review consolidates these findings, repositioning genome instability not as a mere source of random error but as a fundamental evolutionary engine that provides the rapid adaptive potential required for plant survival in the face of accelerating climate change.

RevDate: 2025-08-27

Popov IV, Todorov SD, Chikindas ML, et al (2025)

Beyond White-Nose Syndrome: Mitochondrial and Functional Genomics of Pseudogymnoascus destructans.

Journal of fungi (Basel, Switzerland), 11(8):.

White-Nose Syndrome (WNS) has devastated insectivorous bat populations, particularly in North America, leading to severe ecological and economic consequences. Despite extensive research, many aspects of the evolutionary history, mitochondrial genome organization, and metabolic adaptations of its etiological agent, Pseudogymnoascus destructans, remain unexplored. Here, we present a multi-scale genomic analysis integrating pangenome reconstruction, phylogenetic inference, Bayesian divergence dating, comparative mitochondrial genomics, and refined functional annotation. Our divergence dating analysis reveals that P. destructans separated from its Antarctic relatives approximately 141 million years ago, before adapting to bat hibernacula in the Northern Hemisphere. Additionally, our refined functional annotation significantly expands the known functional landscape of P. destructans, revealing an extensive repertoire of previously uncharacterized proteins involved in carbohydrate metabolism and secondary metabolite biosynthesis-key processes that likely contribute to its pathogenic success. By providing new insights into the genomic basis of P. destructans adaptation and pathogenicity, our study refines the evolutionary framework of this fungal pathogen and creates the foundation for future research on WNS mitigation strategies.

RevDate: 2025-08-27
CmpDate: 2025-08-27

Zhu Z, N Stein (2025)

Pangenome insights into structural variation and functional diversification of barley CCT motif genes.

The plant genome, 18(3):e70098.

CONSTANS, CONSTANS-LIKE, TIMING OF CAB EXPRESSION1 (CCT) motif genes play a key role in barley (Hordeum vulgare L.) development and flowering, yet their genetic diversity remains underexplored. Leveraging a barley pangenome (76 genotypes) and pan-transcriptome (subset of 20 genotypes), we examined CCT gene variation and evolutionary dynamics. Motif-based searches, combined with genome assembly validation, revealed annotation limitations and novel frameshift variants (e.g., HvCO10, where Hv is Hordeum vulgare L.), indicating active diversification. Pangenome-wide phylogenetic analysis identified clade-specific domain expansions, including B-box domain additions in HvCO clades. Tissue-specific expression patterns further supported functional divergence among paralogs. Notably, VRN2, a canonical floral repressor associated with winter growth, was retained in spring genotypes, challenging its presumed exclusive role in vernalization. Discrepancies between VRN1 expression, VRN2 deletion, and growth habit implicated additional regulatory mechanisms. These findings highlight the power of pangenomes in resolving gene family complexity, refining annotations, and advancing the understanding of CCT genes to enhance barley resilience and adaptability.

RevDate: 2025-08-26

Kileeg Z, GA Mott (2025)

A species-wide inventory of receptor-like kinases in Arabidopsis thaliana.

BMC biology, 23(1):266 pii:10.1186/s12915-025-02364-y.

BACKGROUND: The receptor-like kinases (RLKs) are the largest family of proteins in plants. Characterized members play critical roles in diverse processes from growth to immunity, and yet the majority do not have a known function. Assigning function to RLKs poses a significant challenge due to the specificity of ligand recognition and because of the often pleiotropic or redundant functions RLKs possess. These problems inhibit the important work of identifying stress-related receptors that may be targets for crop improvement. Identification of stress-related evolutionary signatures can provide a way to expedite the discovery of candidate receptors. Pan-genome analysis can be used to compare naturally occurring variants within a species to identify evolutionary signatures that may otherwise be hidden by using only a single ecotype.

RESULTS: Using 146 ecotypes of Arabidopsis, we generated a pan-RLKome to investigate species-wide natural diversity and identify structural variation and other patterns indicative of stress adaptation. We discovered significant presence/absence variation across a subset of RLKs, most of which occurred in specific subclades nested within receptor subfamilies. These same subclades tended to have arisen through proximal or tandem duplication, both of which are common mechanisms during the expansion of stress-related genes. We also identified strong positive selection across many gene subfamilies and a bias of positive selection in the extracellular domains of receptors. This suggests escape from adaptive conflict within the extracellular domain may have played a large role in the evolution and adaptation of the RLKs.

CONCLUSION: Taken together, this work represents an excellent tool for the comparative study of RLKs and has identified lineages and subclades within RLK subfamilies with the hallmarks of involvement in stress adaptation.

RevDate: 2025-08-26
CmpDate: 2025-08-26

Maurizi L, Musleh L, Brunetti F, et al (2025)

Uropathogenic Escherichia coli (UPEC) that hides its identity: features of LC2 and EC73 strains from recurrent urinary tract infections.

BMC microbiology, 25(1):547.

BACKGROUND: Uropathogenic Escherichia coli (UPEC) strains are the major causative agents of human urinary tract infections (UTIs). Many patients who develop UTIs will experience a recurrent UTI (RUTI) within 6 months despite antibiotic-mediated clearance of the initial infection. A significant proportion of RUTIs are caused by E. coli identical to the original strain. UPEC employs several strategies to adhere, colonize, and persist within the bladder niche. Knowledge about the mechanisms regulating specific host-pathogen interactions that promote bacterial persistence is necessary to develop new approaches to RUTI diagnosis and treatment.

RESULTS: LC2 and EC73 UPEC strains were collected from patients with RUTIs. E. coli CFT073 and K-12 MG1655 were used as reference strains. UPEC displayed phenotypic profiles like those of the general E. coli population. The pan-genome analysis revealed that LC2 harbored many unique genes encoding several different functions such as intracellular trafficking and secretion, and vesicular transport. Contrarily, EC73 was the strain with the lowest number of unique genes involved in replication, recombination, repair and cell wall/membrane/envelope biogenesis. LC2 and EC73 exhibited the capacity to invade bladder monolayers efficiently and to colonize the gut of Caenorhabditis elegans, with LC2 being significantly more virulent than EC73. T24 cells infected with EC73 and LC2 strains exhibited significantly increased mRNA levels of IL-6, IL-8, IL-1β and TNF-α. EC73 elicited the strongest cytokine response. Differently, no significant cytokine mRNA induction was detected in T24 cells infected with E. coli CFT073. LC2 and EC73 modulated the expression of proteins involved in reactive oxygen species (ROS) balance in infected cells, but to different extents.

CONCLUSION: The acquisition of virulence factors by horizontal transfer of accessory DNA, other than being the cause of transformation to pathogenic strains, is responsible for the genomic plasticity. Our findings suggest that a key role in RUTIs could be played by certain bacterial strains that may benefit from peculiar abilities to adapt and potentially develop reservoirs of persistence across different host environments.

RevDate: 2025-08-25

Chaity SC, Hosen MA, Rahman SR, et al (2025)

Genomic characterization and comparative analysis of antibiotic resistance and virulence in Bangladeshi and global Klebsiella pneumoniae ST48 strains.

Journal, genetic engineering & biotechnology, 23(3):100557.

Klebsiella pneumoniae is an opportunistic pathogen associated with nosocomial infections, known for its multidrug resistance (MDR) and biofilm-forming abilities. ST48 is a particularly concerning sequence type and an emerging international clone linked to global spread and MDR infections. This study examines the comprehensive genomic epidemiology of the local and global populations of K. pneumoniae ST48 strains using whole-genome sequence data. We performed phenotypic and genotypic characterization of a K. pneumoniae strain S3C and conducted molecular epidemiological analyses of local ST48 isolates in Bangladesh, followed by pan-genome and phylogenetic analyses of 397 global ST48 strains. The S3C strain was resistant to 17 out of 19 tested antibiotics and was a moderate biofilm former. Whole genome sequencing identified it as ST48 clonal type, with 13 acquired antibiotic resistance genes, 76 virulence-associated genes, and multiple mobile genetic elements. Comparative analysis of Bangladeshi ST48 strains indicated a high prevalence of MDR genes, particularly blaCTX-M-15, and a diverse array of virulence factors associated with biofilm formation, siderophore production, capsular biosynthesis and others. Pan-genome analysis of Bangladeshi ST48 strains revealed 8,030 genes, with 56.26% classified as core genes. In contrast, global ST48 strains had 16,307 genes, with 75.3% as accessory genes, highlighting extensive genomic plasticity. The phylogenetic analysis revealed that isolates from different regions clustered within the major clade, indicating the global dissemination of this sequence type. Our findings underscore the substantial genomic diversity and high resistance levels of K. pneumoniae ST48, emphasizing the need for targeted infection control measures and continuous surveillance.

RevDate: 2025-08-25

Shahed K, Chakma A, Manjur OHB, et al (2025)

Multiscale comparative pathogenomic analysis of Vibrio anguillarum linking serotype diversity, genomic plasticity and pathogenicity.

Journal, genetic engineering & biotechnology, 23(3):100522.

Vibrio anguillarum is a major marine fish pathogen causing high mortality and potential zoonotic risks. Understanding its genomic diversity, virulence factors, and antibiotic resistance is crucial for aquaculture disease management. In this study, a comparative pan-genomic analysis of 16 V. anguillarum strains was conducted to examine core and accessory genome diversity, virulence factors, and antibiotic resistance mechanisms. The phylogenetic analysis was conducted using six core genes and SNPs to evaluate evolutionary relationships and pathogenic traits. The core genome contained 2,038 unique ORFs, while the accessory genome had 5,197 cloud genes, confirming an open pangenome. This study identified 118 pathogenic genomic islands, antibiotic resistance genes (tetracycline, quinolone, and carbapenem), and virulence factors, including type VI secretion system (T6SS) components and RTX toxins (hcp-2, vipB/mglB, rtxC). Core genes such as ftsI uncovered substantial evolutionary divergence among species, identifying more than 150 distinct SNPs. Phylogenetic analysis showed serotype-specific clustering, with O1 strains displaying genetic homogeneity, whereas O2 and O3 exhibited divergence, suggesting distinct evolutionary adaptations influencing pathogenicity and ecological interactions. These findings provide primary insights for developing molecular markers and targeted treatments for aquaculture pathogens.

RevDate: 2025-08-25

Ryan AP, Bergin S, Scully J, et al (2025)

Small pangenome of Candida parapsilosis reflects overall low intraspecific diversity.

mBio [Epub ahead of print].

Candida parapsilosis is an opportunistic yeast pathogen that can cause life-threatening infections in immunocompromised humans. Whole-genome sequencing studies of the species have demonstrated remarkably low diversity, with strains typically differing by about 1.5 single nucleotide polymorphisms (SNPs) per 10 kb. However, SNP calling alone does not capture the full extent of genetic variation. Here, we define the pangenome of 372 C. parapsilosis isolates to determine variation in gene content. The pangenome consists of 5,859 genes, of which 48 are not found in the genome of the reference strain. This includes 5,791 core genes (present in ≥99.5% of isolates). Four genes, including the allantoin permease gene DAL4, were present in all isolates but were truncated in some strains. The truncated DAL4 was classified as a pseudogene in the reference strain CDC317. CRISPR-Cas9 gene editing showed that removing the early stop codon (producing the full-length Dal4 protein) is associated with improved use of allantoin as a sole nitrogen source. We find that the accessory genome of C. parapsilosis consists of 68 homologous clusters. This includes 38 previously annotated genes, 27 novel paralogs of previously annotated genes, and 3 uncharacterized open reading frames. Approximately one-third of the accessory genome (24/68 genes) is associated with gene fusions between tandem genes in the major facilitator superfamily. Additionally, we identified two highly divergent C. parapsilosis strains and found that, despite their increased phylogenetic distance (~30 SNPs per 10 kb), both strains have similar gene content to the other 372.

IMPORTANCE: Candida parapsilosis is a human fungal pathogen listed in the high-priority group by the World Health Organization. It is an increasing cause of hospital-acquired and drug-resistant infections. Here, we studied the genetic diversity of 372 C. parapsilosis isolates, the largest genomic surveillance of this species to date. We show that there is relatively little genetic variation. However, we identified two more distantly related isolates from Germany, suggesting that even more sampling may yield more diversity. We find that the pangenome (the cumulative gene content of all isolates) is surprisingly small, compared to other fungal species. Many of the non-core genes are involved in transport. We also find that variations in gene content are associated with nitrogen metabolism, which may contribute to the virulence characteristics of this species.
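Editorial illustration (not part of the cited abstract): the ≥99.5% core-gene threshold quoted above can be read as a simple frequency cutoff over a gene presence/absence table. The sketch below is a minimal reading of that idea, assuming a hypothetical table that maps each gene to the set of isolates carrying it. The function name, the toy genes, and the toy isolates are all invented for illustration, and this is not the authors' pipeline.

```python
def split_core_accessory(presence, n_isolates, core_threshold=0.995):
    """Partition genes into core and accessory sets by isolate frequency.

    presence: dict mapping gene name -> set of isolate names carrying it
              (a hypothetical layout, assumed for this sketch).
    """
    core, accessory = set(), set()
    for gene, carriers in presence.items():
        frequency = len(carriers) / n_isolates
        if frequency >= core_threshold:
            core.add(gene)
        else:
            accessory.add(gene)
    return core, accessory


# Toy data: 200 hypothetical isolates and three hypothetical genes.
isolates = [f"iso{i}" for i in range(200)]
presence = {
    "geneA": set(isolates),        # present in 200/200 isolates
    "geneB": set(isolates[:198]),  # present in 198/200 (99%), below the cutoff
    "geneC": set(isolates[:50]),   # present in 50/200 (25%)
}
core, accessory = split_core_accessory(presence, len(isolates))
```

With these toy inputs only `geneA` clears the cutoff. Real pangenome tools also have to cluster homologous sequences before any such counting, which this sketch leaves out entirely.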

RevDate: 2025-08-25

Liu X, Zhang M, Su J, et al (2025)

The evolution, variation, and expression patterns under development and stress responses of the NAC gene family in the barley pan-genome.

Frontiers in plant science, 16:1635416.

The NAC transcription factor family is pivotal in regulating plant development and stress responses, yet its diversity and evolutionary dynamics in barley (Hordeum vulgare) remain underexplored. In this study, we performed a comprehensive pan-genome analysis to identify and characterize the HvNACs across 20 barley accessions. Between 127 and 149 HvNACs were identified in each genome, with the Morex genome harboring the highest count. These HvNACs were classified into 201 orthogroups, further stratified into core (102), soft-core (18), shell (25), and lineage-specific (56) categories. Phylogenetic analysis delineated them into 12 subfamilies. The core genes have undergone strong purifying selection; by contrast, the shell and lineage-specific genes were under relaxed selection constraint, suggesting functional diversification in barley. Genomic variation, such as PAVs and CNVs, largely driven by TEs, highlighted the dynamic nature of NAC loci. Furthermore, transcriptome profiling of the HvNACs demonstrated diverse tissue expression patterns and different response characteristics under salt stress. These findings elucidate the evolutionary and functional dynamics of HvNACs, offering valuable insights for genetic improvement of breeding programs in barley as well as in other crops.

RevDate: 2025-08-23

Jayachandiran S, Suresh R, R Dhamodharan (2025)

Comparative and phylogenomic analysis of Chlamydia pneumoniae reveals unique carbohydrate active enzyme family (GT5) among respiratory isolates.

Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases pii:S1567-1348(25)00102-9 [Epub ahead of print].

Chlamydia pneumoniae is an obligate intracellular pathogen found in humans and animals. Understanding its genomic diversity is crucial for unravelling its pathogenic mechanisms and transmission dynamics. In this study, 14 complete genomes of C. pneumoniae strains were compared for functional diversity analysis. The koala isolate LPCoLN appears phylogenetically distinct, showing the fewest accessory genes and the highest incorporation of unique or absent genes among the strains analyzed. Functional annotation indicates that certain metabolic pathways between the LPCoLN and the human respiratory strain AR39 were the same, which is most likely due to phage-associated elements present in AR39. The presence of the GT5 CAZyme family is significantly associated with strains of respiratory origin, suggesting a potential role in respiratory adaptation and pathogenic strategies including tissue colonization, immune evasion, and niche-specific persistence. The strong association between GT5 CAZymes and respiratory-origin strains highlights their potential as diagnostic markers and therapeutic targets.

RevDate: 2025-08-23

Dishuck PC, Munson KM, Lewis AP, et al (2025)

Structural variation, selection, and diversification of the NPIP gene family from the human pangenome.

Cell genomics pii:S2666-979X(25)00233-2 [Epub ahead of print].

The NPIP gene family is among the most positively selected gene families in humans/apes and drives independent duplication in primate lineages. These duplications promote genetic instability, leading to recurrent disease-associated microduplication and microdeletion syndromes. Despite its importance, little is known about its function or variation in humans, as short-read sequencing cannot distinguish high-identity duplications. Using long-read assemblies of 169 human haplotypes, we find extreme variation in the content and organization of NPIP loci. We identify fixed and polymorphic paralogs and observe ongoing positive selection. With long-read RNA sequencing (RNA-seq), we create paralog-specific gene models, the majority of which were not previously documented, and observe paralog-specific tissue specificity. This analysis of an exceptionally dynamic gene family provides candidates for future functional study.

RevDate: 2025-08-23
CmpDate: 2025-08-23

Fouéré C, Costes V, Hozé C, et al (2025)

Genetic regulation of sperm DNA methylation in cattle through meQTL mapping.

BMC genomics, 26(1):771.

BACKGROUND: DNA methylation (DNAm) plays an important functional role and is influenced by genetic variants known as methylation QTLs (meQTLs). The majority of meQTL studies have been conducted in human blood. Despite its unique landscape, the genetic regulation of sperm DNAm remains largely unexplored. In this study, we leveraged DNAm measured in sperm from 405 Holstein bulls using reduced representation bisulfite sequencing (RRBS) and performed sequence-level genome-wide association studies for 166,985 variable CpGs (s.d. >5%). We reported heritability estimates and have mapped both cis-meQTLs and trans-meQTLs.

RESULTS: Heritability estimates ranged from 0 to 1 and averaged 0.26 across all selected CpGs, with 76% of estimates above 0.1. The meQTL mapping revealed that 32.9% of the CpGs had a cis-meQTL, 3.6% had a trans-meQTL and 1.0% had both cis- and trans-meQTLs. The cis-CpGs were located on average 261 kb (absolute mean) from their cis-meQTL top SNPs (defined by the most significant association). MeQTLs were enriched in featured genomic annotations, including regions surrounding transcription start sites and ATAC-seq peaks. We also identified spurious trans-associations by analyzing data across multiple genome assemblies, including the construction of a partial pangenome. Additionally, eight trans-meQTL hotspots, defined as variants associated with at least 30 trans-CpGs, were identified and overlapped with genes involved in epigenetic regulation. Using peripheral blood mononuclear cell DNAm from 54 out of the 405 bulls, we did not observe an effect of the trans-meQTL hotspots similar to that observed in sperm.

CONCLUSIONS: For the first time, meQTLs have been detected and characterized in bovine sperm, contributing to a better understanding of the transmission of paternally inherited DNAm marks. These findings provide useful information for further research aimed at integrating epigenetic information into the prediction of performance traits.

RevDate: 2025-08-22

Le MH, Proctor M, JP Huang (2025)

Chromosome Level Genome Assembly of Dynastes reidi Reveals Structural Evolution of Autosomes and the Sex Chromosomes in Hercules Beetles.

G3 (Bethesda, Md.) pii:8239991 [Epub ahead of print].

The Hercules beetles have long been iconic symbols of evolutionary diversification, sexual selection, and systematics. Despite their rapid phenotypic evolution and a rich history of inspiring evolutionary biologists, genomic resources for these charismatic beetles remain limited, especially for the Giant Hercules beetles. We present the first chromosome-level genome assembly of a Giant Hercules beetle from the Lesser Antilles. The assembled genome is approximately 837 Mb in size, with a scaffold N50 of 66.68 Mb, which can be anchored to 11 pseudochromosomes with a BUSCO completeness score of 95.9%. An estimate of 55.5% of the genome can be attributed to repetitive elements. Additionally, we detected candidate sex-linked chromosomes by comparing sequencing read depths between one male and two females using Illumina short reads. The chromosome-level genome assembly of Dynastes reidi not only provides critical insights into evolutionary and functional genomics, but also supports informed conservation and management efforts. In addition, this genomic resource will enable future pangenome analyses aimed at understanding the genetic basis of species divergence and morphological innovation in beetles. Our study also marks the emergence of a new model system to investigate the origin and diversification of phenotypic novelty by leveraging genomic resources across diverse domesticated beetle breeds.

RevDate: 2025-08-22
CmpDate: 2025-08-22

Gonzalez-Reyes M, Ramos-Tapia I, JA Ugalde (2025)

A global perspective on the genomics of Moraxella catarrhalis.

Microbial genomics, 11(8):.

Moraxella catarrhalis is an opportunistic pathogen of the human respiratory tract, primarily associated with otitis media in children and exacerbations of chronic obstructive pulmonary disease in adults. Despite its clinical importance, the genomic diversity and functional specialization of M. catarrhalis remain insufficiently characterized. This study aimed to analyse the global genetic diversity of M. catarrhalis using whole-genome sequencing to identify phylogenetic lineages, antimicrobial resistance patterns and key virulence factors. Phylogenomic analysis of 345 publicly available genomes identified 3 phylogroups, of which 1 exhibited significant genomic divergence and was excluded from further analyses due to its potential classification as a separate species. The remaining two phylogroups corresponded to previously described seroresistant and serosensitive lineages. Phylogroup B exhibited a higher prevalence of antimicrobial resistance genes, particularly bro-1 and bro-2, while phylogroup A exhibited unique metabolic adaptation, including genes encoding for the DppB-DppC-DppD dipeptide transport system. Both phylogroups shared crucial virulence factors, including UspA1 and UspA2, which facilitate adhesion and immune evasion. Potential therapeutic targets were identified, including PilQ, essential for type IV pilus biogenesis, and CopB, which plays a key role in iron acquisition and immune evasion. Overall, these findings highlight the significance of phylogenomics approaches in elucidating the genetic mechanisms underlying pathogenicity and resistance in M. catarrhalis, providing insights for future therapeutic and preventive strategies.

RevDate: 2025-08-21

Chandra G, Hossen MH, Scholz S, et al (2025)

Pangenome-based genome inference using integer programming.

Genome research pii:gr.280567.125 [Epub ahead of print].

Affordable genotyping methods are essential in genomics. Commonly used genotyping methods primarily support single nucleotide variants and short indels but neglect structural variants. Additionally, accuracy of read alignments to a reference genome is unreliable in highly polymorphic and repetitive regions, further impacting genotyping performance. Recent works highlight the advantage of haplotype-resolved pangenome graphs in addressing these challenges. Building on these developments, we propose a rigorous alignment-free genotyping method. Our optimization framework identifies a path through the pangenome graph that maximizes the matches between the path and substrings of sequencing reads (e.g., k-mers) while minimizing recombination events (haplotype switches) along the path. We prove that this problem is NP-Hard and develop efficient integer-programming solutions. We benchmarked the algorithm using downsampled short-read datasets from homozygous human cell lines with coverage ranging from 0.1× to 10×. Our algorithm accurately estimates complete major histocompatibility complex (MHC) haplotype sequences with small edit distances from the ground-truth sequences, providing a significant advantage over existing methods on low-coverage inputs. While this algorithm is designed for haploid genomes, we discuss directions for extending it to diploid genotyping.
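Editorial illustration (not part of the cited abstract): the objective described above, maximizing read matches along a path while penalizing haplotype switches, can be shown on a deliberately simplified toy. The sketch below assumes the pangenome has been reduced to a linear chain of blocks with a precomputed match score per haplotype per block, and solves that restricted case with dynamic programming. This is a swapped-in technique for a much simpler setting, not the integer-programming method of the paper (whose general graph problem the authors prove NP-hard). The function name, the score matrix, and the penalty values are hypothetical.

```python
def best_mosaic_path(scores, switch_penalty):
    """Pick one haplotype per block, maximizing total score minus switch costs.

    scores[b][h]: assumed precomputed k-mer match count for haplotype h in block b.
    Returns (best objective value, chosen haplotype index for each block).
    """
    n_haps = len(scores[0])
    best = list(scores[0])  # best[h]: best objective of a path ending on haplotype h
    back = []               # back[i][h]: predecessor haplotype chosen for block i + 1

    for block in scores[1:]:
        new_best, pointers = [], []
        for h in range(n_haps):
            def value(p):
                # Objective of arriving at h from p; staying on h costs nothing.
                return best[p] - (0 if p == h else switch_penalty)

            # On ties, prefer staying on the same haplotype.
            prev = max(range(n_haps), key=lambda p: (value(p), p == h))
            new_best.append(value(prev) + block[h])
            pointers.append(prev)
        best = new_best
        back.append(pointers)

    # Trace the best path backwards from the best final haplotype.
    end = max(range(n_haps), key=lambda h: best[h])
    path = [end]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return max(best), path


# Toy scores for 4 blocks and 2 haplotypes (hypothetical numbers).
scores = [[5, 1], [4, 2], [0, 6], [1, 5]]
cheap_switch = best_mosaic_path(scores, switch_penalty=2)
costly_switch = best_mosaic_path(scores, switch_penalty=10)
```

With a penalty of 2 the best path switches once, from haplotype 0 to haplotype 1 after the second block. With a penalty of 10 it stays on haplotype 1 throughout, which shows how the penalty trades read support against recombination events.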

RevDate: 2025-08-20

Zhang H, Liu N, Wang Y, et al (2025)

Super-pangenome analyses across 35 accessions of 23 Avena species highlight their complex evolutionary history and extensive genomic diversity.

Nature genetics [Epub ahead of print].

Common oat, belonging to the genus Avena with 30 recognized species, is a nutritionally important cereal crop and high-quality forage worldwide. Here, we construct a genus-level super-pangenome of Avena comprising 35 high-quality genomes from 14 cultivated oat accessions and 21 wild species. The fully resolved phylogenomic analysis unveils the origin and evolutionary scenario of Avena species, and the super-pangenome analysis identifies 26.62% and 59.93% specific genes and haplotypes in wild species. We delineate the landscape of structural variations (SVs) and the transcriptome profile based on 1,401 RNA-sequencing (RNA-seq) samples from diverse abiotic stress treatments in oat. We highlight the crucial role of SVs in modulating gene expression and shaping adaptation to diverse stresses. Further combining SV-based genome-wide association studies (GWASs), we characterize 13 candidate genes associated with drought resistance such as AsARF7, validated by transgenic oat lines. Our study provides unprecedented genomic resources to facilitate genomic, evolution and molecular breeding research in oat.

RevDate: 2025-08-20
CmpDate: 2025-08-20

Han C, Lu S, Hu P, et al (2025)

Machine learning based on pangenome-wide association studies reveals the impact of host source on the zoonotic potential of closely related bacterial pathogens.

Communications biology, 8(1):1253.

Variations in host species significantly impact bacterial growth traits and antibiotic resistance, making it essential to consider host origin when evaluating the zoonotic potential of pathogens. This study focuses on multiple Brucella species, which share highly similar genetic material, to explore the relationship between host origin and zoonotic potential by integrating pan-genome-wide association studies (pan-GWAS) with machine learning (ML). Our results present an open pangenome of Brucella spp. derived from the whole-genome sequencing (WGS) data of 991 strains and identify 268 genes potentially associated with the zoonotic potential of Brucella. Integrating these genes into an ML model based on the support vector machine (SVM) algorithm allows us to predict the zoonotic potential of various Brucella strains with high accuracy. Our findings reveal that zoonotic potential varies by host origin: Brucella melitensis strains isolated from humans exhibit higher zoonotic potential than those isolated from cattle, goats, and sheep, while Brucella suis biovar 2 strains isolated from domestic pigs display higher zoonotic potential than those isolated from wild boars. Our study proposes a method for predicting and quantifying the zoonotic potential of closely related bacterial pathogens from different host origins, providing valuable insights for risk assessment and public health strategy.

RevDate: 2025-08-19

Igolkina AA, Vorbrugg S, Rabanal FA, et al (2025)

A comparison of 27 Arabidopsis thaliana genomes and the path toward an unbiased characterization of genetic polymorphism.

Nature genetics [Epub ahead of print].

Making sense of whole-genome polymorphism data is challenging, but it is essential for overcoming the biases in SNP data. Here we analyze 27 genomes of Arabidopsis thaliana to illustrate these issues. Genome size variation is mostly due to tandem repeat regions that are difficult to assemble. However, while the rest of the genome varies little in length, it is full of structural variants, mostly due to transposon insertions. Because of this, the pangenome coordinate system grows rapidly with sample size and ultimately becomes 70% larger than the size of any single genome, even for n = 27. Finally, we show how short-read data are biased by read mapping. SNP calling is biased by the choice of reference genome, and both transcriptome and methylome profiling results are affected by mapping reads to a reference genome rather than to the genome of the assayed individual.

RevDate: 2025-08-19

Sibbald SJ, Lawton M, Maclean C, et al (2025)

Pangenome biology and evolution in harmful algal-bloom-forming pelagophytes.

Current biology : CB pii:S0960-9822(25)00964-9 [Epub ahead of print].

In prokaryotes, lateral gene transfer (LGT) is a key mechanism leading to intraspecies variability in gene content and the phenomenon of pangenomes. In microbial eukaryotes, however, the extent to which LGT-driven pangenomes exist is unclear. Pelagophytes are ecologically important marine algae that include Aureococcus anophagefferens, a species notorious for causing harmful algal blooms. To investigate genome evolution across Pelagophyceae and within Ac. anophagefferens, we used long-read sequencing to produce high-quality genome assemblies for five strains of Ac. anophagefferens (52-54 megabase pairs [Mbp]), a telomere-to-telomere assembly for Pelagomonas calceolata (32 Mbp), and the first reference genome for Aureoumbra lagunensis (41 Mbp). Using comparative genomics and phylogenetics, we show remarkable strain-level genetic variation in Ac. anophagefferens, with a pangenome (23,356 orthogroups) that is 81.1% core and 18.9% accessory. Although gene content variation within Ac. anophagefferens does not appear to be largely driven by recent prokaryotic LGTs (2.6% of accessory orthogroups), 368 orthogroups were acquired from bacteria in a common ancestor of all analyzed strains and are not found in P. calceolata or Au. lagunensis. A total of 1,077 recent LGTs from prokaryotes and viruses were identified within Pelagophyceae overall, constituting 3.5%-4.0% of the orthogroups in each species. This includes genes likely contributing to the ecological success of pelagophytes globally and in long-lasting harmful blooms.

RevDate: 2025-08-19

Dai S, Zhao P, Li W, et al (2025)

Global pangenome analysis highlights the critical role of structural variants in cattle improvement and identifies a unique event as a novel enhancer in IGFBP7[+] cells.

Molecular biology and evolution pii:8238201 [Epub ahead of print].

Based on a pangenome graph platform, we simultaneously analyzed the impacts of SNPs and SVs in the population structure and phenotypic formation of global cattle using 2,409 individuals from 82 breeds. We demonstrated that SVs, like SNPs, effectively explain the population structure of global cattle. Genomic regions under strong selection, identified using both SNPs and SVs, consistently revealed footprints associated with human-mediated selection of economic traits in European improved cattle or natural selection of geographical adaptations. Notably, we detected that ∼40.14% of SVs were not tagged (LD, r² < 0.6) by nearby SNPs. These "orphan" SVs may uncover new genetic signals and represent recent mutations associated with specific selection pressures or local environmental adaptation. Selected SVs tagged by SNPs also play causal or dominant roles in regions under selection. For example, our single-cell RNA sequencing has demonstrated that a notable SNP-tagged SV functions as an enhancer of the IGFBP7 gene, regulating fat deposition through IGFBP7+ cells. In conclusion, these SV-related mechanisms likely have caused some differences in economic traits and local adaptability across global cattle populations. Our integrated approaches highlight the unique and indispensable roles of SVs in shaping genetic diversity, offering novel insights into adaptation, selection, and strategies for improving cattle populations.
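
As a rough illustration of what "tagged" means in the abstract above, the sketch below computes r² between one SV and a few nearby SNPs and applies the 0.6 cutoff. It assumes genotypes are coded as 0/1/2 allele dosages and uses the squared Pearson correlation of dosages as the LD measure. The function names and the toy genotypes are hypothetical, and this is not a description of the pipeline the authors used.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic variant carries no LD information.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def is_tagged(sv, snps, threshold=0.6):
    """True if at least one nearby SNP reaches the r^2 threshold with the SV."""
    return any(r_squared(sv, snp) >= threshold for snp in snps)


# Toy dosages for six individuals (hypothetical data).
sv = [0, 1, 2, 0, 1, 2]
linked_snp = [0, 1, 2, 0, 1, 2]    # perfectly correlated with the SV
unlinked_snp = [1, 1, 1, 0, 2, 1]  # weakly correlated with the SV

tagged = is_tagged(sv, [linked_snp, unlinked_snp])
orphan = not is_tagged(sv, [unlinked_snp])
```

Under this definition an SV counts as an "orphan" when no SNP in its neighborhood reaches the threshold, which is why such variants can be missed by SNP-only association scans.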

RevDate: 2025-08-19

Kenney SM, M'ikanatha NM, E Ganda (2025)

Genomic evolution of Salmonella Dublin in cattle and humans in the United States.

Applied and environmental microbiology [Epub ahead of print].

Increasingly, antimicrobial-resistant (AMR) Salmonella Dublin is a threat to human and animal health, therefore requiring a One Health approach to comprehensively understand pathogen evolution. Moreover, S. Dublin dissemination throughout the United States and the food supply chain is a concern for food safety and security. Here, we leveraged multi-agency biosurveillance data and genomic sequencing of S. Dublin strains to provide a robust analysis of its evolution across human, animal, and environmental reservoirs. This study advances our understanding of AMR S. Dublin, elucidates factors driving AMR emergence, and informs interventions to protect public health. In total, 2,150 strains collected between 2002 and 2023 throughout the United States from clinical bovine (N = 581), clinical human (N = 664), and environmental (N = 905) sources were identified. After uniform quality control, raw reads were assembled de novo followed by genome annotation and characterization of plasmids, antimicrobial resistance genes, and virulence factors. Strain relatedness was evaluated using a core genome maximum-likelihood phylogeny and pairwise core genome single-nucleotide polymorphism (SNP) differences. We identified the highest prevalence of drug-specific antimicrobial resistance genes and multidrug resistance plasmid, IncA/C2 (P < 0.001), in bovine clinical strains, which also had the greatest genetic diversity. Despite source-dependent differences in antimicrobial resistance gene frequency and types, 72% of S. Dublin strains in our study differed from at least one other strain by 20 or fewer SNPs. This high degree of genomic similarity highlights the potential for cross-transmission between humans, animals, and the environment and underscores the importance of considering strain source when assessing and monitoring antimicrobial resistance.

IMPORTANCE: Salmonella Dublin is a zoonotic, sometimes foodborne, pathogen that causes severe illness in cattle and humans. Our study takes a One Health approach to understanding genetic differences in strains within and between different reservoirs in the United States. We identified differences in antimicrobial resistance potential and genome content between clinical bovine, clinical human, and environmental strains. Nonetheless, the U.S. population of S. Dublin is highly related and diverges minimally over time and geography. These findings highlight the importance of the One Health framework when combating zoonotic antimicrobial-resistant pathogens like Salmonella Dublin.
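
The relatedness statistic reported in this abstract (the share of strains within 20 SNPs of at least one other strain) can be sketched as follows. This is a minimal illustration that assumes pairwise core-genome SNP distances are already available as a dictionary keyed by strain pairs; the strain names, distances, and function name are hypothetical.

```python
def fraction_with_close_neighbor(strains, pairwise_snps, max_snps=20):
    """Fraction of strains within max_snps of at least one other strain.

    pairwise_snps maps an (a, b) strain pair to its SNP distance;
    each unordered pair is expected to appear once.
    """
    close = set()
    for (a, b), distance in pairwise_snps.items():
        if a != b and distance <= max_snps:
            close.update((a, b))
    return len(close & set(strains)) / len(strains)


# Toy distances between four strains (hypothetical data).
strains = ["bovine_1", "human_1", "env_1", "env_2"]
pairwise_snps = {
    ("bovine_1", "human_1"): 5,
    ("bovine_1", "env_1"): 100,
    ("bovine_1", "env_2"): 150,
    ("human_1", "env_1"): 90,
    ("human_1", "env_2"): 140,
    ("env_1", "env_2"): 300,
}

fraction = fraction_with_close_neighbor(strains, pairwise_snps)
```

In this toy set only the bovine and human strains fall within the cutoff of each other, so half of the strains have a close neighbor.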

RevDate: 2025-08-14

Alanko JN, Biagi E, SJ Puglisi (2025)

Finimizers: Variable-Length Bounded-Frequency Minimizers for k-mer Sets.

IEEE transactions on computational biology and bioinformatics, 22(2):899-910.

The minimizer of a k-mer is the smallest m-mer inside the k-mer according to some total order < of the m-mers. Minimizers are often used as keys in hash tables in indexing tasks in metagenomics and pangenomics. The main weakness of minimizer-based indexing is the possibility of very frequently occurring minimizers, which can slow query times down significantly. Popular minimizer alignment tools employ various and often wild heuristics as workarounds, typically by ignoring frequent minimizers or blacklisting commonly occurring patterns, to the detriment of other metrics (e.g., alignment recall, space usage, or code complexity). In this paper, we introduce frequency-bounded minimizers, which we call finimizers, for indexing sets of k-mers. The idea is to use an order relation < for minimizer comparison that depends on the frequency of the minimizers within the indexed k-mers. With finimizers, the length m of the m-mers is not fixed, but is allowed to vary depending on the context, so that the length can increase to bring the frequency down below a user-specified threshold t. Setting a maximum frequency solves the issue of very frequent minimizers and gives us a worst-case guarantee for the query time. We show how to implement a particular finimizer scheme efficiently using the Spectral Burrows-Wheeler transform (SBWT) (Alanko et al. Proc. SIAM ACDA, 2023) augmented with longest common suffix information. In experiments, we explore in detail the special case in which we set t = 1. This choice simplifies the index structure and makes the scheme completely parameter-free apart from the choice of k. A prototype implementation of this scheme exhibits k-mer localization times close to, and often faster than, state-of-the-art minimizer-based schemes.
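
To make the definition in this abstract concrete, the sketch below computes the classic fixed-length minimizer of a k-mer and counts how often each minimizer occurs across a small k-mer set. It assumes plain lexicographic order as the total order and a fixed m; it illustrates the baseline problem of frequent minimizers and is not the finimizer scheme or its SBWT-based implementation. The function names and toy k-mers are hypothetical.

```python
from collections import Counter


def minimizer(kmer, m):
    """Return the lexicographically smallest m-mer inside kmer."""
    if not 0 < m <= len(kmer):
        raise ValueError("m must satisfy 0 < m <= len(kmer)")
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def minimizer_frequencies(kmers, m):
    """Count how many k-mers in the set share each minimizer."""
    return Counter(minimizer(kmer, m) for kmer in kmers)


# Toy set of 6-mers (hypothetical data).
kmers = ["ACGTAC", "AACGTT", "TTACGA", "GGACGT"]
freq = minimizer_frequencies(kmers, 2)
most_common_minimizer, count = freq.most_common(1)[0]
```

Here three of the four k-mers share the minimizer "AC", so a hash-table bucket keyed on it would hold most of the set. This kind of skew is what a frequency bound on minimizers is meant to limit.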

RevDate: 2025-08-13

Pushkarna S, Kumar A, Arora K, et al (2025)

Exploring the potential of Lactobacillus rhamnosus as gluten-digesting bacteria.

Irish journal of medical science [Epub ahead of print].

BACKGROUND: Celiac disease (CeD), a multifactorial disorder, develops when gluten, the toxic environmental inducer, interacts with CeD susceptibility genetic markers, resulting in a chronic enteropathy. Several extra-intestinal complications may also arise in cases of delayed management. There persists a growing demand to develop non-dietary adjuvant therapeutic options that can help relieve symptoms and improve patients' quality of life.

AIM: The present study used a bioinformatic approach to examine the potential of Lactobacillus rhamnosus, a well-established probiotic, as a gluten-digesting bacterium and to provide the basis for future therapeutic developments.

METHODS: Complete genome assemblies of forty-nine L. rhamnosus strains were subjected to annotation using RAST and a pan genome analysis with BPGA. Genes for peptidases were identified using BlastKOALA and Prokka, followed by domain analysis using the NCBI-CD search tool to screen for gluten-digesting activity.

RESULTS: Genome annotation of all the strains under study highlighted the presence of sixty-one peptidases in L. rhamnosus. Domain analysis further revealed that nine of these peptidases, including aminopeptidase N, neutral endopeptidase, oligoendopeptidase F, dipeptidyl-peptidase 5, proline iminopeptidase, Xaa-Pro dipeptidyl-peptidase, aminopeptidase C, aminopeptidase E, and PII-type proteinase, shared domains with already established gluten-digesting enzymes, suggesting their potential role in degrading toxic gliadin peptides.

CONCLUSION: The current in silico analysis indicates that this well-known probiotic species, in addition to showcasing a plethora of beneficial properties, may also hold great potential in terms of reducing gluten toxicity. With further studies, L. rhamnosus can prove to be a promising candidate in CeD treatment and management.

RevDate: 2025-08-14

Wei CR, Basharat Z, P Adhikari (2025)

Implications of virtual screening for South African natural compounds against Plesiomonas shigelloides, a pathogen with zoonotic potential.

Computers in biology and medicine, 196(Pt B):110882.

Plesiomonas shigelloides is an emerging pathogen associated with gastroenteritis and poses a growing public health concern, especially in regions with limited access to advanced medical treatments. The purpose of this study was to explore the therapeutic potential of South African natural product compounds against P. shigelloides by targeting the essential enzyme Pyridoxine 5'-phosphate synthase or PPS (encoded by PdxJ). P. shigelloides proteomes (n = 26) were processed using the Bacterial Pan Genome Analysis (BPGA) pipeline to identify conserved targets. Targeting a conserved protein ensures the potential for broad-spectrum efficacy. PPS was chosen as the drug target and its structure was predicted using AlphaFold, enabling high-confidence modeling. Subsequently, docking was performed using AutoDock Vina, focusing on a library of South African compounds (n > 1000). The three inhibitors demonstrating strong binding affinities to the PPS were Scutiaquinone A, Mesquitol-(4α→5)-3,3',4',7,8-pentahydroxyflavonone, and Riccardin C. To further validate the stability and efficacy of these interactions, molecular dynamics (MD) simulations were carried out for 100 ns. The simulations revealed stable interactions between the inhibitors and PPS, suggesting potential inhibition of the PPS enzyme. The Mesquitol derivative was found to be the safest and is recommended for further experimental validation. This study highlights the promising potential of South African natural compounds in combating P. shigelloides infections, paving the way for the development of novel therapeutic strategies.

RevDate: 2024-03-13
CmpDate: 2024-03-01

Deery J, Carmody M, Flavin R, et al (2024)

Comparative genomics reveals distinct diversification patterns among LysR-type transcriptional regulators in the ESKAPE pathogen Pseudomonas aeruginosa.

Microbial genomics, 10(2):.

Pseudomonas aeruginosa, a harmful nosocomial pathogen associated with cystic fibrosis and burn wounds, encodes a large number of LysR-type transcriptional regulator proteins. To understand how and why LTTR proteins evolved with such frequency and to establish whether any relationships exist within the distribution, we set out to identify the patterns underpinning LTTR distribution in P. aeruginosa and to uncover cluster-based relationships within the pangenome. Comparative genomic studies revealed that in the JGI IMG database alone ~86 000 LTTRs are present across the sequenced genomes (n=699). They are widely distributed across the species, with core LTTRs present in >93 % of the genomes and accessory LTTRs present in <7 %. Analysis showed that subsets of core LTTRs can be classified as either variable (typically specific to P. aeruginosa) or conserved (and found to be distributed in other Pseudomonas species). Extending the analysis to the more extensive Pseudomonas database, PA14 rooted analysis confirmed the diversification patterns and revealed PqsR, the receptor for the Pseudomonas quinolone signal (PQS) and 2-heptyl-4-quinolone (HHQ) quorum-sensing signals, to be amongst the most variable in the dataset. Successful complementation of the PAO1 pqsR [-] mutant using representative variant pqsR sequences suggests a degree of structural promiscuity within the most variable of LTTRs, several of which play a prominent role in signalling and communication. These findings provide a new insight into the diversification of LTTR proteins within the P. aeruginosa species and suggest a functional significance to the cluster, conservation and distribution patterns identified.
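
The prevalence cutoffs quoted in this abstract (core in >93 % of genomes, accessory in <7 %) amount to a simple classification over a presence/absence table, sketched below. The gene names, the toy presence data, and the "intermediate" label for genes between the two cutoffs are assumptions made for illustration and do not come from the paper.

```python
def classify_by_prevalence(presence, n_genomes, core_min=0.93, accessory_max=0.07):
    """Label each gene by the fraction of genomes that carry it.

    presence maps a gene name to the set of genome IDs containing it.
    """
    labels = {}
    for gene, genomes in presence.items():
        fraction = len(genomes) / n_genomes
        if fraction > core_min:
            labels[gene] = "core"
        elif fraction < accessory_max:
            labels[gene] = "accessory"
        else:
            # Label assumed here for genes between the two cutoffs.
            labels[gene] = "intermediate"
    return labels


# Toy presence/absence data over 100 genomes (hypothetical).
n_genomes = 100
presence = {
    "lttrA": set(range(99)),  # present in 99% of genomes
    "lttrB": set(range(3)),   # present in 3% of genomes
    "lttrC": set(range(50)),  # present in 50% of genomes
}

labels = classify_by_prevalence(presence, n_genomes)
```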

RevDate: 2025-08-14
CmpDate: 2023-11-16

Bouznada K, Belaouni HA, A Meklat (2023)

Genome-based reclassification of Kitasatospora niigatensis as a later heterotypic synonym of Kitasatospora cineracea Tajima et al. (2001).

Antonie van Leeuwenhoek, 116(12):1327-1335.

The present study used genome-based approaches to investigate the taxonomic relationship between Kitasatospora cineracea DSM 44780[T] and Kitasatospora niigatensis DSM 44781[T], two species that were previously described by Tajima et al. (Int J Syst Evol Microbiol 51:1765-1771, 2001). The digital DNA-DNA hybridization (dDDH), average amino acid identity (AAI), and average nucleotide identity (ANI) values between the genomes of the two type strains were 90.3, 98.7, and 99.1%, respectively. These values exceeded the established thresholds of 70% (dDDH) and 95-96% (ANI and AAI) for bacterial species delineation, suggesting that K. cineracea and K. niigatensis should share the same taxonomic position. Furthermore, our analysis using the 'Bacterial Pan Genome Analysis' (BPGA) pipeline and the Maximum Likelihood core-genes tree inferred using FastTree2 consistently demonstrated that K. cineracea DSM 44780[T] and K. niigatensis DSM 44781[T] are closely related, as indicated by the clustering of these strains in the core-genes phylogenomic tree. Based on these findings, we propose that K. niigatensis should be considered a later heterotypic synonym of K. cineracea.
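
The threshold comparison described in this abstract can be written out as a short check. The sketch below uses the cutoffs quoted above (70% for dDDH, 95-96% for ANI and AAI) and takes the stricter 96% end of the range as an assumption. The function and constant names are hypothetical, and such a check is only one line of evidence alongside the phylogenomic analysis.

```python
# Cutoffs quoted in the abstract; using 96.0 (the upper end of the
# 95-96% range) is a conservative assumption made for this sketch.
DDDH_THRESHOLD = 70.0
ANI_AAI_THRESHOLD = 96.0


def exceeds_species_thresholds(dddh, aai, ani):
    """True if all three genome-relatedness indices exceed their cutoffs."""
    return (
        dddh > DDDH_THRESHOLD
        and aai > ANI_AAI_THRESHOLD
        and ani > ANI_AAI_THRESHOLD
    )


# Values reported for K. cineracea DSM 44780 vs. K. niigatensis DSM 44781.
same_species = exceeds_species_thresholds(dddh=90.3, aai=98.7, ani=99.1)
```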

RevDate: 2023-11-10
CmpDate: 2015-05-28

Islam MA, Waller AS, Hug LA, et al (2014)

New insights into Dehalococcoides mccartyi metabolism from a reconstructed metabolic network-based systems-level analysis of D. mccartyi transcriptomes.

PloS one, 9(4):e94808.

Organohalide respiration, mediated by Dehalococcoides mccartyi, is a useful bioremediation process that transforms ground water pollutants and known human carcinogens such as trichloroethene and vinyl chloride into benign ethenes. Successful application of this process depends on the fundamental understanding of the respiration and metabolism of D. mccartyi. Reductive dehalogenases, encoded by rdhA genes of these anaerobic bacteria, exclusively catalyze organohalide respiration and drive metabolism. To better elucidate D. mccartyi metabolism and physiology, we analyzed available transcriptomic data for a pure isolate (Dehalococcoides mccartyi strain 195) and a mixed microbial consortium (KB-1) using the previously developed pan-genome-scale reconstructed metabolic network of D. mccartyi. The transcriptomic data, together with available proteomic data, helped confirm transcription and expression of the majority of genes in D. mccartyi genomes. A composite genome of two highly similar D. mccartyi strains (KB-1 Dhc) from the KB-1 metagenome sequence was constructed, and operon prediction was conducted for this composite genome and other single genomes. This operon analysis, together with the quality threshold clustering analysis of transcriptomic data, helped generate experimentally testable hypotheses regarding the function of a number of hypothetical proteins and the poorly understood mechanism of energy conservation in D. mccartyi. We also identified functionally enriched important clusters (13 for strain 195 and 11 for KB-1 Dhc) of co-expressed metabolic genes using information from the reconstructed metabolic network. This analysis highlighted some metabolic genes and processes, including lipid metabolism, energy metabolism, and transport that potentially play important roles in organohalide respiration. Overall, this study shows the importance of an organism's metabolic reconstruction in analyzing various "omics" data to obtain improved understanding of the metabolism and physiology of the organism.

RevDate: 2025-08-18

Näpflin N, Schubert C, Malfertheiner L, et al (2025)

Gene-level analysis of core carbohydrate metabolism across the Enterobacteriaceae pan-genome.

Communications biology, 8(1):1241.

Enterobacteriaceae is a diverse bacterial family that commonly colonizes the gastrointestinal tracts of humans and animals, influences host health, and also includes members adapted to colonize the phyllosphere as well as insect hosts. We lack systematic knowledge regarding the core metabolic strategy shared among Enterobacteriaceae. To address this gap, we have analyzed the pan-genome of nearly 20,000 genomes, including Citrobacter, Escherichia, Klebsiella, and Salmonella. We found that genes necessary for monosaccharide-fuelled mixed acid fermentation and (micro-)aerobic respiration are part of the Enterobacteriaceae core genome, whereas most genes involved in anaerobic respiration and carbohydrate utilization are associated with the accessory genome. Most Enterobacteriaceae possess genes enabling the utilization of D-glucose, its epimers, D-glucose-containing disaccharides, and chemically modified derivatives of D-glucose, highlighting the evolutionary adaptation of this family to efficiently exploit this simple sugar. Understanding Enterobacteriaceae's core metabolic strategy helps clarify the distinction of niche-defining nutrient sources, which can be genus-, species- or strain-specific. This study highlights the core metabolic strategy of Enterobacteriaceae, supporting the development of targeted interventions in microbiome research and infectious disease control.

RevDate: 2025-08-18

Rosani U, Gerdol M, M Krupovic (2025)

The highly dynamic pangenome of basal chordates is enriched in defence and immunity genes and is inherited following the Mendelian law.

PLoS genetics, 21(8):e1011833 pii:PGENETICS-D-25-00112 [Epub ahead of print].

Pangenome analyses, which encompass the full genetic repertoire of a species, offer valuable insights into intraspecific diversity and phylogeographic gene patterns. While the taxonomic breadth and functional significance of animal pangenomes remain to be fully uncovered, recent findings (such as reports of open, bacterial-like pangenomes in bivalves) highlight the need to better understand the molecular mechanisms driving inter-haplotype structural variation. Genes affected by presence-absence variation (PAV), along with non-reference sequences (NRSs), represent evolutionary footprints that may shape genome architecture and plasticity, ultimately influencing the adaptability and long-term fitness of species. To investigate the pangenomic architecture of basal chordates, we analyzed available whole-genome resequencing data from Branchiostoma belcheri and B. floridae, examined the impact of structural genomic variation, and assessed the inheritance patterns of dispensable genes across generations. The pangenomes of both species include over a thousand genes that are affected by PAV and exhibit trans-generational Mendelian transmission from parents to offspring. We further demonstrate that 35 dispensable genes in B. belcheri are of exogenous origin, likely resulting from the integration of a malacoherpesvirus genome, thereby extending the known host range of Malacoherpesviridae from invertebrates to chordates. PAV preferentially targeted gene families involved in defense, immunity, and cell signaling, including GTPases of immunity-associated proteins (GIMAPs), caspases, toll-like receptors, and pattern recognition receptors containing apextrin C-terminal (APEC) domains. The dynamic nature of immunity genes in cephalochordates parallels patterns seen in open bacterial pangenomes, suggesting that fundamental principles of genome evolution and innovation across life domains are shaped by host-pathogen interactions.

RevDate: 2025-08-18

Sadler MC, Wietz M, Mino S, et al (2025)

Genomic diversity and adaptation in Arctic marine bacteria.

mBio [Epub ahead of print].

Arctic marine bacteria experience seasonal changes in temperature, salinity, light, and sea ice cover. Time-series and metagenomic studies have identified spatiotemporal patterns in Arctic microbial communities, but a lack of complete genomes has limited efforts to identify the extent of genomic diversity in Arctic populations. We cultured and sequenced the complete genomes of 34 Arctic marine bacteria to identify patterns of gene gain, loss, and rearrangement that structure genomes and underlie adaptations to Arctic conditions. We found that the most abundant lineage in the Arctic (SAR11) is comprised of diverse species and subspecies, each encoding 50-150 unique genes. Half of the 16 SAR11 genomes harbor a genomic island with the potential to enhance survival in the Arctic by utilizing the osmoprotectant and potential methyl donor glycine betaine. We also cultured and sequenced four species representing an uncultured family of Pseudomonadales, four subspecies of Pseudothioglobus (SUP05), a genus of high GC Puniceispirillales (SAR116), and a family of low GC SAR116. Time-series 16S rRNA amplicon data indicate that this culture collection represents up to 60% of the marine bacterial community in Arctic waters. Their genomes provide insights into the evolutionary processes that underlie bacterial diversity and adaptation to Arctic waters.

IMPORTANCE: Genetic diversity has limited efforts to assemble and compare whole genomes from natural populations of marine bacteria. We developed a cultivation-based population genomics approach to culture and sequence the complete genomes of bacteria from the Arctic Ocean. Cultures and closed genomes obtained in this study represent previously uncultured families, genera, and species from the most abundant lineages of bacteria in the Arctic. We report patterns of gene gain, loss, rearrangement, and adaptation in the dominant lineage (SAR11), as well as the size, composition, and structure of genomes from several other groups of marine bacteria. This work demonstrates the potential for cultivation-based high-throughput genomics to enhance understanding of the processes underlying genomic diversity and adaptation.

RevDate: 2025-08-18

Henry JA (2024)

Population health management genomic new-born screens and multi-omics intercepts.

Frontiers in artificial intelligence, 7:1496942.

INTRODUCTION: The Population Health Management (PHM) Genomic Newborn Screens (GNBS) and Multi-Omics Intercepts for Human Phenotype Ontology (HPO) using Federated Data Platforms (FDP) represent a groundbreaking innovation in global health. This reform, supported by the UK's Genomic Medical Services (GMS) through "The Generation Study," aims to significantly reduce infant mortality by identifying and managing over 200 rare diseases from birth, paving the way for personalised health planning.

METHODS: Using an ecosystem approach, this study evaluates a diverse pangenome to predict health outcomes or confirm diagnoses prior to symptomatic manifestations. GNBS standardises care by integrating diagnostic techniques such as blood spot analysis and full blood cell diagnostics to stratify risk. The approach enhances the understanding of rare diseases in primary care medicine, with biomedical and haematology diagnoses re-evaluated. Scientific proof of concept and fit-for-purpose technology align multi-omics in pre-eXams (X = Gen AI).

RECOMMENDATIONS: The Digital Regulation Service (DRS) assembles an agile group of experts to enhance medical science through human phenotype ontology (HPO) for precise disease segmentation, scheduling accurate eXam intercepts where needed. This team strategically plans regulation services for digital HPO eXam assurance and implements Higher Expert Medical Science Safety (HEMSS) frameworks. The DRS is responsible for overseeing gene, oligonucleotide, and recombinant protein intercepts; commissioning blood pathology HPO eXam intercepts; and monitoring preliminary eXams with advanced imaging techniques.

DISCUSSION: In pursuit of excellence in PHM of HPO, HEMSS with Agile Group Development leverages the Genomic Newborn Screens (GNBS) and multi-omics to create personalised health plans integrated with NHS England Genomics and AI-driven DRS. The discourse extends to examining GNBS predictors and intercepts, focusing on their impact on public health and patient safety. Discussions encompass structured HPO knowledge addressing newborn health, ethical considerations, family privacy, and the benefits and limitations of pre-eXam screenings and life eXam intercepts. These debates involve stakeholders in adopting HPO-enhanced clinical pathways through Alliances for Health Systems Networking-Genomic Enterprise Partnerships (AHSN-GEP).

CONCLUSION: "The Generation Study" represents a paradigm in digital child health management using an HPO-X-Gen-AI framework, transitioning from trusted research to evidence-based discovery. This approach sets a standard for personalised healthcare practices, incorporating ontology risk stratification and future-ready analytics as outlined in the NHS Constitution. The discourse on higher expert medical science safety governance will continue in the forthcoming manuscript, "PHM Fit Lifecycles in Future Analytics," which will further explore developing localised health solutions for "Our Future Health."

RevDate: 2025-08-16

Qian Y, Zhou Z, Ouyang T, et al (2025)

Pangenome analysis of transposable element insertion polymorphisms reveals features underlying cold tolerance in rice.

Nature communications, 16(1):7634.

Transposable elements (TEs) introduce genetic and epigenetic variability, contributing to gene expression patterns that drive adaptive evolution in plants. Here, we investigate TE architecture and its effect on cold tolerance in rice. By analyzing a pangenome graph and the resequencing data of 165 rice accessions, we identify 30,316 transposable element insertion polymorphism (TIP) sites, highlighting significant diversity among polymorphic TEs (pTEs). We observe that pTEs exhibit increased H3K27me3 enrichment, suggesting a potential role in epigenetic differentiation under cold stress and in the transcriptional regulation of the cold response. We identify 26,914 TEs responsive to cold stress from transcriptome data, indicating their potential significance in regulatory networks for this response. Our TIP-GWAS analysis reveals two cold tolerance genes, OsCACT and OsPTR. The biological functions of these genes are confirmed using knockout and overexpression lines. Our web tool (https://cbi.gxu.edu.cn/RICEPTEDB/) makes all pTEs available to researchers for further analysis. These findings provide valuable targets for breeding cold-tolerant rice varieties, indicating the potential importance of pTEs in crop enhancement.

RevDate: 2025-08-16

Li Y, Huang Z, Zhu X, et al (2025)

Serotyping, molecular typing, and vaccine protein screening for Riemerella anatipestifer: Overcoming challenges in prevention and treatment.

Veterinary microbiology, 309:110663 pii:S0378-1135(25)00298-6 [Epub ahead of print].

Riemerella anatipestifer (R. anatipestifer) affects the duck farming industry worldwide, causing substantial economic losses. The current disease prevention and treatment strategies primarily include vaccines and antibiotics. However, the large number of serotypes and increasing resistance to R. anatipestifer make it challenging to prevent and treat the infection. This study carried out the serotyping and molecular typing of 51 R. anatipestifer strains and predicted vaccine proteins based on pan-genome analysis and cross-immune protection potential. For serotype identification, rabbits were immunized with antigens and 9 serotyped sera were prepared; the data revealed 6 serotypes with two unformed strains. The results for the self-made serotypes were consistent with those obtained from the externally submitted strains. Moreover, the pan-genome analysis was performed on 51 R. anatipestifer strains, and an open pan-genome set of 5094 genes was constructed. In addition, the COG annotation classification indicated that the core and non-core genomes were significantly different in gene functions. A total of 1116 core genomes that could serve as better cross-protective vaccine proteins were analyzed and revealed 5 genes of interest. In addition, the oprM-1 protein, a highly reactive protein, was expressed and purified, and the immunoreactivity with five antisera (anti-serotypes 1, 2, 5, 11, and 18) was demonstrated by Western blotting. This study fills the gaps in the existing typing systems for R. anatipestifer by combining serotyping, MLST typing, and pan-genome analysis. Furthermore, it provides valuable insights into the epidemiology, evolution, and pathogenesis of R. anatipestifer and paves the way for developing effective cross-protective vaccines.

RevDate: 2025-08-15

Hatmaker EA, Barber AE, Drott MT, et al (2025)

Population structure in a fungal human pathogen is potentially linked to pathogenicity.

Nature communications, 16(1):7594.

Aspergillus flavus is a clinically and agriculturally important saprotrophic fungus responsible for severe human infections and extensive crop losses. Here, we analyze genomic data from 300 (117 clinical and 183 environmental) A. flavus isolates from 13 countries, including 82 clinical isolates sequenced in this study, to examine population and pan-genome structure and their relationship to pathogenicity. We use single nucleotide polymorphisms to build a phylogeny, analyze admixture, and perform discriminant analysis of principal components. We identify five A. flavus populations, including a new population, D, corresponding to distinct clades in the genome-wide phylogeny. Strikingly, >75% of clinical isolates were in population D and <5% in population B. We also use orthogroup clustering to identify core and accessory genes within the pan-genome. Accessory genes, including genes within biosynthetic gene clusters, were significantly more common in some populations but rare in others. Our functional annotations show that population D is enriched for genes associated with carbohydrate metabolism, lipid metabolism and certain types of hydrolase activity, whereas a non-clinical population is depleted in genes related to zinc ion binding. In contrast to previous results from the major human pathogen Aspergillus fumigatus, isolation of A. flavus from human specimens is associated with population structure, providing a promising system for future investigations into the contributions of population-specific genetic differences to human infection.
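
Illustrative note (not the authors' pipeline): once genes have been clustered into orthogroups, the core/accessory split described in the abstract above reduces to a presence check across isolates. The sketch below assumes a strict definition of "core" (present in every isolate); the function name, orthogroup IDs, and isolate names are hypothetical.

```python
def classify_orthogroups(presence, genomes):
    """Split orthogroups into core (found in every genome) and accessory (found in only some)."""
    all_genomes = set(genomes)
    core, accessory = [], []
    for orthogroup, found_in in sorted(presence.items()):
        if set(found_in) >= all_genomes:
            core.append(orthogroup)
        else:
            accessory.append(orthogroup)
    return core, accessory


# Hypothetical toy data: orthogroup -> isolates carrying at least one member gene.
toy_presence = {
    "OG0001": {"isolate_A", "isolate_B", "isolate_C"},
    "OG0002": {"isolate_A", "isolate_C"},
    "OG0003": {"isolate_B"},
}
toy_genomes = ["isolate_A", "isolate_B", "isolate_C"]

core, accessory = classify_orthogroups(toy_presence, toy_genomes)
print("core:", core)
print("accessory:", accessory)
```

Together, the core and accessory lists make up the pan-genome of the toy collection.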

RevDate: 2025-08-14

Teasdale LC, Murray KD, Collenberg M, et al (2025)

Pangenomic context reveals the extent of intraspecific plant NLR evolution.

Cell host & microbe, 33(8):1291-1305.e9.

Nucleotide-binding leucine-rich repeat (NLR) proteins are major components of the plant immune system, recognizing pathogen effectors and triggering defense responses. Because of the diversity of pathogen effector repertoires, NLRs have extraordinary sequence, structural, and regulatory variability. Although processes contributing to NLR diversity have been identified, the precise evolution of NLRs in their genomic context and along the multiple axes of diversity has been difficult to trace. We integrate genome-specific full-length transcript, homology, and transposable element information to annotate 3,789 NLRs in 17 diverse Arabidopsis thaliana accessions. We define 121 pangenomic NLR neighborhoods, which vary greatly in size, content, and complexity. NLRs are diverse across many axes, and multiple metrics are required to fully capture NLR variation. Based on these findings, we propose that diversity in diversity generation is fundamental to maintaining a functionally "adaptive" immune system in plants and that mechanistic studies should consider multiple axes of immune system diversity.

RevDate: 2025-08-15

Li M, Wu Y, Li H, et al (2025)

Comparative pan-genome analysis of Huperzia and Phlegmariurus and transcriptomics reveals thermal adaptation in Huperzia.

Functional & integrative genomics, 25(1):168.

Huperzia and Phlegmariurus are ancient genera within the Lycopodiaceae family with significant medicinal value and ecological adaptability, yet the evolutionary dynamics and genetic diversity of their chloroplast genomes remain poorly characterized. Specifically, critical aspects such as intergeneric differences, phylogenetic relationships, and adaptive evolution within their chloroplast genomes remain insufficiently explored. This study analyzed the chloroplast genomes of 66 species from these two genera through comparative genomics to elucidate their structural dynamics and adaptive mechanisms. Results revealed that Huperzia chloroplast genomes (153-155 kb, GC content 36.25-36.39%) are significantly larger than those of Phlegmariurus (148-151 kb, GC content 33.78-34.26%), with pronounced differences in IR boundary dynamics, repetitive sequence distribution, nucleotide diversity, and codon usage bias. Phylogenetic and population structure analyses confirmed the monophyly of both genera and demonstrated significantly higher genetic diversity in Phlegmariurus, likely linked to adaptive radiation driven by humid tropical environments. Transcriptomic data revealed a temporally coordinated chloroplast response to heat stress in Huperzia serrata. Photosynthetic core genes (such as psaB and rrna16) were downregulated, leading to sustained functional impairment. In contrast, early stress-response genes (such as rbcL and trnI-GAU) peaked at 4 h to enhance carbon fixation and transport. Mid-phase repair genes (such as ndhG and rps8) exhibited inverted U-shaped expression patterns to activate electron transport and protein synthesis, whereas late-stage overexpression of atpI restored energy homeostasis. This coordinated regulatory mechanism illustrates a survival strategy of "photosynthetic inhibition-stress compensation-energy reorganization" for thermal adaptation. Future studies should integrate nuclear genome and epigenetic modification data to further unravel the synergistic nucleo-cytoplasmic interactions underlying environmental adaptation.

RevDate: 2025-08-16

Ahmad B, Su Y, Hao Y, et al (2025)

Mango pangenome reveals dramatic impacts of reference bias on population genomic analyses.

Horticulture research, 12(9):uhaf166.

Most genomic studies start by mapping sequencing data to a reference genome. The quality of reference genome assembly, genetic relatedness to the studied population, and the mapping method employed directly impact variant calling accuracy and subsequent genomic analyses, introducing reference bias and resulting in erroneous conclusions. However, the impacts of reference bias have gained limited attention. This study compared population genomic analyses using four different reference genomes of mango (Mangifera indica), including the two haploid assemblies of haplotype-resolved telomere-to-telomere (T2T) genome assembly, a pangenome, and an older version of the reference genome available on NCBI. The choice of reference genome dramatically impacted the mapping efficiency and resulted in notable differences in calling the genetic variants, particularly structural variations (SVs). Phylogenetic analysis was more sensitive to the reference genome compared to genetic differentiation. Population genomic analyses of artificial selection in domestication and SV hotspot regions varied across reference genomes. Notably, the gene enrichment analyses showed significant differences in the top enriched biological processes depending on the reference genome used. Overall, the mango pangenome outperformed the other reference genomes across various metrics, followed by T2T reference genomes, as they captured greater diversity and effectively reduced reference bias. Our findings highlight the role of the mango pangenome in reducing reference bias and underscore the critical role of reference genome selection, suggesting that it is one of the most important factors in population genomic studies.

RevDate: 2025-08-13

Li H (2025)

Finding easy regions for short-read variant calling from pangenome data.

ArXiv pii:2507.03718.

BACKGROUND: While benchmarks on short-read variant calling suggest low error rate below 0.5%, they are only applicable to predefined confident regions. For a human sample without such regions, the error rate could be 10 times higher. Although multiple sets of easy regions have been identified to alleviate the issue, they fail to consider non-reference samples or are biased towards existing short-read data or aligners.

RESULTS: Here, using hundreds of high-quality human assemblies, we derived a set of sample-agnostic easy regions where short-read variant calling reaches high accuracy. These regions cover 88.2% of GRCh38, 92.2% of coding regions and 96.3% of ClinVar pathogenic variants. They achieve a good balance between coverage and easiness and can be generated for other human assemblies or species with multiple well assembled genomes.

CONCLUSION: This resource provides a convenient and powerful way to filter spurious variant calls for clinical or research human samples.

RevDate: 2025-08-13

Salehi Nowbandegani P, Zhang S, Hu H, et al (2025)

Defining and cataloging variants in pangenome graphs.

bioRxiv : the preprint server for biology pii:2025.08.04.668502.

Structural variation causes some human haplotypes to align poorly with the linear reference genome, leading to 'reference bias'. A pangenome reference graph could ameliorate this bias by relating a sample to multiple reference assemblies. However, this approach requires a new definition of a 'genetic variant.' We introduce a definition of pangenome variants and a method, pantree, to identify them. Our approach involves a pangenome reference tree which includes all nodes (sequences) of the pangenome graph, but only a subset of its edges; non-reference edges are variant edges. Our variants are biallelic and have well-defined positions. Analyzing the Minigraph-Cactus draft human pangenome reference graph, we identified 29.6 million genetic variants. Most variants (99.2%) are small, and most small variants (73.9%) are SNPs. 3.5 million variants (11.7%) have a reference allele which is not on GRCh38; these variants are difficult to detect without a pangenome reference, or with existing pangenome-based approaches. They tend to be embedded within tangled, multiallelic regions. We analyze two medically relevant regions, around the HLA-A and RHD genes, identifying thousands of small variants embedded within several large insertions, deletions, and inversions. We release an open-source software tool together with a VCF variant catalogue.
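
Illustrative note (not the pantree implementation): the abstract above describes a reference tree that keeps every node of the graph but only a subset of its edges, with the remaining edges treated as variant edges. A minimal sketch of that general idea, assuming a plain breadth-first spanning tree on a toy directed graph, is shown below. How the actual method chooses its tree edges is not stated in the abstract; the function name, node labels, and toy graph are hypothetical.

```python
from collections import deque


def split_tree_and_variant_edges(edges, root):
    """Return (tree_edges, variant_edges) for a directed graph given as (src, dst) pairs.

    tree_edges form a breadth-first spanning tree rooted at `root`;
    every edge not used by the tree is reported as a variant edge.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])

    visited = {root}
    tree_edges = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                tree_edges.append((node, neighbor))
                queue.append(neighbor)

    tree_set = set(tree_edges)
    variant_edges = [edge for edge in edges if edge not in tree_set]
    return tree_edges, variant_edges


# Toy "bubble": two alternative nodes (a, b) between a shared start (s) and end (t).
toy_edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
tree, variants = split_tree_and_variant_edges(toy_edges, "s")
print("tree edges:", tree)
print("variant edges:", variants)
```

In this toy bubble every node is reachable through the tree, and the single leftover edge marks the one place where a second path rejoins it.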

RevDate: 2025-08-16

Depuydt L, Renders L, Van de Vyver S, et al (2025)

b-move: faster lossless approximate pattern matching in a run-length compressed index.

Algorithms for molecular biology : AMB, 20(1):15.

BACKGROUND: Due to the increasing availability of high-quality genome sequences, pan-genomes are gradually replacing single consensus reference genomes in many bioinformatics pipelines to better capture genetic diversity. Traditional bioinformatics tools using the FM-index face memory limitations with such large genome collections. Recent advancements in run-length compressed indices like Gagie et al.'s r-index and Nishimoto and Tabei's move structure, alleviate memory constraints but focus primarily on backward search for MEM-finding. Arakawa et al.'s br-index initiates complete approximate pattern matching using bidirectional search in run-length compressed space, but with significant computational overhead due to complex memory access patterns.

RESULTS: We introduce b-move, a novel bidirectional extension of the move structure, enabling fast, cache-efficient, lossless approximate pattern matching in run-length compressed space. It achieves bidirectional character extensions up to 7 times faster than the br-index, closing the performance gap with FM-index-based alternatives. For locating occurrences, b-move performs ϕ and ϕ⁻¹ operations up to 7 times faster than the br-index. At the same time, it maintains the favorable memory characteristics of the br-index, for example, all available complete E. coli genomes on NCBI's RefSeq collection can be compiled into a b-move index that fits into the RAM of a typical laptop.

CONCLUSIONS: b-move proves practical and scalable for pan-genome indexing and querying. We provide a C++ implementation of b-move, supporting efficient lossless approximate pattern matching including locate functionality, available at https://github.com/biointec/b-move under the AGPL-3.0 license.

RevDate: 2025-08-16

Kulmanov M, Ashouri S, Liu Y, et al (2025)

Phased genome assemblies and pangenome graphs of human populations of Japan and Saudi Arabia.

Scientific data, 12(1):1316.

The selection of a reference sequence in genome analysis is critical, as it serves as the foundation for all downstream analyses. Recently, the pangenome graph has been proposed as a data model that incorporates haplotypes from multiple individuals. Here we present JaSaPaGe, a pangenome graph reference for Saudi Arabian and Japanese populations, both of which have been significantly underrepresented in previous genomic studies. We constructed JaSaPaGe from high-quality phased diploid assemblies which were made utilizing PacBio high-fidelity long reads, Nanopore long reads, and Hi-C short reads of 9 Saudi and 10 Japanese individuals. Quality evaluation of the pangenome graph by variant calling showed that our pangenome outperformed earlier linear reference genomes (GRCh38 and T2T-CHM13) and showed comparable performance to the pangenome graph provided by the Human Pangenome Reference Consortium (HPRC), with more variants found in Japanese and Saudi samples using their population-specific pangenomes. This pangenome reference will serve as a valuable resource for both the research and clinical communities in Japan and Saudi Arabia.

RevDate: 2025-08-12

Qing Y, Liao Z, An D, et al (2025)

Comparative genomics reveals the genetic diversity and plasticity of Clostridium tertium.

Journal of applied microbiology pii:8232670 [Epub ahead of print].

AIMS: Clostridium tertium, increasingly recognized as the emerging human pathogen frequently isolated from environmental and clinical specimens, remains genetically underexplored despite its clinical relevance. This study aims to explore the genetic characteristics of C. tertium by genomic analysis.

METHODS AND RESULTS: This study presented a comprehensive genomic investigation of 45 C. tertium strains from the GenBank database. Genome sizes (3.27-4.55 Mbp) and coding gene counts varied markedly across strains. Phylogenetic analyses based on 16S rRNA gene and core genome uncovered distinct intra-species lineages, including evolutionarily divergent clusters likely shaped by niche specialization. Pan-genomic analysis confirmed an open genome, with accessory and strain-specific genes enriched in functions related to environmental adaptation and regulation. Functional annotation further identified diverse virulence factor genes (e.g. clpP, nagK) and antibiotic resistance genes (e.g. vatB, tetA(P)) co-occurring with mobile genetic elements (MGEs), suggesting that horizontal gene transfer (HGT) may be a key driver of genome plasticity in C. tertium. Notably, one-third of the strains carried CRISPR-Cas systems, indicating the defense potential against exogenous genetic elements.

CONCLUSIONS: C. tertium exhibited extensive genetic diversity and genome plasticity, probably driven by MGE-mediated HGT, defense mechanisms of CRISPR-Cas systems, and functional adaptation related to virulence and resistance. These traits may underlie its ability to colonize diverse environments and acquire pathogenicity and resistance.

RevDate: 2025-08-14
CmpDate: 2025-08-12

Anderson OH, Chong JPJ, Thomas GH (2025)

Comparative genomics of Clostridium butyricum reveals a conserved genome architecture and novel virulence-related gene clusters.

Microbial genomics, 11(8):.

Bacteria from the species Clostridium butyricum encompass a diverse range of phenotypes. While some strains are used as probiotics, others have been isolated from cases of botulism and necrotizing enterocolitis (NEC) in preterm neonates. We identify a unique genomic feature of this species, namely a highly conserved extrachromosomal element of ~0.8 Mb. This replicon satisfies the three principal criteria used to define a chromid, which include the possession of core genes that are encoded on the main chromosome in other species. Although C. butyricum is the type species of Clostridium, we find that the possession of a chromid is not a typical feature of members of this genus and represents a unique genomic fingerprint of the species C. butyricum. Furthermore, we show that pathogenic C. butyricum strains from the sequenced examples are not monophyletic, which suggests that virulence has evolved multiple times from related non-pathogenic ancestors. However, we were able to identify common genes which are found exclusively in these pathogenic strains. In addition to the botulinum neurotoxin genes, these include a novel set of genes involved in the biosynthesis of a capsular polysaccharide (CPS), and genes that confer the ability to utilize the mucin-derived sugar l-fucose, which may provide a competitive advantage for growth in the colon. Moreover, by identifying NEC strain-associated virulence factors, we are able to further the understanding of these particularly harmful strains.

RevDate: 2025-08-16
CmpDate: 2025-08-12

Zou Y, Zhu W, Hou Y, et al (2025)

The evolutionary dynamics of organellar pan-genomes in Arabidopsis thaliana.

Genome biology, 26(1):240.

BACKGROUND: In plants, comparative analyses of organellar genomes are often based on draft assemblies. Large-scale investigations into the complex structural rearrangements of mitochondrial genomes remain scarce.

RESULTS: Here, we perform a comprehensive analysis of the dominant conformations and dynamic heteroplasmic variants of organellar genomes in the model plant Arabidopsis thaliana, utilizing high-quality long-read assemblies validated at high resolution from 149 samples. We find that mitochondrial and plastid genomes share common types of structural and small-scale variants driven by similar DNA sequence features. However, rearrangements mediated by repetitive sequences in mitochondrial genomes evolve so rapidly that they are often decoupled from other types of variants. Rare complex events involving elongation and fusion of existing repeats are also observed, contributing to the unalignable regions commonly found at the interspecies level. Additionally, we demonstrate that disrupting and rescuing organellar DNA maintenance could drive the rapid evolution of dominant mitochondrial genome conformations.

CONCLUSIONS: Our study provides an unprecedentedly detailed view of the dynamics of organellar genomes at pan-genome scale in Arabidopsis thaliana, paving the way to unlock the full potential of organellar genetic resources.

RevDate: 2025-08-11

Engelhorn J, Snodgrass SJ, Kok A, et al (2025)

Genetic variation at transcription factor binding sites largely explains phenotypic heritability in maize.

Nature genetics [Epub ahead of print].

Comprehensive maps of functional variation at transcription factor (TF) binding sites (cis-elements) are crucial for elucidating how genotype shapes phenotype. Here, we report the construction of a pan-cistrome of the maize leaf under well-watered and drought conditions. We quantified haplotype-specific TF footprints across a pan-genome of 25 maize hybrids and mapped over 200,000 variants, genetic, epigenetic, or both (termed binding quantitative trait loci (bQTL)), linked to cis-element occupancy. Three lines of evidence support the functional significance of bQTL: (1) coincidence with causative loci that regulate traits, including vgt1, ZmTRE1 and the MITE transposon near ZmNAC111 under drought; (2) bQTL allelic bias is shared between inbred parents and matches chromatin immunoprecipitation sequencing results; and (3) partitioning genetic variation across genomic regions demonstrates that bQTL capture the majority of heritable trait variation across ~72% of 143 phenotypes. Our study provides an auspicious approach to make functional cis-variation accessible at scale for genetic studies and targeted engineering of complex traits.

RevDate: 2025-08-11
CmpDate: 2025-08-11

Tellatin D, Cornet L, Snauwaert V, et al (2025)

Melissospora conviva gen. nov., sp. nov., a novel actinobacterial genus isolated from beehive through cross-feeding interactions.

International journal of systematic and evolutionary microbiology, 75(8):.

Most micro-organisms remain unculturable under standard laboratory conditions, limiting our understanding of microbial diversity and ecological interactions. One major cause of this uncultivability is the loss of access to essential cross-fed metabolites when bacteria are removed from their natural communities. During a bioprospecting campaign targeting actinomycetes of an Apis mellifera beehive, we identified five isolates (DT32, DT45[T], DT55, DT59 and DT194) that required co-cultivation for growth recovery, suggesting a dependence on microbial interactions in their native habitat. Whole-genome sequencing and phylogenetic analysis positioned these isolates within a distinct lineage of Micromonosporaceae, separate from the five officially recognized clades of the Micromonospora genus. A combination of microscopic, chemotaxonomic and physiological characterizations further supported their uniqueness. Notably, they exhibited high auxotrophy, being unable to use all carbon sources tested, likely due to genome reduction (4.6 Mbp) compared to other Micromonosporaceae. Pangenomic comparisons with their closest Micromonospora relatives revealed gene losses in key metabolic pathways, including the glyoxylate bypass and the Entner-Doudoroff pathway, which may explain their metabolic reliance. These findings reveal a highly specialized, ecologically adapted lineage with deep evolutionary divergence and further support microbial interdependence isolation strategies to explore the microbial dark matter. We propose Melissospora conviva as a novel genus and species within the Actinomycetota phylum, with isolate DT45[T] as the representative type species and type strain, which has been deposited in public collections under the accession numbers DSM 117791 and LMG 33580.

RJR Experience and Expertise

Researcher

Robbins holds BS, MS, and PhD degrees in the life sciences. He served as a tenured faculty member in the Zoology and Biological Science departments at Michigan State University. He is currently exploring the intersection between genomics, microbial ecology, and biodiversity — an area that promises to transform our understanding of the biosphere.

Educator

Robbins has extensive experience in college-level education: At MSU he taught introductory biology, genetics, and population genetics. At JHU, he was an instructor for a special course on biological database design. At FHCRC, he team-taught a graduate-level course on the history of genetics. At Bellevue College he taught medical informatics.

Administrator

Robbins has been involved in science administration at both the federal and the institutional levels. At NSF he was a program officer for database activities in the life sciences; at DOE he was a program officer for information infrastructure in the Human Genome Project. At the Fred Hutchinson Cancer Research Center, he served as a vice president for fifteen years.

Technologist

Robbins has been involved with information technology since writing his first Fortran program as a college student. At NSF he was the first program officer for database activities in the life sciences. At JHU he held an appointment in the CS department and served as director of the informatics core for the Genome Data Base. At the FHCRC he was VP for Information Technology.

Publisher

While still at Michigan State, Robbins started his first publishing venture, founding a small company that addressed the short-run publishing needs of instructors in very large undergraduate classes. For more than 20 years, Robbins has been operating The Electronic Scholarly Publishing Project, a web site dedicated to the digital publishing of critical works in science, especially classical genetics.

Speaker

Robbins is well known for his speaking abilities and is often called upon to provide keynote or plenary addresses at international meetings. For example, in July 2012, he gave a well-received keynote address at the Global Biodiversity Informatics Congress, sponsored by GBIF and held in Copenhagen. The slides from that talk can be seen HERE.

Facilitator

Robbins is a skilled meeting facilitator. He prefers a participatory approach, with part of the meeting involving dynamic breakout groups, created by the participants in real time: (1) individuals propose breakout groups; (2) everyone signs up for one (or more) groups; (3) the groups with the most interested parties then meet, with reports from each group presented and discussed in a subsequent plenary session.

Designer

Robbins has been engaged with photography and design since the 1960s, when he worked for a professional photography laboratory. He now prefers digital photography and tools for their precision and reproducibility. He designed his first web site more than 20 years ago and he personally designed and implemented this web site. He engages in graphic design as a hobby.

963 Red Tail Lane
Bellingham, WA 98226

206-300-3443

E-mail: RJR8222@gmail.com

Collection of publications by R J Robbins

Reprints and preprints of publications, slide presentations, instructional materials, and data compilations written or prepared by Robert Robbins. Most papers deal with computational biology, genome informatics, using information technology to support biomedical research, and related matters.

Research Gate page for R J Robbins

ResearchGate is a social networking site for scientists and researchers to share papers, ask and answer questions, and find collaborators. According to a study by Nature and an article in Times Higher Education, it is the largest academic social network in terms of active users.

Curriculum Vitae for R J Robbins

short personal version

Curriculum Vitae for R J Robbins

long standard version

RJR Picks from Around the Web (updated 11 MAY 2018)