Query run: 28 Sep 2021 at 01:32
Hits: 1511

Bibliography on: Pangenome

Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology.

RJR: Recommended Bibliography. Created: 28 Sep 2021 at 01:32

Pangenome

Although the enforced stability of genomic content is ubiquitous among multicellular eukaryotes (MCEs), the opposite is proving to be the case among prokaryotes, which exhibit remarkable and adaptive plasticity of genomic content. Early bacterial whole-genome sequencing efforts discovered that whenever a particular "species" was re-sequenced, new genes were found that had not been detected earlier: entirely new genes, not merely new alleles. This led to the concepts of the bacterial core-genome, the set of genes found in all members of a particular "species", and the flex-genome, the set of genes found in some, but not all, members of the "species". Together these make up the species' pan-genome.
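The relationship between the core-genome, the flex-genome, and the pan-genome can be illustrated with simple set operations. The following is a minimal sketch only, assuming each strain has already been reduced to a set of gene-family identifiers; the function name, strain names, and gene lists are hypothetical toy values, not data from any of the papers listed here.

```python
def partition_pangenome(genomes):
    """Split gene content into pan-, core-, and flex-genome.

    genomes: dict mapping a strain name to an iterable of gene-family IDs
    (hypothetical input format chosen for this illustration).
    """
    gene_sets = [set(genes) for genes in genomes.values()]
    if not gene_sets:
        raise ValueError("at least one genome is required")
    pan = set().union(*gene_sets)          # genes found in any strain
    core = set.intersection(*gene_sets)    # genes found in every strain
    flex = pan - core                      # genes found in some, but not all
    return pan, core, flex


# Toy example with three made-up strains
toy_genomes = {
    "strain_A": {"dnaA", "gyrB", "recA", "blaTEM"},
    "strain_B": {"dnaA", "gyrB", "recA", "tetA"},
    "strain_C": {"dnaA", "gyrB", "recA"},
}
pan, core, flex = partition_pangenome(toy_genomes)
print(sorted(core))  # genes shared by all three strains
print(sorted(flex))  # genes present in only some strains
```

Real analyses first have to cluster genes into families across strains, which is the hard part; the set arithmetic shown here only applies once that step is done.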

Created with PubMed® Query: pangenome or "pan-genome" or "pan genome" NOT pmcbook NOT ispreviousversion

Citations: The Papers (from PubMed®)


RevDate: 2021-09-27

Tognon M, Bonnici V, Garrison E, et al (2021)

GRAFIMO: Variant and haplotype aware motif scanning on pangenome graphs.

PLoS computational biology, 17(9):e1009444 pii:PCOMPBIOL-D-21-00041 [Epub ahead of print].

Transcription factors (TFs) are proteins that promote or reduce the expression of genes by binding short genomic DNA sequences known as transcription factor binding sites (TFBS). While several tools have been developed to scan for potential occurrences of TFBS in linear DNA sequences or reference genomes, no tool exists to find them in pangenome variation graphs (VGs). VGs are sequence-labelled graphs that can efficiently encode collections of genomes and their variants in a single, compact data structure. Because VGs can losslessly compress large pangenomes, TFBS scanning in VGs can efficiently capture how genomic variation affects the potential binding landscape of TFs in a population of individuals. Here we present GRAFIMO (GRAph-based Finding of Individual Motif Occurrences), a command-line tool for the scanning of known TF DNA motifs represented as Position Weight Matrices (PWMs) in VGs. GRAFIMO extends the standard PWM scanning procedure by considering variations and alternative haplotypes encoded in a VG. Using GRAFIMO on a VG based on individuals from the 1000 Genomes project we recover several potential binding sites that are enhanced, weakened or missed when scanning only the reference genome, and which could constitute individual-specific binding events. GRAFIMO is available as an open-source tool, under the MIT license, at https://github.com/pinellolab/GRAFIMO and https://github.com/InfOmics/GRAFIMO.

RevDate: 2021-09-27

Kim MS, Lee T, Baek J, et al (2021)

Genome assembly of the popular Korean soybean cultivar Hwangkeum.

G3 (Bethesda, Md.), 11(10):.

Massive resequencing efforts have been undertaken to catalog allelic variants in major crop species including soybean, but the scope of the information for genetic variation often depends on short sequence reads mapped to the extant reference genome. Additional de novo assembled genome sequences provide a unique opportunity to explore a dispensable genome fraction in the pan-genome of a species. Here, we report the de novo assembly and annotation of Hwangkeum, a popular soybean cultivar in Korea. The assembly was constructed using PromethION nanopore sequencing data and two genetic maps and was then error-corrected using Illumina short-reads and PacBio SMRT reads. The 933.12 Mb assembly was annotated as containing 79,870 transcripts for 58,550 genes using RNA-Seq data and the public soybean annotation set. Comparison of the Hwangkeum assembly with the Williams 82 soybean reference genome sequence (Wm82.a2.v1) revealed 1.8 million single-nucleotide polymorphisms, 0.5 million indels, and 25 thousand putative structural variants. However, there was no natural megabase-scale chromosomal rearrangement. Incidentally, by adding two novel subfamilies, we found that soybean contains four clearly separated subfamilies of centromeric satellite repeats. Analyses of satellite repeats and gene content suggested that the Hwangkeum assembly is a high-quality assembly. This was further supported by comparison of the marker arrangement of anthocyanin biosynthesis genes and of gene arrangement at the Rsv3 locus. Therefore, the results indicate that the de novo assembly of Hwangkeum is a valuable additional reference genome resource for characterizing traits for the improvement of this important crop species.

RevDate: 2021-09-27

Sato K, Mascher M, Himmelbach A, et al (2021)

Chromosome-scale assembly of wild barley accession "OUH602".

G3 (Bethesda, Md.), 11(10):.

Barley (Hordeum vulgare) was domesticated from its wild ancestral form ca. 10,000 years ago in the Fertile Crescent and is widely cultivated throughout the world, except for in tropical areas. The genome size of both cultivated barley and its conspecific wild ancestor is approximately 5 Gb. High-quality chromosome-level assemblies of 19 cultivated and one wild barley genotype were recently established by pan-genome analysis. Here, we release another equivalent short-read assembly of the wild barley accession "OUH602." A series of genetic and genomic resources were developed for this genotype in prior studies. Our assembly contains more than 4.4 Gb of sequence, with a scaffold N50 value of over 10 Mb. The haplotype shows high collinearity with the most recently updated barley reference genome, "Morex" V3, with some inversions. Gene projections based on "Morex" gene models revealed 46,807 protein-coding sequences and 43,375 protein-coding genes. Alignments to publicly available sequences of bacterial artificial chromosome (BAC) clones of "OUH602" confirm the high accuracy of the assembly. Since more loci of interest have been identified in "OUH602," the release of this assembly, with detailed genomic information, should accelerate gene identification and the utilization of this key wild barley accession.

RevDate: 2021-09-27

Mahtha SK, Purama RK, Yadav G (2021)

StAR-Related Lipid Transfer (START) Domains Across the Rice Pangenome Reveal How Ontogeny Recapitulated Selection Pressures During Rice Domestication.

Frontiers in genetics, 12:737194.

The StAR-related lipid transfer (START) domain containing proteins or START proteins, encoded by a plant amplified family of evolutionarily conserved genes, play important roles in lipid binding, transport, signaling, and modulation of transcriptional activity in the plant kingdom, but there is limited information on their evolution, duplication, and associated sub- or neo-functionalization. Here we perform a comprehensive investigation of this family across the rice pangenome, using 10 wild and cultivated varieties. Conservation of START domains across all 10 rice genomes suggests low dispensability and critical functional roles for this family, further supported by chromosomal mapping, duplication and domain structure patterns. Analysis of synteny highlights a preponderance of segmental and dispersed duplication among STARTs, while transcriptomic investigation of the main cultivated variety Oryza sativa var. japonica reveals sub-functionalization among gene family members in terms of preferential expression across various developmental stages and anatomical parts, such as flowering. Ka/Ks ratios confirmed strong negative/purifying selection on START family evolution, implying that ontogeny recapitulated selection pressures during rice domestication. Our findings provide evidence for high conservation of START genes across rice varieties in numbers, as well as in their stringent regulation of Ka/Ks ratio, and show strong functional dependency of plants on START proteins for their growth and reproductive development. We believe that our findings advance the limited knowledge about plant START domain diversity and evolution, and pave the way for more detailed assessment of individual structural classes of START proteins among plants and their domain specific substrate preferences, to complement existing studies in animals and yeast.

RevDate: 2021-09-27

Jaakkola K, Virtanen K, Lahti P, et al (2021)

Comparative Genome Analysis and Spore Heat Resistance Assay Reveal a New Component to Population Structure and Genome Epidemiology Within Clostridium perfringens Enterotoxin-Carrying Isolates.

Frontiers in microbiology, 12:717176.

Clostridium perfringens causes a variety of human and animal enteric diseases including food poisoning, antibiotic-associated diarrhea, and necrotic enteritis. Yet, the reservoirs of enteropathogenic enterotoxin-producing strains remain unknown. We conducted a genomic comparison of 290 strains and a heat resistance phenotyping of 30 C. perfringens strains to elucidate the population structure and ecology of this pathogen. C. perfringens genomes shared a conserved genetic backbone with more than half of the genes of an average genome conserved in >95% of strains. The cpe-carrying isolates were found to share genetic context: the cpe-carrying plasmids had different distribution patterns within the genetic lineages and the estimated pan genome of cpe-carrying isolates had a larger core genome and a smaller accessory genome compared to that of 290 strains. We characterize cpe-negative strains related to chromosomal cpe-carrying strains elucidating the origin of these strains and disclose two distinct groups of chromosomal cpe-carrying strains with different virulence characteristics, spore heat resistance properties, and, presumably, ecological niche. Finally, an antibiotic-associated diarrhea isolate carrying two copies of the enterotoxin cpe gene and the associated genetic lineage with the potential for the emergence of similar strains are outlined. With C. perfringens as an example, implications of input genome quality for pan genome analysis are discussed. Our study furthers the understanding of genome epidemiology and population structure of enteropathogenic C. perfringens and brings new insight into this important pathogen and its reservoirs.

RevDate: 2021-09-27

Wekesa CS, Furch ACU, Oelmüller R (2021)

Isolation and Characterization of High-Efficiency Rhizobia From Western Kenya Nodulating With Common Bean.

Frontiers in microbiology, 12:697567.

Common bean is one of the primary protein sources in third-world countries. It forms nodules with nitrogen-fixing rhizobia, which have to be adapted to the local soils. Commercial rhizobial strains such as Rhizobium tropici CIAT899 are often used in agriculture. However, this strain failed to significantly increase the common bean yield in many places, including Kenya, due to the local soils' low pH. We isolated two indigenous rhizobial strains from the nodules of common bean from two fields in Western Kenya that have never been exposed to commercial inocula. We then determined their ability to fix nitrogen in common beans, solubilize phosphorus, and produce indole acetic acid. In greenhouse experiments, common bean plants inoculated with the two isolates, B3 and S2, in sterile vermiculite performed better than those inoculated with CIAT899 or plants grown with nitrogen fertilizer alone. In contrast to CIAT899, both isolates grew in media with pH 4.8. Furthermore, isolate B3 had higher phosphate solubilization ability and produced more indole acetic acid than the other two rhizobia. Genome analyses revealed that B3 and S2 are different strains of Rhizobium phaseoli. We recommend fieldwork studies in Kenyan soils to test the efficacy of the two isolates in the natural environment in an effort to produce inoculants specific for these soils.

RevDate: 2021-09-26

Vela Gurovic MS, Díaz ML, Gallo CA, et al (2021)

Phylogenomics, CAZyome and core secondary metabolome of Streptomyces albus species.

Molecular genetics and genomics : MGG [Epub ahead of print].

A phylogenomic study conducted with different bioinformatic tools such as TYGS, REALPHY and AAI comparisons revealed a high rate of misidentified Streptomyces albus genomes in GenBank. Only 9 of the 18 annotated genomes available in the public database were correctly identified as S. albus species. The pangenome of the nine in silico confirmed S. albus genomes was almost closed. Lignocellulosic agroresidues were a common niche among strains of the S. albus clade while carbohydrate active enzymes (CAZymes) were highly conserved. Relevant enzymes for cellulose degradation such as beta glucosidases belonging to the GH1 family, a GH6 cellulase and a monooxygenase AA10-CBM2 were encoded by all S. albus genomes. Among them, one GH1 glycosidase would be regulated by CebR. However, this regulatory mechanism was not confirmed for other genes related to cellulose degradation. Based on AntiSMASH predictions, the core secondary metabolome of S. albus encompassed a total of 23 biosynthetic gene clusters (BGCs), where 4 were related to common metabolites within Streptomyces genus. Species specific BGCs included those related to pseudouridimycin and xantholipin. Additionally, four BGCs encoded putative derivatives of ibomycin, the lasso peptide SSV-2086, the lanthipeptide SapB and the terpene isorenieratene. Known metabolites could not be assigned to ten BGCs and three clusters did not match with any previously described BGC. The core genome of S. albus retrieved from nine closely related genomes revealed a high potential for the discovery of novel bioactive metabolites and underexplored regulatory genomic elements related to lignocellulose deconstruction.

RevDate: 2021-09-25

Contreras-Moreira B, Filippi CV, Naamati G, et al (2021)

K-mer counting and curated libraries drive efficient annotation of repeats in plant genomes.

The plant genome [Epub ahead of print].

The annotation of repetitive sequences within plant genomes can help in the interpretation of observed phenotypes. Moreover, repeat masking is required for tasks such as whole-genome alignment, promoter analysis, or pangenome exploration. Although homology-based annotation methods are computationally expensive, k-mer strategies for masking are orders of magnitude faster. Here, we benchmarked a two-step approach, where repeats were first called by k-mer counting and then annotated by comparison to curated libraries. This hybrid protocol was tested on 20 plant genomes from Ensembl, with the k-mer-based Repeat Detector (Red) and two repeat libraries (REdat, last updated in 2013, and nrTEplants, curated for this work). Custom libraries produced by RepeatModeler were also tested. We obtained repeated genome fractions that matched those reported in the literature but with shorter repeated elements than those produced directly by sequence homology. Inspection of the masked regions that overlapped genes revealed no preference for specific protein domains. Most Red-masked sequences could be successfully classified by sequence similarity, with the complete protocol taking less than 2 h on a desktop Linux box. A guide to curating your own repeat libraries and the scripts for masking and annotating plant genomes can be obtained at https://github.com/Ensembl/plant-scripts.

RevDate: 2021-09-24

Horesh G, Taylor-Brown A, McGimpsey S, et al (2021)

Different evolutionary trends form the twilight zone of the bacterial pan-genome.

Microbial genomics, 7(9):.

The pan-genome is defined as the combined set of all genes in the gene pool of a species. Pan-genome analyses have been very useful in helping to understand different evolutionary dynamics of bacterial species: an open pan-genome often indicates a free-living lifestyle with metabolic versatility, while closed pan-genomes are linked to host-restricted, ecologically specialized bacteria. A detailed understanding of the species pan-genome has also been instrumental in tracking the phylodynamics of emerging drug resistance mechanisms and drug-resistant pathogens. However, current approaches to analyse a species' pan-genome do not take the species population structure into account, nor do they account for the uneven sampling of different lineages, as is commonplace due to over-sampling of clinically relevant representatives. Here we present the application of a population structure-aware approach for classifying genes in a pan-genome based on within-species distribution. We demonstrate our approach on a collection of 7500 Escherichia coli genomes, one of the most-studied bacterial species and used as a model for an open pan-genome. We reveal clearly distinct groups of genes, clustered by different underlying evolutionary dynamics, and provide a more biologically informed and accurate description of the species' pan-genome.

RevDate: 2021-09-21

Flores Ramos S, Brugger SD, Escapa IF, et al (2021)

Genomic Stability and Genetic Defense Systems in Dolosigranulum pigrum, a Candidate Beneficial Bacterium from the Human Microbiome.

mSystems [Epub ahead of print].

Dolosigranulum pigrum is positively associated with indicators of health in multiple epidemiological studies of human nasal microbiota. Knowledge of the basic biology of D. pigrum is a prerequisite for evaluating its potential for future therapeutic use; however, such data are very limited. To gain insight into D. pigrum's chromosomal structure, pangenome, and genomic stability, we compared the genomes of 28 D. pigrum strains that were collected across 20 years. Phylogenomic analysis showed closely related strains circulating over this period and closure of 19 genomes revealed highly conserved chromosomal synteny. Gene clusters involved in the mobilome and in defense against mobile genetic elements (MGEs) were enriched in the accessory genome versus the core genome. A systematic analysis for MGEs identified the first candidate D. pigrum prophage and insertion sequence. A systematic analysis for genetic elements that limit the spread of MGEs, including restriction modification (RM), CRISPR-Cas, and deity-named defense systems, revealed strain-level diversity in host defense systems that localized to specific genomic sites, including one RM system hot spot. Analysis of CRISPR spacers pointed to a wealth of MGEs against which D. pigrum defends itself. These results reveal a role for horizontal gene transfer and mobile genetic elements in strain diversification while highlighting that in D. pigrum this occurs within the context of a highly stable chromosomal organization protected by a variety of defense mechanisms. IMPORTANCE Dolosigranulum pigrum is a candidate beneficial bacterium with potential for future therapeutic use. This is based on its positive associations with characteristics of health in multiple studies of human nasal microbiota across the span of human life. For example, high levels of D. pigrum nasal colonization in adults predicts the absence of Staphylococcus aureus nasal colonization. Also, D. pigrum nasal colonization in young children is associated with healthy control groups in studies of middle ear infections. Our analysis of 28 genomes revealed a remarkable stability of D. pigrum strains colonizing people in the United States across a 20-year span. We subsequently identified factors that can influence this stability, including genomic stability, phage predators, the role of MGEs in strain-level variation, and defenses against MGEs. Finally, these D. pigrum strains also lacked predicted virulence factors. Overall, these findings add additional support to the potential for D. pigrum as a therapeutic bacterium.

RevDate: 2021-09-20

Sonnenberg CB, Haugen P (2021)

The Pseudoalteromonas multipartite genome: distribution and expression of pangene categories, and a hypothesis for the origin and evolution of the chromid.

G3 (Bethesda, Md.), 11(9):.

Bacterial genomes typically consist of one large chromosome, but can also include secondary replicons. These so-called multipartite genomes are scattered on the bacterial tree of life with the majority of cases belonging to Proteobacteria. Within the class gamma-proteobacteria, multipartite genomes are restricted to the two families Vibrionaceae and Pseudoalteromonadaceae. Whereas the genome of vibrios is well studied, information on the Pseudoalteromonadaceae genome is much scarcer. We have studied Pseudoalteromonadaceae with respect to the origin of the chromid, how pangene categories are distributed, how genes are expressed relative to their genomic location, and identified chromid hallmark genes. We calculated the Pseudoalteromonadaceae pangenome based on 25 complete genomes and found that core/softcore are significantly overrepresented in late replicating sectors of the chromid, regardless of how the chromid is replicated. On the chromosome, core/softcore and shell/cloud genes are only weakly overrepresented at the chromosomal replication origin and termination sequences, respectively. Gene expression is trending downwards with increasing distance from the chromosomal oriC, whereas the chromidal expression pattern is more complex. Moreover, we identified 78 chromid hallmark genes, and BLASTp searches suggest that the majority of them were acquired from the ancestral gene pool of Alteromonadales. Finally, our data strongly suggest that the chromid originates from a plasmid that was acquired in a relatively recent event. In summary, this study extends our knowledge on multipartite genomes, and helps us understand how and why secondary replicons are acquired, why they are maintained, and how they are shaped by evolution.

RevDate: 2021-09-20

Zou W, Ye G, Liu C, et al (2021)

Comparative genome analysis of Clostridium beijerinckii strains isolated from pit mud of Chinese strong flavor baijiu ecosystem.

G3 (Bethesda, Md.) pii:6364901 [Epub ahead of print].

Clostridium beijerinckii is a well-known anaerobic solventogenic bacterium which inhabits a wide range of different niches. Previously, we isolated five butyrate-producing C. beijerinckii strains from pit mud (PM) of strong-flavor baijiu (SFB) ecosystems. Genome annotation of the five strains showed that they could assimilate various carbon sources as well as ammonium to produce acetate, butyrate, lactate, hydrogen, and esters but did not produce the undesirable flavors isopropanol and acetone, making them useful for further exploration in SFB production. Our analysis of the genomes of an additional 233 C. beijerinckii strains revealed an open pangenome based on current sampling and will likely change with additional genomes. The core genome, accessory genome, and strain-specific genes comprised 1567, 8851, and 2154 genes, respectively. A total of 298 genes were found only in the five C. beijerinckii strains from PM, among which only 77 genes were assigned to Clusters of Orthologous Genes categories. In addition, 15 transposase and 12 phage integrase families were found in all five C. beijerinckii strains from PM. Between 18 and 21 genome islands were predicted for the five C. beijerinckii genomes. The existence of a large number of mobile genetic elements indicated that the genomes of the five C. beijerinckii strains evolved with the loss or insertion of DNA fragments in the PM of SFB ecosystems. This study presents a genomic framework of C. beijerinckii strains from PM that could be used for genetic diversification studies and further exploration of these strains.

RevDate: 2021-09-17

Bansal K, Kaur A, Midha S, et al (2021)

Xanthomonas sontii sp. nov., a non-pathogenic bacterium isolated from healthy basmati rice (Oryza sativa) seeds from India.

Antonie van Leeuwenhoek [Epub ahead of print].

We report three yellow-pigmented, Gram-negative, aerobic, rod-shaped, motile bacterial isolates designated as PPL1T, PPL2, and PPL3 from healthy basmati rice seeds. Phenotypic and 16S rRNA gene sequence analysis assigned these isolates to the genus Xanthomonas. The 16S rRNA showed a 99.59% similarity with X. sacchari CFBP 4641T, a sugarcane pathogen. Further, biochemical and fatty acid analysis revealed it to be closer to X. sacchari. Still, it differed from other species in general and known rice associated species such as X. oryzae (pathogenic) and X. maliensis (non-pathogenic) in particular. Interestingly, the isolates in this study were isolated from healthy rice plants but are closely related to a species that is pathogenic and was isolated from diseased sugarcane. Accordingly, in planta studies revealed that PPL1T, PPL2, and PPL3 are non-pathogenic to rice plants upon leaf inoculation. In taxonogenomic studies, orthologous average nucleotide identity (OrthoANI) and digital DNA-DNA hybridization (dDDH) values with type strains of Xanthomonas species were below the recommended threshold values for species delineation. Whole genome-based phylogenomic analysis revealed that these isolates formed a distinct monophyletic clade with X. sacchari CFBP 4641T as their closest neighbour. Further, pangenome analysis revealed PPL1T, PPL2, and PPL3 isolates to comprise an NRPS cluster along with a large number of unique genes associated with the novel species. Polyphasic and genomic approaches support a novel lineage and species associated with healthy rice seeds, for which the name Xanthomonas sontii sp. nov. is proposed. The type strain of X. sontii sp. nov. is PPL1T (JCM 33631T = CFBP 8688T = ICMP 23426T = MTCC 12491T), with PPL2 (JCM 33632 = CFBP 8689 = ICMP 23427 = MTCC 12492) and PPL3 (JCM 33633 = CFBP 8690 = ICMP 23428 = MTCC 12493) as other strains of the species.

RevDate: 2021-09-20

Zhang M, Zhang Y, Han X, et al (2021)

Whole genome sequencing of Enterobacter mori, an emerging pathogen of kiwifruit and the potential genetic adaptation to pathogenic lifestyle.

AMB Express, 11(1):129.

Members of the Enterobacter genus are gram-negative bacteria, which are used as plant growth-promoting bacteria, and increasingly recovered from economic plants as emerging pathogens. A new Enterobacter mori strain, designated CX01, was isolated as an emerging bacterial pathogen of a recent outbreak of kiwifruit canker-like disease in China. The main symptoms associated with this syndrome are bleeding cankers on the trunk and branch, and brown leaf spots. The genome sequence of E. mori CX01 was determined as a single chromosome of 4,966,908 bp with 4640 predicted open reading frames (ORFs). To better understand the features of the genus and its potential pathogenic mechanisms, five available Enterobacter genomes were compared and a pan-genome of 4870 COGs with 3158 core COGs was revealed. An important feature of the E. mori CX01 genome is that it lacks a type III secretion system often found in pathogenic bacteria; instead, it is equipped with type I, II, and VI secretory systems. In addition, the genome includes genes encoding putative virulence effectors, two-component systems, nutrient acquisition systems, and proteins involved in phytohormone synthesis, which may contribute to virulence and adaptation to host plant niches. The genome sequence of E. mori CX01 has high similarity with that of E. mori LMG 25706, though rearrangements occur throughout the two genomes. Further pathogenicity assay showed that both strains can invade either kiwifruit or mulberry, indicating they may have similar host range. Comparison with a closely related isolate enabled us to understand its pathogenesis and ecology.

RevDate: 2021-09-17

Yocca AE, Edger PP (2021)

Machine learning approaches to identify core and dispensable genes in pangenomes.

The plant genome [Epub ahead of print].

A gene in a given taxonomic group is either present in every individual (core) or absent in at least a single individual (dispensable). Previous pangenomic studies have identified certain functional differences between core and dispensable genes. However, identifying if a gene belongs to the core or dispensable portion of the genome requires the construction of a pangenome, which involves sequencing the genomes of many individuals. Here we aim to leverage the previously characterized core and dispensable gene content for two grass species [Brachypodium distachyon (L.) P. Beauv. and Oryza sativa L.] to construct a machine learning model capable of accurately classifying genes as core or dispensable using only a single annotated reference genome. Such a model may mitigate the need for pangenome construction, an expensive hurdle especially in orphan crops, which often lack the adequate genomic resources.

RevDate: 2021-09-16

Olanrewaju OS, Ayilara MS, Ayangbenro AS, et al (2021)

Genome Mining of Three Plant Growth-Promoting Bacillus Species from Maize Rhizosphere.

Applied biochemistry and biotechnology [Epub ahead of print].

Bacillus species genomes are rich in plant growth-promoting genetic elements. Bacillus subtilis and Bacillus velezensis are important plant growth promoters; hence, to further improve their abilities, the genetic elements responsible for these traits were characterized and reported. Genetic elements reported include those of auxin, nitrogen fixation, siderophore production, iron acquisition, volatile organic compounds, and antibiotics. Furthermore, the presence of phages and antibiotic-resistance genes in the genomes is reported. Pan-genome analysis was conducted using ten Bacillus species. The analysis indicates that the pan-genomes of Bacillus subtilis and Bacillus velezensis are still open. Ultimately, this study brings an insight into the genetic components of the plant growth-promoting abilities of these strains and shows their potential biotechnological applications in agriculture and other relevant sectors.

RevDate: 2021-09-18

Colquhoun RM, Hall MB, Lima L, et al (2021)

Pandora: nucleotide-resolution bacterial pan-genomics with reference graphs.

Genome biology, 22(1):267.

We present pandora, a novel pan-genome graph structure and algorithms for identifying variants across the full bacterial pan-genome. As much bacterial adaptability hinges on the accessory genome, methods which analyze SNPs in just the core genome have unsatisfactory limitations. Pandora approximates a sequenced genome as a recombinant of references, detects novel variation and pan-genotypes multiple samples. Using a reference graph of 578 Escherichia coli genomes, we compare 20 diverse isolates. Pandora recovers more rare SNPs than single-reference-based tools, is significantly better than picking the closest RefSeq reference, and provides a stable framework for analyzing diverse samples without reference bias.

RevDate: 2021-09-14

Da Silva K, Pons N, Berland M, et al (2021)

StrainFLAIR: strain-level profiling of metagenomic samples using variation graphs.

PeerJ, 9:e11884.

Current studies are shifting from the use of single linear references to representation of multiple genomes organised in pangenome graphs or variation graphs. Meanwhile, in metagenomic samples, resolving strain-level abundances is a major step in microbiome studies, as associations between strain variants and phenotype are of great interest for diagnostic and therapeutic purposes. We developed StrainFLAIR with the aim of showing the feasibility of using variation graphs for indexing highly similar genomic sequences up to the strain level, and for characterizing a set of unknown sequenced genomes by querying this graph. On simulated data composed of mixtures of strains from the same bacterial species Escherichia coli, results show that StrainFLAIR was able to distinguish and estimate the abundances of close strains, as well as to highlight the presence of a new strain close to a referenced one and to estimate its abundance. On a real dataset composed of a mix of several bacterial species and several strains for the same species, results show that in a more complex configuration StrainFLAIR correctly estimates the abundance of each strain. Hence, results demonstrated how graph representation of multiple close genomes can be used as a reference to characterize a sample at the strain level.

RevDate: 2021-09-10

Hall RJ, Whelan FJ, Cummins EA, et al (2021)

Gene-gene relationships in an Escherichia coli accessory genome are linked to function and mobility.

Microbial genomics, 7(9):.

The pangenome contains all genes encoded by a species, with the core genome present in all strains and the accessory genome in only a subset. Coincident gene relationships are expected within the accessory genome, where the presence or absence of one gene is influenced by the presence or absence of another. Here, we analysed the accessory genome of an Escherichia coli pangenome consisting of 400 genomes from 20 sequence types to identify genes that display significant co-occurrence or avoidance patterns with one another. We present a complex network of genes that are either found together or that avoid one another more often than would be expected by chance, and show that these relationships vary by lineage. We demonstrate that genes co-occur by function, and that several highly connected gene relationships are linked to mobile genetic elements. We find that genes are more likely to co-occur with, rather than avoid, another gene in the accessory genome. This work furthers our understanding of the dynamic nature of prokaryote pangenomes and implicates both function and mobility as drivers of gene relationships.
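
As a rough illustration of what "found together more often than would be expected by chance" can mean, the sketch below scores one gene pair with a one-sided hypergeometric test on presence/absence counts. This is a generic textbook test, not the method used by the authors (the abstract does not state it), and it ignores lineage structure and multiple-testing correction. The function name and the toy counts are hypothetical.

```python
from math import comb

def cooccurrence_p_value(n_genomes, n_a, n_b, n_both):
    """One-sided hypergeometric tail P(X >= n_both).

    n_genomes: total genomes; n_a, n_b: genomes carrying gene A / gene B;
    n_both: genomes carrying both. Small values suggest co-occurrence
    beyond chance under a naive model with no lineage correction.
    """
    total = comb(n_genomes, n_b)
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, i) * comb(n_genomes - n_a, n_b - i)
        for i in range(n_both, upper + 1)
    )
    return tail / total

# Toy counts (hypothetical): 10 genomes, each gene present in 5 of them.
p_always_together = cooccurrence_p_value(10, 5, 5, 5)  # both genes in the same 5 genomes
p_no_constraint = cooccurrence_p_value(10, 5, 5, 0)    # tail starting at 0 covers every outcome
print(p_always_together, p_no_constraint)
```

Avoidance would be scored with the opposite (lower) tail. Because the abstract reports that relationships vary by lineage, a real analysis would need to account for shared ancestry, which this sketch does not.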

RevDate: 2021-09-20

Mazzuoli MV, Daunesse M, Varet H, et al (2021)

The CovR regulatory network drives the evolution of Group B Streptococcus virulence.

PLoS genetics, 17(9):e1009761.

Virulence of the neonatal pathogen Group B Streptococcus is under the control of the master regulator CovR. Inactivation of CovR is associated with large-scale transcriptome remodeling and impairs almost every step of the interaction between the pathogen and the host. However, transcriptome analyses suggested a plasticity of the CovR signaling pathway in clinical isolates leading to phenotypic heterogeneity in the bacterial population. In this study, we characterized the CovR regulatory network in a strain representative of the CC-17 hypervirulent lineage responsible for the majority of neonatal meningitis. Transcriptome and genome-wide binding analyses reveal the architecture of the CovR network, characterized by the direct repression of a large array of virulence-associated genes and the extent of co-regulation at specific loci. Comparative functional analysis of the signaling network links strain-specificities to the regulation of the pan-genome, including the two specific hypervirulent adhesins and horizontally acquired genes, to mutations in CovR-regulated promoters, and to variability in CovR activation by phosphorylation. This regulatory adaptation occurs at the level of genes, promoters, and CovR itself, and allows a global reshaping of the expression of virulence genes. Overall, our results reveal the direct, coordinated, and strain-specific regulation of virulence genes by the master regulator CovR and suggest that the intra-species evolution of the signaling network is as important as the expression of specific virulence factors in the emergence of clones associated with specific diseases.

RevDate: 2021-09-08

Li G, Jiang T, Li J, et al (2021)

PanSVR: Pan-Genome Augmented Short Read Realignment for Sensitive Detection of Structural Variations.

Frontiers in genetics, 12:731515.

The comprehensive discovery of structural variations (SVs) is fundamental to many genomics studies, and high-throughput sequencing has become a common approach to this task. However, due to the limited read length, it is still non-trivial for state-of-the-art tools to accurately align short reads and produce high-quality SV callsets. The pan-genome provides a novel and promising framework for short read-based SV calling, since it makes it possible to comprehensively integrate known variants, reducing the incompleteness and bias of a single reference, breaking through the bottlenecks of short read alignment, and providing new evidence for the detection of SVs. However, it is still an open problem to develop effective computational approaches that take full advantage of pan-genomes. Herein, we propose Pan-genome augmented Structure Variation calling tool with read Re-alignment (PanSVR), a novel pan-genome-based SV calling approach. PanSVR uses several tailored methods to implement precise re-alignment of SV-spanning reads against a well-organized pan-genome reference containing many known SVs. PanSVR greatly improves the quality of short read alignments and produces clear and homogeneous SV signatures, which facilitate SV calling. Benchmark results on real sequencing data suggest that PanSVR achieves considerably higher SV-calling sensitivity than state-of-the-art SV callers, especially for SVs in repeat-rich regions and/or novel insertions, which are difficult for existing tools.

RevDate: 2021-09-08

Karthik K, Anbazhagan S, Thomas P, et al (2021)

Genome Sequencing and Comparative Genomics of Indian Isolates of Brucella melitensis.

Frontiers in microbiology, 12:698069.

Brucella melitensis causes small ruminant brucellosis and is a zoonotic pathogen prevalent worldwide. Whole genome phylogeny of all available B. melitensis genomes (n = 355) revealed that all Indian isolates (n = 16) clustered in the East Mediterranean lineage except the ADMAS-GI strain. Pangenome analysis indicated the presence of limited accessory genomes, with few clades showing specific gene presence/absence patterns. A total of 43 virulence genes were predicted in all the Indian strains of B. melitensis except 2007BM-1 (ricA and wbkA are absent). Multilocus sequence typing (MLST) analysis indicated that all Indian strains except one (ADMAS-GI) fall into sequence type 8 (ST 8). In comparison with MLST, core genome phylogeny indicated two major clusters (>70% bootstrap support values) among Indian strains. Clusters with <70% bootstrap support values represent strains with diverse evolutionary origins present among animal and human hosts. Genetic relatedness among animal (sheep and goats) and human strains with 100% bootstrap values shows the potential for zoonotic transfer. SNP-based analysis indicated similar clustering to that of core genome phylogeny. Among the Indian strains, the highest number of unique SNPs (112 SNPs) was shared by a node that involved three strains from Tamil Nadu. The node SNPs involved several peptidase genes like U32, M16 inactive domain protein, clp protease family protein, and M23 family protein and mostly represented non-synonymous (NS) substitutions. Vaccination has been followed in several parts of the world to prevent small ruminant brucellosis but not in India. Comparison of Indian strains with vaccine strains showed that M5 is genetically closer to most of the Indian strains than the Rev.1 strain. The presence of most of the virulence genes among all Indian strains and conserved core genome compositions suggest the use of any circulating strain/genotype for the development of a vaccine candidate for small ruminant brucellosis in India.

RevDate: 2021-09-08

Agarwal G, Choudhary D, Stice SP, et al (2021)

Pan-Genome-Wide Analysis of Pantoea ananatis Identified Genes Linked to Pathogenicity in Onion.

Frontiers in microbiology, 12:684756.

Pantoea ananatis, a gram negative and facultative anaerobic bacterium is a member of a Pantoea spp. complex that causes center rot of onion, which significantly affects onion yield and quality. This pathogen does not have typical virulence factors like type II or type III secretion systems but appears to require a biosynthetic gene-cluster, HiVir/PASVIL (located chromosomally comprised of 14 genes), for a phosphonate secondary metabolite, and the 'alt' gene cluster (located in plasmid and comprised of 11 genes) that aids in bacterial colonization in onion bulbs by imparting tolerance to thiosulfinates. We conducted a deep pan-genome-wide association study (pan-GWAS) to predict additional genes associated with pathogenicity in P. ananatis using a panel of diverse strains (n = 81). We utilized a red-onion scale necrosis assay as an indicator of pathogenicity. Based on this assay, we differentiated pathogenic (n = 51)- vs. non-pathogenic (n = 30)-strains phenotypically. Pan-genome analysis revealed a large core genome of 3,153 genes and a flexible accessory genome. Pan-GWAS using the presence and absence variants (PAVs) predicted 42 genes, including 14 from the previously identified HiVir/PASVIL cluster associated with pathogenicity, and 28 novel genes that were not previously associated with pathogenicity in onion. Of the 28 novel genes identified, eight have annotated functions of site-specific tyrosine kinase, N-acetylmuramoyl-L-alanine amidase, conjugal transfer, and HTH-type transcriptional regulator. The remaining 20 genes are currently hypothetical. Further, a core-genome SNPs-based phylogeny and horizontal gene transfer (HGT) studies were also conducted to assess the extent of lateral gene transfer among diverse P. ananatis strains. 
Phylogenetic analysis based on PAVs and whole genome multi locus sequence typing (wgMLST) rather than core-genome SNPs distinguished red-scale necrosis inducing (pathogenic) strains from non-scale necrosis inducing (non-pathogenic) strains of P. ananatis. A total of 1182 HGT events including the HiVir/PASVIL and alt cluster genes were identified. These events could be regarded as a major contributing factor to the diversification, niche-adaptation and potential acquisition of pathogenicity/virulence genes in P. ananatis.

RevDate: 2021-09-14

Letcher B, Hunt M, Z Iqbal (2021)

Gramtools enables multiscale variation analysis with genome graphs.

Genome biology, 22(1):259.

Genome graphs allow very general representations of genetic variation; depending on the model and implementation, variation at different length-scales (single nucleotide polymorphisms (SNPs), structural variants) and on different sequence backgrounds can be incorporated with different levels of transparency. We implement a model which handles this multiscale variation and develop a JSON extension of VCF (jVCF) allowing for variant calls on multiple references, both implemented in our software gramtools. We find gramtools outperforms existing methods for genotyping SNPs overlapping large deletions in M. tuberculosis and is able to genotype on multiple alternate backgrounds in P. falciparum, revealing previously hidden recombination.

RevDate: 2021-09-06

Gupta PK (2021)

GWAS for genetics of complex quantitative traits: Genome to pangenome and SNPs to SVs and k-mers.

BioEssays : news and reviews in molecular, cellular and developmental biology [Epub ahead of print].

The development of improved methods for genome-wide association studies (GWAS) for genetics of quantitative traits has been an active area of research during the last 25 years. This activity initially started with the use of mixed linear model (MLM), which was variously modified. During the last decade, however, with the availability of high throughput next generation sequencing (NGS) technology, development and use of pangenomes and novel markers including structural variations (SVs) and k-mers for GWAS has taken over as a new thrust area of research. Pangenomes and SVs are now available in humans, livestock, and a number of plant species, so that these resources along with k-mers are being used in GWAS for exploring additional genetic variation that was hitherto not available for analysis. These developments have resulted in significant improvement in GWAS methodology for detection of marker-trait associations (MTAs) that are relevant to human healthcare and crop improvement.

RevDate: 2021-09-07

Mann A, Malik S, Rana JS, et al (2021)

Whole genome sequencing data of Klebsiella aerogenes isolated from agricultural soil of Haryana, India.

Data in brief, 38:107311.

Klebsiella aerogenes is a Gram-negative bacterium that was previously known as Enterobacter aerogenes. It is present in all environments such as water, soil, air and hospitals, and is an opportunistic pathogen that causes several types of infections. As compared to other clinically important pathogens included in the ESKAPE category (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species), the pangenome and population structure of Klebsiella aerogenes is still poorly understood. For the present study, the bacterial sample was isolated from agricultural soils of Haryana, India. With the aim of identifying the occurrence of multi-drug resistance genes in the agricultural field soil bacterial isolate, whole genome sequencing (WGS) of the isolate was performed, and the antibiotic resistance genes, the genes responsible for other major functions of the cell, and the single nucleotide polymorphisms (SNPs) and insertions and deletions (InDels) were identified. The data presented in this manuscript can be reused by researchers as a reference for determining the antibiotic resistance genes that could be present in different bacterial isolates, and would also help in determining the functions of various other genes present in other genomes of Klebsiella species.

RevDate: 2021-09-07

Rai A, Jagadeeshwari U, Deepshikha G, et al (2021)

Phylotaxogenomics for the Reappraisal of the Genus Roseomonas With the Creation of Six New Genera.

Frontiers in microbiology, 12:677842.

The genus Roseomonas is a significant group of bacteria which is invariably of great clinical and ecological importance. Previous studies have shown that the genus Roseomonas is polyphyletic in nature. Our present study focused on generating a lucid understanding of the phylogenetic framework for the re-evaluation and reclassification of the genus Roseomonas. Phylogenetic studies based on the 16S rRNA gene and 92 concatenated genes suggested that the genus is heterogeneous, forming seven major groups. Existing Roseomonas species were subjected to an array of genomic, phenotypic, and chemotaxonomic analyses in order to resolve the heterogeneity. Genomic similarity indices (dDDH and ANI) indicated that the members were well-defined at the species level. The Percentage of Conserved Proteins (POCP) and the average Amino Acid Identity (AAI) values between the groups of the genus Roseomonas and other interspersing members of the family Acetobacteraceae were below 65 and 70%, respectively. The pan-genome evaluation depicted that the pan-genome was an open type and the members shared 958 core genes. This claim of reclassification was equally supported by the phenotypic and chemotaxonomic differences between the groups. Thus, in this study, we propose to re-evaluate and reclassify the genus Roseomonas and propose six novel genera as Pararoseomonas gen. nov., Falsiroseomonas gen. nov., Paeniroseomonas gen. nov., Plastoroseomonas gen. nov., Neoroseomonas gen. nov., and Pseudoroseomonas gen. nov.

RevDate: 2021-09-04

Vandamme P, Peeters C, Seth-Smith HMB, et al (2021)

Gulosibacter hominis sp. nov.: a novel human microbiome bacterium that may cause opportunistic infections.

Antonie van Leeuwenhoek [Epub ahead of print].

We present genomic, phylogenomic, and phenotypic taxonomic data to demonstrate that three human ear isolates represent a novel species within the genus Gulosibacter. These isolates could not be identified reliably using MALDI-TOF mass spectrometry during routine diagnostic work, but partial 16S rRNA gene sequence analysis revealed that they belonged to the genus Gulosibacter. Overall genomic relatedness indices between the draft genome sequences of the three isolates and of the type strains of established Gulosibacter species confirmed that the three isolates represented a single novel Gulosibacter species. A biochemical characterisation yielded differential tests between the novel and established Gulosibacter species, which could also be differentiated using MALDI-TOF mass spectrometry. We propose to formally classify these three isolates into Gulosibacter hominis sp. nov., with 401352-2018 T (= LMG 31778 T, CCUG 74795 T) as the type strain. The whole-genome sequence of strain 401352-2018 T has a size of 2,340,181 bp and a G+C content of 62.04 mol%. A Gulosibacter pangenome analysis revealed 467 gene clusters that were exclusively present in G. hominis genomes. While these G. hominis specific gene clusters were enriched in several COG functional categories, this analysis did not reveal functions that suggested a role in the human microbiome, nor did it explain the occurrence of G. hominis in ear infections. The absence of acquired antimicrobial resistance determinants and virulence factors in the G. hominis genomes, and an analysis of publicly available 16S rRNA gene sequences and 16S rRNA amplicon sequencing data sets suggested that G. hominis is a member of the human skin microbiota that may occasionally be involved in opportunistic infections.

RevDate: 2021-09-23

Li Q, Tian S, Yan B, et al (2021)

Building a Chinese pan-genome of 486 individuals.

Communications biology, 4(1):1016.

Pan-genome sequence analysis of human population ancestry is critical for expanding and better defining human genome sequence diversity. However, the amount of genetic variation still missing from current human reference sequences is still unknown. Here, we used 486 deep-sequenced Han Chinese genomes to identify 276 Mbp of DNA sequences that, to our knowledge, are absent in the current human reference. We classified these sequences into individual-specific and common sequences, and propose that the common sequence size is uncapped with a growing population. The 46.646 Mbp common sequences obtained from the 486 individuals improved the accuracy of variant calling and mapping rate when added to the reference genome. We also analyzed the genomic positions of these common sequences and found that they came from genomic regions characterized by high mutation rate and low pathogenicity. Our study authenticates the Chinese pan-genome as representative of DNA sequences specific to the Han Chinese population missing from the GRCh38 reference genome and establishes the newly defined common sequences as candidates to supplement the current human reference.

RevDate: 2021-09-24

Peters S, Pascoe B, Wu Z, et al (2021)

Campylobacter jejuni genotypes are associated with post-infection irritable bowel syndrome in humans.

Communications biology, 4(1):1015.

Campylobacter enterocolitis may lead to post-infection irritable bowel syndrome (PI-IBS) and while some C. jejuni strains are more likely than others to cause human disease, genomic and virulence characteristics promoting PI-IBS development remain uncharacterized. We combined pangenome-wide association studies and phenotypic assays to compare C. jejuni isolates from patients who developed PI-IBS with those who did not. We show that variation in bacterial stress response (Cj0145_phoX), adhesion protein (Cj0628_CapA), and core biosynthetic pathway genes (biotin: Cj0308_bioD; purine: Cj0514_purQ; isoprenoid: Cj0894c_ispH) were associated with PI-IBS development. In vitro assays demonstrated greater adhesion, invasion, IL-8 and TNFα secretion on colonocytes with PI-IBS compared to PI-no-IBS strains. A risk-score for PI-IBS development was generated using 22 genomic markers, four of which were from Cj1631c, a putative heme oxidase gene linked to virulence. Our finding that specific Campylobacter genotypes confer greater in vitro virulence and increased risk of PI-IBS has potential to improve understanding of the complex host-pathogen interactions underlying this condition.

RevDate: 2021-09-03

Porcellato D, Smistad M, Skeie SB, et al (2021)

Whole genome sequencing reveals possible host species adaptation of Streptococcus dysgalactiae.

Scientific reports, 11(1):17350.

Streptococcus dysgalactiae (SD) is an emerging pathogen in human and veterinary medicine, and is associated with several host species, disease phenotypes and virulence mechanisms. SD has traditionally been divided into the subspecies dysgalactiae (SDSD) and subsp. equisimilis (SDSE), but recent molecular studies have indicated that the phylogenetic relationships are more complex. Moreover, the genetic basis for the niche versatility of SD has not been extensively investigated. To expand the knowledge about virulence factors, phylogenetic relationships and host-adaptation strategies of SD, we analyzed 78 SDSD genomes from cows and sheep, and 78 SDSE genomes from other host species. Sixty SDSD and 40 SDSE genomes were newly sequenced in this study. Phylogenetic analysis supported SDSD as a distinct taxonomic entity, presenting a mean value of the average nucleotide identity of 99%. Bovine and ovine associated SDSD isolates clustered separately on pangenome analysis, but no single gene or genetic region was uniquely associated with host species. In contrast, SDSE isolates were more heterogenous and could be delineated in accordance with host. Although phylogenetic clustering suggestive of cross species transmission was observed, we predominantly detected a host restricted distribution of the SD-lineages. Furthermore, lineage specific virulence factors were detected, several of them located in proximity to hotspots for integration of mobile genetic elements. Our study indicates that SD has evolved to adapt to several different host species and infers a potential role of horizontal genetic transfer in niche specialization.

RevDate: 2021-08-31

Bachert BA, Richardson JB, Mlynek KD, et al (2021)

Development, Phenotypic Characterization and Genomic Analysis of a Francisella tularensis Panel for Tularemia Vaccine Testing.

Frontiers in microbiology, 12:725776.

Francisella tularensis is one of several biothreat agents for which a licensed vaccine is needed. To aid in the development of a vaccine protective against pneumonic tularemia, we generated and characterized a panel of F. tularensis isolates that can be used as challenge strains to assess vaccine efficacy. Our panel consists of both historical and contemporary isolates derived from clinical and environmental sources, including human, tick, and rabbit isolates. Whole genome sequencing was performed to assess the genetic diversity in comparison to the reference genome F. tularensis Schu S4. Average nucleotide identity analysis showed >99% genomic similarity across the strains in our panel, and pan-genome analysis revealed a core genome of 1,707 genes and an accessory genome of 233 genes. Three of the strains in our panel, FRAN254 (tick-derived), FRAN255 (a type B strain), and FRAN256 (a human isolate), exhibited variation from the other strains. Moreover, we identified several unique mutations within the Francisella Pathogenicity Island across multiple strains in our panel, revealing unexpected diversity in this region. Notably, FRAN031 (Scherm) completely lacked the second pathogenicity island but retained virulence in mice. In contrast, FRAN037 (Coll) was attenuated in a murine pneumonic tularemia model and had mutations in pdpB and iglA which likely led to attenuation. All of the strains, except FRAN037, retained full virulence, indicating their effectiveness as challenge strains for future vaccine testing. Overall, we provide a well-characterized panel of virulent F. tularensis strains that can be utilized in ongoing efforts to develop an effective vaccine against pneumonic tularemia to ensure protection is achieved across a range of F. tularensis strains.

RevDate: 2021-08-31

Outten J, A Warren (2021)

Methods and Developments in Graphical Pangenomics.

Journal of the Indian Institute of Science [Epub ahead of print].

Pangenomes are organized collections of the genomic information from related individuals or groups. Graphical pangenomics is the study of these pangenomes using graphical methods to identify and analyze genes, regions, and mutations of interest to an array of biological questions. This field has seen significant progress in recent years including the development of graph based models that better resolve biological phenomena, and an explosion of new tools for mapping reads, creating graphical genomes, and performing pangenome analysis. In this review, we discuss recent developments in models, algorithms associated with graphical genomes, and comparisons between similar tools. In addition we briefly discuss what these developments may mean for the future of genomics.

RevDate: 2021-08-31

Mashima I, Liao YC, Lin CH, et al (2021)

Comparative Pan-Genome Analysis of Oral Veillonella Species.

Microorganisms, 9(8):.

The genus Veillonella is a common and abundant member of the oral microbiome. It includes eight species, V. atypica, V. denticariosi, V. dispar, V. infantium, V. nakazawae, V. parvula, V. rogosae and V. tobetusensis. They possess important metabolic pathways that utilize lactate as an energy source. However, the overall metabolome of these species has not been studied. To further understand the metabolic framework of Veillonella in the human oral microbiome, we conducted a comparative pan-genome analysis of the eight species of oral Veillonella. Analysis of the oral Veillonella pan-genome revealed features based on KEGG pathway information to adapt to the oral environment. We found that the fructose metabolic pathway was conserved in all oral Veillonella species, and oral Veillonella have conserved pathways that utilize carbohydrates other than lactate as an energy source. This discovery may help to better understand the metabolic network among oral microbiomes and will provide guidance for the design of future in silico and in vitro studies.

RevDate: 2021-08-31

Agarwal G, Gitaitis RD, B Dutta (2021)

Pan-Genome of Novel Pantoea stewartii subsp. indologenes Reveals Genes Involved in Onion Pathogenicity and Evidence of Lateral Gene Transfer.

Microorganisms, 9(8):.

Pantoea stewartii subsp. indologenes (Psi) is a causative agent of leafspot on foxtail millet and pearl millet; however, novel strains were recently identified that are pathogenic on onions. Our recent host range evaluation study identified two pathovars; P. stewartii subsp. indologenes pv. cepacicola pv. nov. and P. stewartii subsp. indologenes pv. setariae pv. nov. that are pathogenic on onions and millets or on millets only, respectively. In the current study, we developed a pan-genome using the whole genome sequencing of newly identified/classified Psi strains from both pathovars [pv. cepacicola (n = 4) and pv. setariae (n = 13)]. The full spectrum of the pan-genome contained 7030 genes. Among these, 3546 (present in genomes of all 17 strains) were the core genes that were a subset of 3682 soft-core genes (present in ≥16 strains). The accessory genome included 1308 shell genes and 2040 cloud genes (present in ≤2 strains). The pan-genome showed a clear linear progression with >6000 genes, suggesting that the pan-genome of Psi is open. Comparative phylogenetic analysis showed differences in phylogenetic clustering of Pantoea spp. using PAVs/wgMLST approach in comparison with core genome SNPs-based phylogeny. Further, we conducted a horizontal gene transfer (HGT) study using Psi strains from both pathovars along with strains from other Pantoea species, namely, P. stewartii subsp. stewartii LMG 2715T, P. ananatis LMG 2665T, P. agglomerans LMG L15, and P. allii LMG 24248T. A total of 317 HGT events among four Pantoea species were identified with most gene transfer events occurring between Psi pv. cepacicola and Psi pv. setariae. Pan-GWAS analysis predicted a total of 154 genes, including seven gene-clusters, which were associated with the pathogenicity phenotype (necrosis on seedling) on onions. One of the gene-clusters contained 11 genes with known functions and was found to be chromosomally located.
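
The category counts reported in this abstract can be checked with a short sketch. The thresholds below (core = all 17 strains, soft-core = at least 16, cloud = at most 2, shell = everything in between) are read from the abstract; the function name, the toy gene names, and the toy presence sets are hypothetical and only illustrate how a presence/absence table maps to these categories.

```python
from collections import Counter

def classify(n_present, n_strains=17, soft_core_min=16, cloud_max=2):
    """Assign a gene to a pan-genome category by the number of strains carrying it."""
    if n_present == n_strains:
        return "core"
    if n_present >= soft_core_min:
        return "soft-core (non-core)"  # soft-core genes that miss the strict core
    if n_present <= cloud_max:
        return "cloud"
    return "shell"

# Toy presence sets (hypothetical genes; values are strain indices 0..16).
toy_presence = {
    "geneA": set(range(17)),  # in all 17 strains
    "geneB": set(range(16)),  # in 16 strains
    "geneC": set(range(8)),   # in 8 strains
    "geneD": {0},             # in 1 strain
    "geneE": {0, 1},          # in 2 strains
    "geneF": set(range(3)),   # in 3 strains
}
toy_counts = Counter(classify(len(s)) for s in toy_presence.values())

# Counts as reported in the abstract: soft-core already includes the strict core.
reported = {"core": 3546, "soft_core": 3682, "shell": 1308, "cloud": 2040}
pan_total = reported["soft_core"] + reported["shell"] + reported["cloud"]
soft_core_only = reported["soft_core"] - reported["core"]
print(toy_counts, pan_total, soft_core_only)
```

Under these assumptions the soft-core, shell, and cloud counts sum to the reported pan-genome size of 7030 genes, and 136 soft-core genes are missing from exactly one strain.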

RevDate: 2021-08-31

Lee JY, Lee DH, DH Kim (2021)

Characterization of Martelella soudanensis sp. nov., Isolated from a Mine Sediment.

Microorganisms, 9(8):.

Gram-stain-negative, strictly aerobic, non-spore-forming, non-motile, and rod-shaped bacterial strains, designated NC18T and NC20, were isolated from the sediment near-vertical borehole effluent originating 714 m below the subsurface located in the Soudan Iron Mine in Minnesota, USA. The 16S rRNA gene sequence showed that strains NC18T and NC20 grouped with members of the genus Martelella, including M. mediterranea DSM 17316T and M. limonii YC7034T. The genome sizes and G + C content of both NC18T and NC20 were 6.1 Mb and 61.8 mol%, respectively. Average nucleotide identity (ANI), average amino acid identity (AAI), and digital DNA-DNA hybridization (dDDH) values were below the species delineation threshold. Pan-genomic analysis showed that NC18T, NC20, M. mediterranea DSM 17316T, M. endophytica YC6887T, and M. lutilitoris GH2-6T had 8470 pan-genome orthologous groups (POGs) in total. The five Martelella strains shared 2258 core POGs, which were mainly associated with amino acid transport and metabolism, general function prediction only, carbohydrate transport and metabolism, translation, ribosomal structure and biogenesis, and transcription. The two novel strains had major fatty acids (>5%) including summed feature 8 (C18:1 ω7c and/or C18:1 ω6c), C19:0 cyclo ω8c, C16:0, C18:1 ω7c 11-methyl, C18:0, and summed feature 2 (C12:0 aldehyde and/or iso-C16:1 I and/or C14:0 3-OH). The sole respiratory quinone was ubiquinone-10 (Q-10). On the basis of polyphasic taxonomic analyses, strains NC18T and NC20 represent a novel species of the genus Martelella, for which the name Martelella soudanensis sp. nov. is proposed. The type strain is NC18T (=KTCT 82174T = NBRC 114661T).

RevDate: 2021-08-31

Castillo D, Donati VL, Jørgensen J, et al (2021)

Comparative Genomic Analyses of Flavobacterium psychrophilum Isolates Reveals New Putative Genetic Determinants of Virulence Traits.

Microorganisms, 9(8):.

The fish pathogen Flavobacterium psychrophilum is currently one of the main pathogenic bacteria hampering the productivity of salmonid farming worldwide. Although putative virulence determinants have been identified, the genetic basis for variation in virulence of F. psychrophilum is not fully understood. In this study, we analyzed whole-genome sequences of a collection of 25 F. psychrophilum isolates from Baltic Sea countries and compared genomic information with a previous determination of their virulence in juvenile rainbow trout. The results revealed a conserved population of F. psychrophilum that were consistently present across the Baltic Sea countries, with no clear association between genomic repertoire, phylogenomic, or gene distribution and virulence traits. However, analysis of the entire genome of four F. psychrophilum isolates by hybrid assembly provided an unprecedented resolution for discriminating even highly related isolates. The results showed that isolates with different virulence phenotypes harbored genetic variances on a number of consecutive leucine-rich repeat (LRR) proteins, repetitive motifs in gliding motility-associated protein, and the insertion of transposable elements into intergenic and genic regions. Thus, these findings provide novel insights into the genetic variation of these elements and their putative role in the modulation of F. psychrophilum virulence.

RevDate: 2021-08-30

Lin N, Tao Y, Gao P, et al (2021)

Comparative Genomics Revealing Insights into Niche Separation of the Genus Methylophilus.

Microorganisms, 9(8):.

The genus Methylophilus, which uses methanol as a carbon and energy source, is widely distributed in terrestrial, freshwater and marine ecosystems. Here, three strains (13, 14 and QUAN) related to the genus Methylophilus were newly isolated from Lake Fuxian sediments. The draft genomes of strains 13, 14 and QUAN were 3.11 Mb, 3.02 Mb and 3.15 Mb with a G+C content of 51.13, 50.48 and 50.33%, respectively. ANI values between strains 13 and 14, 13 and QUAN, and 14 and QUAN were 81.09, 81.06 and 91.46%, respectively. The pan-genome and core-genome included 3994 and 1559 genes across 18 Methylophilus genomes, respectively. Phylogenetic analysis based on 1035 single-copy genes and 16S rRNA genes revealed two clades, one containing strains isolated from aquatic habitats and the other strains from the leaf surface. Twenty-three aquatic-specific genes, such as 2OG/Fe(II) oxygenase and diguanylate cyclase, reflected the strategy to survive in oxygen-limited water and sediment. Accordingly, 159 genes were identified specific to leaf association. Besides niche separation, Methylophilus could utilize the combination of ANRA and DNRA to convert nitrate to ammonia and reduce sulfate to sulfur according to the complete sulfur metabolic pathway. Genes encoding the cytochrome c protein and riboflavin were detected in Methylophilus genomes, which directly or indirectly participate in electron transfer.

RevDate: 2021-09-15

Xu S, Li Z, Huang Y, et al (2021)

Whole genome sequencing reveals the genomic diversity, taxonomic classification, and evolutionary relationships of the genus Nocardia.

PLoS neglected tropical diseases, 15(8):e0009665.

Nocardia is a complex and diverse genus of aerobic actinomycetes that cause complex clinical presentations, which are difficult to diagnose because they are poorly understood. To date, the genetic diversity, evolution, and taxonomic structure of the genus Nocardia are still unclear. In this study, we investigated the pan-genome of 86 Nocardia type strains to clarify their genetic diversity. Our study revealed an open pan-genome for Nocardia containing 265,836 gene families, with about 99.7% of the pan-genome being variable. Horizontal gene transfer appears to have been an important evolutionary driver of genetic diversity shaping the Nocardia genome and may have caused historical taxonomic confusion with other taxa (primarily Rhodococcus, Skermania, Aldersonia, and Mycobacterium). Based on single-copy gene families, we established a high-accuracy phylogenomic approach for Nocardia using 229 genome sequences. Furthermore, we found 28 potentially new species and reclassified 16 strains. Finally, by comparing the topology between a phylogenomic tree and 384 phylogenetic trees (from 384 single-copy genes from the core genome), we identified a novel locus for inferring the phylogeny of this genus. The dapb1 gene, which encodes dipeptidyl aminopeptidase BI, was far superior to commonly used markers for Nocardia and yielded a topology almost identical to that of genome-based phylogeny. In conclusion, the present study provides insights into the genetic diversity, contributes a robust framework for the taxonomic classification, and elucidates the evolutionary relationships of Nocardia. This framework should facilitate the development of rapid tests for the species identification of highly variable species and has given new insight into the behavior of this genus.

RevDate: 2021-08-27

Shapiro JW, C Putonti (2021)

Rephine.r: a pipeline for correcting gene calls and clusters to improve phage pangenomes and phylogenies.

PeerJ, 9:e11950.

Background: A pangenome is the collection of all genes found in a set of related genomes. For microbes, these genomes are often different strains of the same species, and the pangenome offers a means to compare gene content variation with differences in phenotypes, ecology, and phylogenetic relatedness. Though most frequently applied to bacteria, there is growing interest in adapting pangenome analysis to bacteriophages. However, working with phage genomes presents new challenges. First, most phage families are under-sampled, and homologous genes in related viruses can be difficult to identify. Second, homing endonucleases and intron-like sequences may be present, resulting in fragmented gene calls. Each of these issues can reduce the accuracy of standard pangenome analysis tools.

Methods: We developed an R pipeline called Rephine.r that takes as input the gene clusters produced by an initial pangenomics workflow. Rephine.r then proceeds in two primary steps. First, it identifies three common causes of fragmented gene calls: (1) indels creating early stop codons and new start codons; (2) interruption by a selfish genetic element; and (3) splitting at the ends of the reported genome. Fragmented genes are then fused to create new sequence alignments. In tandem, Rephine.r searches for distant homologs separated into different gene families using Hidden Markov Models. Significant hits are used to merge families into larger clusters. A final round of fragment identification is then run, and results may be used to infer single-copy core genomes and phylogenetic trees.

Results: We applied Rephine.r to three well-studied phage groups: the Tevenvirinae (e.g., T4), the Studiervirinae (e.g., T7), and the Pbunaviruses (e.g., PB1). In each case, Rephine.r recovered additional members of the single-copy core genome and increased the overall bootstrap support of the phylogeny. The Rephine.r pipeline is provided through GitHub (https://www.github.com/coevoeco/Rephine.r) as a single script for automated analysis and with utility functions to assist in building single-copy core genomes and predicting the sources of fragmented genes.
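The family-merging step described in the Methods above (significant hits between gene families are used to merge them into larger clusters) can be illustrated with a small union-find sketch. This is not the Rephine.r implementation, which is an R pipeline; the function name `merge_families`, the family IDs, and the assumption that significant hits arrive as pairs of family IDs are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def merge_families(families, hits):
    """Merge gene families linked by significant cross-family hits.

    families: iterable of family IDs (hypothetical labels).
    hits: iterable of (family_a, family_b) pairs that an upstream search
          (e.g. an HMM comparison) has already judged significant.
          Both IDs in each pair are assumed to appear in `families`.
    Returns a sorted list of sorted lists, one per merged cluster.
    """
    parent = {f: f for f in families}

    def find(x):
        # Walk to the root of x's cluster, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # join the two clusters

    clusters = defaultdict(list)
    for f in parent:
        clusters[find(f)].append(f)
    return sorted(sorted(members) for members in clusters.values())


merged = merge_families(["F1", "F2", "F3", "F4"], [("F1", "F2"), ("F2", "F3")])
print(merged)
```

Merging is transitive in this sketch: F1 and F3 end up in the same cluster because both are linked to F2. Whether that behavior is appropriate for a given dataset is a design choice.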

RevDate: 2021-09-10

Clermont O, Condamine B, Dion S, et al (2021)

The E phylogroup of Escherichia coli is highly diverse and mimics the whole E. coli species population structure.

Environmental microbiology [Epub ahead of print].

To get a global picture of the population structure of the Escherichia coli phylogroup E, encompassing the O157:H7 EHEC lineage, we analysed the whole genome of 144 strains isolated from various continents, hosts and lifestyles and representative of the phylogroup diversity. The strains possess 4331 to 5440 genes with a core genome of 2771 genes and a pangenome of 33 722 genes. The distribution of these genes among the strains shows an asymmetric U-shaped distribution. E phylogenetic strains have the largest genomes of the species, partly explained by the presence of mobile genetic elements. Sixty-eight lineages were delineated, some of them exhibiting extra-intestinal virulence genes and being virulent in the mouse sepsis model. Except for the EHEC lineages and the reference EPEC, EIEC and ETEC strains, very few strains possess intestinal virulence genes. Most of the strains were devoid of acquired resistance genes, but eight strains possessed extended-spectrum beta-lactamase genes. Human strains belong to specific lineages, some of them being virulent and antibiotic-resistant [sequence type complexes (STcs) 350 and 2064]. The E phylogroup mimics all the features of the species as a whole, a phenomenon already observed at the STc level, arguing for a fractal population structure of E. coli.

RevDate: 2021-08-24

Lee AHY, Porto WF, de Faria C, et al (2021)

Genomic insights into the diversity, virulence and resistance of Klebsiella pneumoniae extensively drug resistant clinical isolates.

Microbial genomics, 7(8):.

Klebsiella pneumoniae has been implicated in wide-ranging nosocomial outbreaks, causing severe infections without effective treatments due to antibiotic resistance. Here, we performed genome sequencing of 70 extensively drug resistant clinical isolates, collected from Brasília's hospitals (Brazil) between 2010 and 2014. The majority of strains (60 out of 70) belonged to a single clonal complex (CC), CC258, which has become distributed worldwide in the last two decades. Of these CC258 strains, 44 strains were classified as sequence type 11 (ST11) and fell into two distinct clades, but no ST258 strains were found. These 70 strains had a pan-genome size of 10 366 genes, with a core-genome size of ~4476 genes found in 95 % of isolates. Analysis of sequences revealed diverse mechanisms of resistance, including production of multidrug efflux pumps, enzymes with the same target function but with reduced or no affinity to the drug, and proteins that protected the drug target or inactivated the drug. β-Lactamase production provided the most notable mechanism associated with K. pneumoniae. Each strain presented two or three different β-lactamase enzymes, including class A (SHV, CTX-M and KPC), class B and class C AmpC enzymes, although no class D β-lactamase was identified. Strains carrying the NDM enzyme involved three different ST types, suggesting that there was no common genetic origin.
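The abstract above reports a pan-genome size and a core genome defined as genes found in 95 % of isolates. A minimal sketch of how such counts can be derived from a gene presence/absence table follows. The function `pan_and_core`, the input layout (a dict mapping each gene to the set of isolates carrying it), and the toy data are assumptions for illustration, not the method or data of the cited study.

```python
def pan_and_core(presence, n_isolates, core_fraction=0.95):
    """Return (pan_genome, core_genome) as sets of gene names.

    presence: dict mapping gene name -> set of isolate IDs carrying it.
    n_isolates: total number of isolates in the collection.
    core_fraction: minimum fraction of isolates that must carry a gene
                   for it to count as core (0.95 mirrors the 95 % cut-off).
    """
    if n_isolates <= 0:
        raise ValueError("n_isolates must be positive")
    # Pan-genome: every gene seen in at least one isolate.
    pan = {gene for gene, carriers in presence.items() if carriers}
    # Core genome: genes carried by at least core_fraction of isolates.
    core = {
        gene
        for gene, carriers in presence.items()
        if len(carriers) / n_isolates >= core_fraction
    }
    return pan, core


# Toy example with four hypothetical isolates.
toy = {
    "geneA": {"s1", "s2", "s3", "s4"},
    "geneB": {"s1", "s2", "s3"},
    "geneC": {"s4"},
}
pan, core = pan_and_core(toy, n_isolates=4)
print(len(pan), sorted(core))
```

With the default threshold only `geneA` qualifies as core in the toy data. Lowering `core_fraction` to 0.75 would also admit `geneB`.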

RevDate: 2021-09-01
CmpDate: 2021-09-01

Woodhouse MR, Cannon EK, Portwood JL, et al (2021)

A pan-genomic approach to genome databases using maize as a model system.

BMC plant biology, 21(1):385.

Research in the past decade has demonstrated that a single reference genome is not representative of a species' diversity. MaizeGDB introduces a pan-genomic approach to hosting genomic data, leveraging the large number of diverse maize genomes and their associated datasets to quickly and efficiently connect genomes, gene models, expression, epigenome, sequence variation, structural variation, transposable elements, and diversity data across genomes so that researchers can easily track the structural and functional differences of a locus and its orthologs across maize. We believe our framework is unique and provides a template for any genomic database poised to host large-scale pan-genomic data.

RevDate: 2021-08-21

Hudec C, Biessy A, Novinscak A, et al (2021)

Comparative Genomics of Potato Common Scab-Causing Streptomyces spp. Displaying Varying Virulence.

Frontiers in microbiology, 12:716522.

Common scab of potato causes important economic losses worldwide following the development of necrotic lesions on tubers. In this study, the genomes of 14 prevalent scab-causing Streptomyces spp. isolated from Prince Edward Island, one of the most important Canadian potato production areas, were sequenced and annotated. Their phylogenomic affiliation was determined, their pan-genome was characterized, and pathogenic determinants involved in their virulence, ranging from weak to aggressive, were compared. Thirteen of the 14 strains clustered with Streptomyces scabiei, while the remaining strain clustered with Streptomyces acidiscabies. The toxicogenic and colonization genomic regions were compared, and while some atypical gene organizations were observed, no clear correlation with virulence was found. The production of the phytotoxin thaxtomin A was also quantified and again, contrary to previous reports in the literature, no clear correlation was found between the amount of thaxtomin A secreted and the virulence observed. Although no significant differences were observed when comparing the presence/absence of the main virulence factors among the strains of S. scabiei, a distinct profile was observed for S. acidiscabies. Several mutations predicted to affect the functionality of some virulence factors were identified, including one in the bldA gene that correlates with the absence of thaxtomin A production despite the presence of the corresponding biosynthetic gene cluster in S. scabiei LBUM 1485. These novel findings, obtained using a large number of scab-causing Streptomyces strains, challenge some assumptions made so far on Streptomyces' virulence and suggest that other factors, yet to be characterized, are also key contributors.

RevDate: 2021-08-22

Vaid RK, Thakur Z, Anand T, et al (2021)

Comparative genome analysis of Salmonella enterica serovar Gallinarum biovars Pullorum and Gallinarum decodes strain specific genes.

PloS one, 16(8):e0255612.

Salmonella enterica serovar Gallinarum biovar Pullorum (bvP) and biovar Gallinarum (bvG) are the etiological agents of pullorum disease (PD) and fowl typhoid (FT), respectively, which cause huge economic losses to the poultry industry, especially in developing countries including India. Vaccination and biosecurity measures are currently being employed to control and reduce S. Gallinarum infections. High endemicity, poor implementation of hygiene and lack of effective vaccines pose challenges in prevention and control of disease in intensively maintained poultry flocks. Comparative genome analysis unravels similarities and dissimilarities, thus facilitating identification of genomic features that aid in pathogenesis and niche adaptation and in tracing of evolutionary history. The present investigation was carried out to assess the genotypic differences amongst S. enterica serovar Gallinarum strains, including the Indian strain S. Gallinarum Sal40 VTCCBAA614. The comparative genome analysis revealed an open pan-genome consisting of 5091 coding sequences (CDS), with 3270 CDS belonging to the core-genome, 1254 CDS to the dispensable genome, and strain-specific genes (i.e., singletons) ranging from 3 to 102 amongst the analyzed strains. Moreover, the investigated strains exhibited diversity in genomic features such as virulence factors, genomic islands, prophage regions, toxin-antitoxin cassettes, and acquired antimicrobial resistance genes. The core genome identified in the study can give important leads for the design of rapid and reliable diagnostics and of vaccines for effective infection control as well as eradication. Additionally, the identified genetic differences among the S. enterica serovar Gallinarum strains could be used for bacterial typing and, through future experimental investigations on the data generated, for structure-based inhibitor development.

RevDate: 2021-08-19

Simonsen AK (2021)

Environmental stress leads to genome streamlining in a widely distributed species of soil bacteria.

The ISME journal [Epub ahead of print].

Bacteria have highly flexible pangenomes, which are thought to facilitate evolutionary responses to environmental change, but the impacts of environmental stress on pangenome evolution remain unclear. Using a landscape pangenomics approach, I demonstrate that environmental stress leads to consistent, continuous reduction in genome content along four environmental stress gradients (acidity, aridity, heat, salinity) in naturally occurring populations of Bradyrhizobium diazoefficiens (widespread soil-dwelling plant mutualists). Using gene-level network and duplication functional traits to predict accessory gene distributions across environments, genes predicted to be superfluous are more likely lost in high stress, while genes with multi-functional roles are more likely retained. Genes with higher probabilities of being lost with stress contain significantly higher proportions of codons under strong purifying and positive selection. Gene loss is widespread across the entire genome, with high gene-retention hotspots in close spatial proximity to core genes, suggesting Bradyrhizobium has evolved to cluster essential-function genes (accessory genes with multifunctional roles and core genes) in discrete genomic regions, which may stabilise viability during genomic decay. In conclusion, pangenome evolution through genome streamlining is an important evolutionary response to environmental change. This raises questions about impacts of genome streamlining on the adaptive capacity of bacterial populations facing rapid environmental change.

RevDate: 2021-09-03
CmpDate: 2021-08-19

Belloso Daza MV, Cortimiglia C, Bassi D, et al (2021)

Genome-based studies indicate that the Enterococcus faecium Clade B strains belong to Enterococcus lactis species and lack of the hospital infection associated markers.

International journal of systematic and evolutionary microbiology, 71(8):.

Enterococcus lactis and the heterotypic synonym Enterococcus xinjiangensis from dairy origin have recently been identified as a novel species based on 16S rRNA gene sequence analysis. Enterococcus faecium type strain NCTC 7171T was used as the reference genome for determining E. lactis and E. faecium to be separate species. However, this taxonomic classification did not consider the diverse lineages of E. faecium, and the dual nature of hospital-associated (clade A) and community-associated (clade B) isolates. Here, we investigated the taxonomic relationship among isolates of E. faecium of different origins and E. lactis, using a genome-based approach. In addition to 16S rRNA gene sequence analysis, we estimated the relatedness among strains and species using phylogenomics based on the core pangenome, multilocus sequence typing, the average nucleotide identity and digital DNA-DNA hybridization. Moreover, following the available safety assessment schemes, we evaluated the virulence profile and the ampicillin resistance of E. lactis and E. faecium clade B strains. Our results confirmed the genetic and evolutionary differences between clade A and the intertwined clade B and E. lactis group. We also confirmed the absence in these strains of virulence gene markers IS16, hylEfm and esp and the lack of the PBP5 allelic profile associated with ampicillin resistance. Taken together, our findings support the reassignment of the strains of E. faecium clade B as E. lactis.

RevDate: 2021-09-26

Matteoli FP, Pedrosa-Silva F, Dutra-Silva L, et al (2021)

The global population structure and beta-lactamase repertoire of the opportunistic pathogen Serratia marcescens.

Genomics, 113(6):3523-3532 pii:S0888-7543(21)00316-5 [Epub ahead of print].

Serratia marcescens is a globally spread nosocomial pathogen. This rod-shaped bacterium displays a broad host range and worldwide geographical distribution. Here we analyze an international collection of this multidrug-resistant, opportunistic pathogen from 35 countries to infer its population structure. We show that S. marcescens comprises 12 lineages; Sm1, Sm4, and Sm10 harbor 78.3% of the known environmental strains. Sm5, Sm6, and Sm7 comprise only human-associated strains, which have the smallest pangenomes and genomic fluidity and the lowest levels of core recombination, indicating niche specialization. The Sm7 and Sm9 lineages exhibit the most concerning resistomes; the blaKPC-2 plasmid is widespread in Sm7, whereas Sm9, also an anthropogenic-exclusive lineage, presents the highest plasmid/lineage size ratio and plasmid diversity, encoding metallo-beta-lactamases including blaNDM-1. The heterogeneity of resistance patterns of S. marcescens lineages elucidated herein highlights the relevance of surveillance programs, using whole-genome sequencing, to provide insights into the molecular epidemiology of carbapenemase-producing strains of this species.

RevDate: 2021-09-10

Orsi WD, Magritsch T, Vargas S, et al (2021)

Genome Evolution in Bacteria Isolated from Million-Year-Old Subseafloor Sediment.

mBio, 12(4):e0115021.

Beneath the seafloor, microbial life subsists in isolation from the surface world under persistent energy limitation. The nature and extent of genomic evolution in subseafloor microbes have been unknown. Here, we show that the genomes of Thalassospira bacterial populations cultured from million-year-old subseafloor sediments evolve in clonal populations by point mutation, with a relatively low rate of homologous recombination and elevated numbers of pseudogenes. Ratios of nonsynonymous to synonymous substitutions correlate with the accumulation of pseudogenes, consistent with a role for genetic drift in the subseafloor strains but not in type strains of Thalassospira isolated from the surface world. Consistent with this, pangenome analysis reveals that the subseafloor bacterial genomes have a significantly lower number of singleton genes than the type strains, indicating a reduction in recent gene acquisitions. Numerous insertion-deletion events and pseudogenes were present in a flagellar operon of the subseafloor bacteria, indicating that motility is nonessential in these million-year-old subseafloor sediments. This genomic evolution in subseafloor clonal populations coincided with a phenotypic difference: all subseafloor isolates have a lower rate of growth under laboratory conditions than the Thalassospira xiamenensis type strain. Our findings demonstrate that the long-term physical isolation of Thalassospira, in the absence of recombination, has resulted in clonal populations whereby reduced access to novel genetic material from neighbors has resulted in the fixation of new mutations that accumulate in genomes over millions of years.

IMPORTANCE: The nature and extent of genomic evolution in subseafloor microbial populations subsisting for millions of years below the seafloor are unknown. Subseafloor populations have ultralow metabolic rates that are hypothesized to restrict reproduction and, consequently, the spread of new traits. Our findings demonstrate that genomes of cultivated bacterial strains from the genus Thalassospira isolated from million-year-old abyssal sediment exhibit greatly reduced levels of homologous recombination, elevated numbers of pseudogenes, and genome-wide evidence of relaxed purifying selection. These substitutions and pseudogenes are fixed into the population, suggesting that the genome evolution of these bacteria has been dominated by genetic drift. Thus, reduced recombination, stemming from long-term physical isolation, resulted in small clonal populations of Thalassospira that have accumulated mutations in their genomes over millions of years.

RevDate: 2021-09-21
CmpDate: 2021-09-21

Huang RR, Yang SR, Zhen C, et al (2021)

Genomic molecular signatures determined characterization of Mycolicibacterium gossypii sp. nov., a fast-growing mycobacterial species isolated from cotton field soil.

Antonie van Leeuwenhoek, 114(10):1735-1744.

A Gram-positive, acid-fast and rapidly growing rod, designated S2-37T, that formed yellowish colonies was isolated from a soil sample collected from a cotton cropping field located in the Xinjiang region of China. Genomic analyses indicated that strain S2-37T harbored a type VII secretion system (T7SS) and was very likely able to produce mycolic acid, which are typical features of pathogenic mycobacterial species. 16S rRNA-directed phylogenetic analysis indicated that strain S2-37T was closely related to bacterial species belonging to the genus Mycolicibacterium, which was further confirmed by pan-genome phylogenetic analysis. Digital DNA-DNA hybridization and the average nucleotide identity showed that strain S2-37T displayed the highest values, 39.1% (35.7-42.6%) and 81.28%, respectively, with M. litorale CGMCC 4.5724T. Characterization of conserved molecular signatures further supported the taxonomic position of strain S2-37T within the genus Mycolicibacterium. The main fatty acids were identified as C16:0, C18:0, C20:3ω3 and C22:6ω3. In addition, the polar lipid profile was mainly composed of diphosphatidylglycerol, phosphatidylethanolamine and phosphatidylinositol. Phylogenetic analyses, distinct fatty acids and antimicrobial resistance profiles indicated that strain S2-37T is genetically and phenotypically distinct from its closest phylogenetic neighbour, M. litorale CGMCC 4.5724T. Here, we propose a novel species of the genus Mycolicibacterium: Mycolicibacterium gossypii sp. nov., with the type strain S2-37T (= JCM 34327T = CGMCC 1.18817T).

RevDate: 2021-08-14

Saco A, Rey-Campos M, Rosani U, et al (2021)

The Evolution and Diversity of Interleukin-17 Highlight an Expansion in Marine Invertebrates and Its Conserved Role in Mucosal Immunity.

Frontiers in immunology, 12:692997.

The interleukin-17 (IL-17) family consists of proinflammatory cytokines conserved during evolution. A comparative genomics approach was applied to examine IL-17 throughout evolution from poriferans to higher vertebrates. Cnidaria was highlighted as the most ancient diverged phylum, and several evolutionary patterns were revealed. Large expansions of the IL-17 repertoire were observed in marine molluscs and echinoderm species. We further studied this expansion in filter-fed Mytilus galloprovincialis, which is a bivalve with a highly effective innate immune system supported by a variable pangenome. We recovered 379 unique IL-17 sequences and 96 receptors from individual genomes that were classified into 23 and 6 isoforms after phylogenetic analyses. Mussel IL-17 isoforms were conserved among individuals and shared between closely related Mytilidae species. Certain isoforms were specifically implicated in the response to a waterborne infection with Vibrio splendidus in mussel gills. The involvement of IL-17 in mucosal immune responses could be conserved in higher vertebrates from these ancestral lineages.

RevDate: 2021-09-01

Zhang X, Liu T, Wang J, et al (2021)

Pan-genome of Raphanus highlights genetic variation and introgression among domesticated, wild, and weedy radishes.

Molecular plant pii:S1674-2052(21)00318-X [Epub ahead of print].

Post-polyploid diploidization associated with descending dysploidy and interspecific introgression drives plant genome evolution by unclear mechanisms. Raphanus is an economically and ecologically important Brassiceae genus and model system for studying post-polyploidization genome evolution and introgression. Here, we report the de novo sequence assemblies for 11 genomes covering most of the typical sub-species and varieties of domesticated, wild and weedy radishes from East Asia, South Asia, Europe, and America. Divergence among the species, sub-species, and South/East Asian types coincided with Quaternary glaciations. A genus-level pan-genome was constructed with family-based, locus-based, and graph-based methods, and whole-genome comparisons revealed genetic variations ranging from single-nucleotide polymorphisms (SNPs) to inversions and translocations of whole ancestral karyotype (AK) blocks. Extensive gene flow occurred between wild, weedy, and domesticated radishes. High frequencies of genome reshuffling, biased retention, and large-fragment translocation have shaped the genomic diversity. Most variety-specific gene-rich blocks showed large structural variations. Extensive translocation and tandem duplication of dispensable genes were revealed in two large rearrangement-rich islands. Disease resistance genes mostly resided on specific and dispensable loci. Variations causing the loss of function of enzymes modulating gibberellin deactivation were identified and could play an important role in phenotype divergence and adaptive evolution. This study provides new insights into the genomic evolution underlying post-polyploid diploidization and lays the foundation for genetic improvement of radish crops, biological control of weeds, and protection of wild species' germplasms.

RevDate: 2021-09-10

Baker JL (2021)

Complete Genomes of Clade G6 Saccharibacteria Suggest a Divergent Ecological Niche and Lifestyle.

mSphere, 6(4):e0053021.

Saccharibacteria (formerly TM7) have reduced genomes and a small cell size and appear to have a parasitic lifestyle dependent on a bacterial host. Although there are at least 6 major clades of Saccharibacteria inhabiting the human oral cavity, complete genomes of oral Saccharibacteria were previously limited to the G1 clade. In this study, nanopore sequencing was used to obtain three complete genome sequences from clade G6. Phylogenetic analysis suggested the presence of at least 3 to 5 distinct species within G6, with two discrete taxa represented by the 3 complete genomes. G6 Saccharibacteria were highly divergent from the more-well-studied clade G1 and had the smallest genomes and lowest GC content of all Saccharibacteria. Pangenome analysis showed that although 97% of shared pan-Saccharibacteria core genes and 89% of G1-specific core genes had putative functions, only 50% of the 244 G6-specific core genes had putative functions, highlighting the novelty of this group. Compared to G1, G6 harbored divergent metabolic pathways. G6 genomes lacked an F1Fo ATPase, the pentose phosphate pathway, and several genes involved in nucleotide metabolism, which were all core genes for G1. G6 genomes were also unique compared to that of G1 in that they encoded d-lactate dehydrogenase, adenylate cyclase, limited glycerolipid metabolism, a homolog to a lipoarabinomannan biosynthesis enzyme, and the means to degrade starch. These differences at key metabolic steps suggest a distinct lifestyle and ecological niche for clade G6, possibly with alternative hosts and/or host dependencies, which would have significant ecological, evolutionary, and likely pathogenic implications.

IMPORTANCE: Saccharibacteria are ultrasmall parasitic bacteria that are common members of the oral microbiota and have been increasingly linked to disease and inflammation. However, the lifestyle and impact on human health of Saccharibacteria remain poorly understood, especially for the clades with no complete genomes (G2 to G6) or cultured isolates (G2 and G4 to G6). Obtaining complete genomes is of particular importance for Saccharibacteria, because they lack many of the "essential" core genes used for determining draft genome completeness, and few references exist outside clade G1. In this study, complete genomes of 3 G6 strains, representing two candidate species, were obtained and analyzed. The G6 genomes were highly divergent from that of G1 and enigmatic, with 50% of the G6 core genes having no putative functions. The significant difference in encoded functional pathways is suggestive of a distinct lifestyle and ecological niche, probably with alternative hosts and/or host dependencies, which would have major implications in ecology, evolution, and pathogenesis.

RevDate: 2021-09-10

Gómez-Sanz E, Haro-Moreno JM, Jensen SO, et al (2021)

The Resistome and Mobilome of Multidrug-Resistant Staphylococcus sciuri C2865 Unveil a Transferable Trimethoprim Resistance Gene, Designated dfrE, Spread Unnoticed.

mSystems, 6(4):e0051121.

Methicillin-resistant Staphylococcus sciuri (MRSS) strain C2865 from a stranded dog in Nigeria was trimethoprim (TMP) resistant but lacked formerly described staphylococcal TMP-resistant dihydrofolate reductase genes (dfr). Whole-genome sequencing, comparative genomics, and pan-genome analyses were pursued to unveil the molecular bases for TMP resistance via resistome and mobilome profiling. MRSS C2865 comprised a species subcluster and positioned just above the intraspecies boundary. Lack of species host tropism was observed. S. sciuri exhibited an open pan-genome, while MRSS C2865 harbored the highest number of unique genes (75% associated with mobilome). Within this fraction, we discovered a transferable TMP resistance gene, named dfrE, which confers high-level TMP resistance in Staphylococcus aureus and Escherichia coli. dfrE was located in a novel multidrug resistance mosaic plasmid (pUR2865-34) encompassing adaptive, mobilization, and segregational stability traits. dfrE was formerly denoted as dfr_like in Exiguobacterium spp. from fish farm sediment in China but escaped identification in one macrococcal and diverse staphylococcal genomes in different Asian countries. dfrE shares the highest identity with dfr of soil-related Paenibacillus anaericanus (68%). Data analysis discloses that dfrE has emerged from a single ancestor and places S. sciuri as a plausible donor. The C2865 unique fraction additionally enclosed novel chromosomal mobile islands, including a multidrug-resistant pseudo-SCCmec cassette, three apparently functional prophages (Siphoviridae), and an SaPI4-related staphylococcal pathogenicity island. Since dfrE seems not yet common in staphylococcal clinical specimens, our data promote early surveillance and enable molecular diagnosis. We evidence the genome plasticity of S. sciuri and highlight its role as a resourceful reservoir for adaptive traits.

IMPORTANCE: The discovery and surveillance of antimicrobial resistance genes (AMRG) and their mobilization platforms are critical to understand the evolution of bacterial resistance and to restrain further expansion. Limited genomic data are available on Staphylococcus sciuri; regardless, it is considered a reservoir for critical AMRG and mobile elements. We uncover a transferable staphylococcal TMP resistance gene, named dfrE, in a novel mosaic plasmid harboring additional resistance, adaptive, and self-stabilization features. dfrE is present but evaded detection in diverse species from varied, geographically distant sources. Our analyses evidence that the dfrE-carrying element has emerged from a single ancestor and position S. sciuri as the donor species for dfrE spread. We also identify novel mobilizable chromosomal islands encompassing AMRG and three unrelated prophages. We prove high intraspecies heterogeneity and genome plasticity for S. sciuri. This work highlights the importance of genome-wide ecological studies to facilitate identification, characterization, and evolution routes of bacterial adaptive features.

RevDate: 2021-09-13
CmpDate: 2021-09-13

Hily JM, Poulicard N, Kubina J, et al (2021)

Metagenomic analysis of nepoviruses: diversity, evolution and identification of a genome region in members of subgroup A that appears to be important for host range.

Archives of virology, 166(10):2789-2801.

Data mining and metagenomic analysis of 277 open reading frame sequences of bipartite RNA viruses of the genus Nepovirus, family Secoviridae, were performed, documenting how challenging it can be to unequivocally assign a virus to a particular species, especially those in subgroups A and C, based on some of the currently adopted taxonomic demarcation criteria. This work suggests a possible need for their amendment to accommodate pangenome information. In addition, we revealed a host-dependent structure of arabis mosaic virus (ArMV) populations at a cladistic level and confirmed a phylogeographic structure of grapevine fanleaf virus (GFLV) populations. We also identified new putative recombination events in members of subgroups A, B and C. The evolutionary specificity of some capsid regions of ArMV and GFLV that were described previously and biologically validated as determinants of nematode transmission was circumscribed in silico. Furthermore, a C-terminal segment of the RNA-dependent RNA polymerase of members of subgroup A was predicted to be a putative host range determinant based on statistically supported higher π (substitutions per site) values for GFLV and ArMV isolates infecting Vitis spp. compared with non-Vitis-infecting ArMV isolates. This study illustrates how sequence information obtained via high-throughput sequencing can increase our understanding of mechanisms that modulate virus diversity and evolution and create new opportunities for advancing studies on the biology of economically important plant viruses.
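
The π statistic mentioned in this abstract is commonly computed as the average number of pairwise differences per site across aligned sequences. The following is a minimal sketch under that assumption, not the authors' pipeline: the function name `nucleotide_diversity` and the toy sequences are hypothetical, and gapped sites are simply skipped for each pair.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Mean pairwise differences per site (pi) over all sequence pairs.

    Assumes the sequences are already aligned to equal length; sites
    where either sequence of a pair has a gap ('-') are ignored.
    """
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")

    total = 0.0
    n_pairs = 0
    for a, b in combinations(aligned_seqs, 2):
        # Keep only the sites where both sequences carry a base.
        compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if not compared:
            continue
        diffs = sum(x != y for x, y in compared)
        total += diffs / len(compared)
        n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


# Toy alignment (invented for illustration, not data from the study).
toy = ["ACGT", "ACGA", "ACGT"]
pi = nucleotide_diversity(toy)
```

Comparing π between two groups of isolates, as the abstract describes, would mean calling the function once per group on the same genome region and then testing the difference statistically, which this sketch does not attempt.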

RevDate: 2021-08-09

Iqbal S, Vollmers J, HA Janjua (2021)

Genome Mining and Comparative Genome Analysis Revealed Niche-Specific Genome Expansion in Antibacterial Bacillus pumilus Strain SF-4.

Genes, 12(7):.

The present study reports the isolation of the antibacterial Bacillus pumilus (B. pumilus) strain SF-4 from field soil. The genome of strain SF-4 was sequenced and analyzed to acquire in-depth genomic level insight related to functional diversity, evolutionary history, and biosynthetic potential. The genome of strain SF-4 harbors 12 Biosynthetic Gene Clusters (BGCs) including four Non-ribosomal peptide synthetases (NRPSs), two terpenes, and one each of Type III polyketide synthases (PKSs), hybrid (NRPS/PKS), lipopeptide, β-lactone, and bacteriocin clusters. Plant growth-promoting genes associated with de-nitrification, iron acquisition, phosphate solubilization, and nitrogen metabolism were also observed in the genome. Furthermore, all the available complete genomes of B. pumilus strains were used to highlight species boundaries and diverse niche adaptation strategies. Phylogenetic analyses revealed local diversification and indicate that strain SF-4 is a sister group to SAFR-032 and 150a. Pan-genome analyses of 12 targeted strains showed regions of genome plasticity which regulate function of these strains and proposed direct strain adaptations to specific habitats. The unique genome pool carries genes mostly associated with "biosynthesis of secondary metabolites, transport, and catabolism" (Q), "replication, recombination and repair" (L), and "unknown function" (S) clusters of orthologous groups (COG) categories. Moreover, a total of 952 unique genes and 168 exclusively absent genes were prioritized across the 12 genomes, while the newly sequenced B. pumilus SF-4 genome contains 520 accessory, 59 unique, and seven exclusively absent genes. The current study demonstrates genomic differences among 12 B. pumilus strains and offers comprehensive knowledge of the respective genome architecture which may assist in the agronomic application of this strain in future.

RevDate: 2021-08-27

Surachat K, Deachamag P, Kantachote D, et al (2021)

In silico comparative genomics analysis of Lactiplantibacillus plantarum DW12, a potential gamma-aminobutyric acid (GABA)-producing strain.

Microbiological research, 251:126833.

Gamma-aminobutyric acid (GABA) is an amino acid that plays a major role as a neurotransmitter. It is commonly produced by lactic acid bacteria (LAB) naturally found in fermented food and fruit. Lactiplantibacillus plantarum DW12 is a high-potential GABA-producing strain isolated from a fermented beverage. In this study, to highlight its ability to produce GABA, we sequenced the genome of L. plantarum DW12 and then performed comprehensive bioinformatics and meta-analysis to compare the genomic data of previously published genomes. Also, the evolutionary analysis among L. plantarum species was demonstrated using pan-genome analysis against 576 genomes from the database. As a result, the DW12 genome comprises one circular chromosome of 3,217,574 bp. It contains several genes that encode for the production of antimicrobial compounds including plantaricin A, E, F, J, K, and N. The glutamic acid decarboxylase (GAD) operon was found in the DW12 genome, suggesting a high potential for producing GABA in this strain. Therefore, L. plantarum DW12 could be a good candidate as a starter culture in the beverage and food industries due to its safety aspects and ability to produce GABA.

RevDate: 2021-09-17

Hufnagel B, Soriano A, Taylor J, et al (2021)

Pangenome of white lupin provides insights into the diversity of the species.

Plant biotechnology journal [Epub ahead of print].

White lupin is an old crop with renewed interest due to its seed high protein content and high nutritional value. Despite a long domestication history in the Mediterranean basin, modern breeding efforts have been fairly scarce. Recent sequencing of its genome has provided tools for further description of genetic resources but detailed characterization of genomic diversity is still missing. Here, we report the genome sequencing of 39 accessions that were used to establish a white lupin pangenome. We defined 32 068 core genes that are present in all individuals and 14 822 that are absent in some and may represent a gene pool for breeding for improved productivity, grain quality, and stress adaptation. We used this new pangenome resource to identify candidate genes for alkaloid synthesis, a key grain quality trait. The white lupin pangenome provides a novel genetic resource to better understand how domestication has shaped the genomic variability within this crop. Thus, this pangenome resource is an important step towards the effective and efficient genetic improvement of white lupin to help meet the rapidly growing demand for plant protein sources for human and animal consumption.
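
The core/variable split described in this abstract can be illustrated with a small presence/absence sketch. This is a hedged illustration only: the function `partition_pangenome`, the gene names, and the accession names are hypothetical, and the share calculation assumes the 32,068 core and 14,822 variable genes reported above are disjoint sets that together make up the pangenome.

```python
def partition_pangenome(pav, accessions):
    """Split genes into core (present in every accession) and variable.

    pav maps a gene name to the set of accessions in which it was found.
    """
    all_accessions = set(accessions)
    core, variable = [], []
    for gene, present_in in pav.items():
        if all_accessions <= set(present_in):
            core.append(gene)
        else:
            variable.append(gene)
    return sorted(core), sorted(variable)


# Toy presence/absence data (invented for illustration).
accessions = ["acc1", "acc2", "acc3"]
pav = {
    "geneA": {"acc1", "acc2", "acc3"},  # present everywhere: core
    "geneB": {"acc1", "acc3"},          # missing from acc2: variable
    "geneC": {"acc2"},                  # found once: variable
}
core, variable = partition_pangenome(pav, accessions)

# Share of core genes implied by the counts quoted in the abstract,
# assuming the two counts do not overlap.
core_share = 32068 / (32068 + 14822)
```

Under that assumption the counts give a pangenome of 46,890 genes, of which roughly 68% are core.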

RevDate: 2021-08-06

Maarala AI, Arasalo O, Valenzuela D, et al (2021)

Distributed hybrid-indexing of compressed pan-genomes for scalable and fast sequence alignment.

PloS one, 16(8):e0255260.

Computational pan-genomics utilizes information from multiple individual genomes in large-scale comparative analysis. Genetic variation between case-controls, ethnic groups, or species can be discovered thoroughly using pan-genomes of such subpopulations. Whole-genome sequencing (WGS) data volumes are growing rapidly, making genomic data compression and indexing methods very important. Despite current space-efficient repetitive sequence compression and indexing methods, the deployed compression methods are often sequential, computationally time-consuming, and do not provide efficient sequence alignment performance on vast collections of genomes such as pan-genomes. For performing rapid analytics with the ever-growing genomics data, data compression and indexing methods have to exploit distributed and parallel computing more efficiently. Instead of strict genome data compression methods, we will focus on the efficient construction of a compressed index for pan-genomes. Compressed hybrid-index enables fast sequence alignments to several genomes at once while shrinking the index size significantly compared to traditional indexes. We propose a scalable distributed compressed hybrid-indexing method for large genomic data sets enabling pan-genome-based sequence search and read alignment capabilities. We show the scalability of our tool, DHPGIndex, by executing experiments in a distributed Apache Spark-based computing cluster comprising 448 cores distributed over 26 nodes. The experiments have been performed both with human and bacterial genomes. DHPGIndex built a BLAST index for n = 250 human pan-genome with an 870:1 compression ratio (CR) in 342 minutes and a Bowtie2 index with 157:1 CR in 397 minutes. For n = 1,000 human pan-genome, the BLAST index was built in 1520 minutes with 532:1 CR and the Bowtie2 index in 1938 minutes with 76:1 CR. Bowtie2 aligned 14.6 GB of paired-end reads to the compressed (n = 1,000) index in 31.7 minutes on a single node. 
Compressing the GenBank database (n = 13,375,031; 488 GB) to a BLAST index resulted in a CR of 62:1 in 575 minutes. BLASTing 189,864 Crispr-Cas9 gRNA target sequences (23 MB in total) to the compressed index of human pan-genome (n = 1,000) finished in 45 minutes on a single node. 30 MB of mixed bacterial sequences (n = 599) were blasted to the compressed index of the 488 GB GenBank database (n = 13,375,031) in 26 minutes on 25 nodes. 78 MB of mixed sequences (n = 4,167) were blasted to the compressed index of the 18 GB E. coli sequence database (n = 745,409) in 5.4 minutes on a single node.
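
The figures in this abstract can be related with simple arithmetic. The sketch below assumes that the compression ratio (CR) means original size divided by compressed index size, which is the usual convention but is not spelled out in the abstract; the helper names are hypothetical.

```python
def compressed_size_gb(original_gb, ratio):
    """Index size implied by a ratio:1 compression of original_gb."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return original_gb / ratio


def throughput_gb_per_min(data_gb, minutes):
    """Average amount of data processed per minute."""
    return data_gb / minutes


# 488 GB GenBank database at 62:1 implies an index of roughly 7.9 GB.
genbank_index_gb = compressed_size_gb(488, 62)

# 14.6 GB of paired-end reads aligned in 31.7 minutes on a single node.
bowtie2_rate = throughput_gb_per_min(14.6, 31.7)
```

These are back-of-the-envelope values derived from the quoted numbers, not measurements reported by the authors.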

RevDate: 2021-08-03

Awan F, Ali MM, Dong Y, et al (2021)

In Silico Analysis of Potential Outer Membrane Beta-Barrel Proteins in Aeromonas hydrophila Pangenome.

International journal of peptide research and therapeutics [Epub ahead of print].

Outer membrane proteins (OMPs) of Aeromonas hydrophila have a variety of functional roles in virulence and pathogenesis and represent promising targets for vaccine development. The main objective of this study was to develop an in-silico model of beta-barrel OMP present among the valid A. hydrophila pangenomes (n = 22). With a program named the β-barrel Outer Membrane Protein Predictor (BOMP), total beta-barrel OMPs (n = 3127) were predicted across 22 genomes with the estimated median number of 64 per genome. In pangenome analysis, only 32 OMPs were found to be conserved. These beta-barrel OMPs also showed variations among source of isolation, COG and KEGG classes. Among 32 conserved OMPs, a highly antigenic protein was identified by utilizing Vaxijen. With B cell epitope predictions, two fragments of amino acid sequences, i.e. GLTLGAQFTGNNDPQNADRSN (21 mer) and FKPSLAYLRTDVKDNARGIDDTATEY (26 mer), bearing B-cell binding sites were selected. Further, an epitope (12 amino acids: GLTLGAQFTGNN) that complexes to maximum MHC alleles with a higher antigenicity was determined. The analysis of evolutionary forces on the identified OMP sequence and epitope indicated that none of the basic amino acid sites showed significantly different substitution ratios. This conserved protein and epitope will be helpful in developing a vaccine that may be effective against all the A. hydrophila strains. Also, this study provides a theoretical basis for vaccine design against other pathogenic bacteria.

Supplementary Information: The online version contains supplementary material available at 10.1007/s10989-021-10259-z.

RevDate: 2021-07-30

Wang K, Hu H, Tian Y, et al (2021)

The chicken pan-genome reveals gene content variation and a promoter region deletion in IGF2BP1 affecting body size.

Molecular biology and evolution pii:6332014 [Epub ahead of print].

Domestication and breeding have reshaped the genomic architecture of chicken, but the retention and loss of genomic elements during these evolutionary processes remain unclear. We present the first chicken pan-genome constructed using 664 individuals, which identified an additional ∼66.5 Mb sequences that are absent from the reference genome (GRCg6a). The constructed pan-genome encoded 20,491 predicted protein-coding genes, with higher expression levels observed in conserved genes relative to dispensable genes. Presence/absence variation (PAV) analyses demonstrated that gene PAV in chicken was shaped by selection, genetic drift, and hybridization. PAV-based GWAS identified numerous candidate mutations related to growth, carcass composition, meat quality, or physiological traits. Among them, a deletion in the promoter region of IGF2BP1 affecting chicken body size is reported, which is supported by functional studies and extra samples. This is the first report of the causal variant of the repeatedly reported chicken body size QTL located at chromosome 27. Therefore, the chicken pan-genome is a useful resource for biological discovery and breeding. It improves our understanding of chicken genome diversity and provides materials to unveil the evolution history of chicken domestication.

RevDate: 2021-08-19

Hu H, Scheben A, Verpaalen B, et al (2021)

Amborella gene presence/absence variation is associated with abiotic stress responses that may contribute to environmental adaptation.

RevDate: 2021-08-01

Davidson RM, Benoit JB, Kammlade SM, et al (2021)

Genomic characterization of sporadic isolates of the dominant clone of Mycobacterium abscessus subspecies massiliense.

Scientific reports, 11(1):15336.

Recent studies have characterized a dominant clone (Clone 1) of Mycobacterium abscessus subspecies massiliense (M. massiliense) associated with high prevalence in cystic fibrosis (CF) patients, pulmonary outbreaks in the United States (US) and United Kingdom (UK), and a Brazilian epidemic of skin infections. The prevalence of Clone 1 in non-CF patients in the US and the relationship of sporadic US isolates to outbreak clones are not known. We surveyed a reference US Mycobacteria Laboratory and a US biorepository of CF-associated Mycobacteria isolates for Clone 1. We then compared genomic variation and antimicrobial resistance (AMR) mutations between sporadic non-CF, CF, and outbreak Clone 1 isolates. Among reference lab samples, 57/147 (39%) of patients with M. massiliense had Clone 1, including pulmonary and extrapulmonary infections, compared to 11/64 (17%) in the CF isolate biorepository. Core and pan genome analyses revealed that outbreak isolates had similar numbers of single nucleotide polymorphisms (SNPs) and accessory genes as sporadic US Clone 1 isolates. However, pulmonary outbreak isolates were more likely to have AMR mutations compared to sporadic isolates. Clone 1 isolates are present among non-CF and CF patients across the US, but additional studies will be needed to resolve potential routes of transmission and spread.

RevDate: 2021-08-25

Bayer PE, Scheben A, Golicz AA, et al (2021)

Modelling of gene loss propensity in the pangenomes of three Brassica species suggests different mechanisms between polyploids and diploids.

Plant biotechnology journal [Epub ahead of print].

Plant genomes demonstrate significant presence/absence variation (PAV) within a species; however, the factors that lead to this variation have not been studied systematically in Brassica across diploids and polyploids. Here, we developed pangenomes of polyploid Brassica napus and its two diploid progenitor genomes B. rapa and B. oleracea to infer how PAV may differ between diploids and polyploids. Modelling of gene loss suggests that loss propensity is primarily associated with transposable elements in the diploids while in B. napus, gene loss propensity is associated with homoeologous recombination. We use these results to gain insights into the different causes of gene loss, both in diploids and following polyploidization, and pave the way for the application of machine learning methods to understanding the underlying biological and physical causes of gene presence/absence.

RevDate: 2021-09-15

Hernández-Juárez LE, Camorlinga M, Méndez-Tenorio A, et al (2021)

Analyses of publicly available Hungatella hathewayi genomes revealed genetic distances indicating they belong to more than one species.

Virulence, 12(1):1950-1964.

Hungatella hathewayi has been observed to be a member of the gut microbiome. Unfortunately, little is known about this organism in spite of being associated with human fatalities; it is important to understand virulence mechanisms and epidemiological prospective to cause disease. In this study, a patient with chronic neurologic symptoms presented to the clinic with subsequent isolation of a strain with phenotypic characteristics suggestive of Clostridium difficile. However, whole-genome sequencing found the organism to be H. hathewayi. Analysis including publicly available Hungatella genomes found substantial genomic differences as compared to the type strain, indicating this isolate was not C. difficile. We examined the whole-genome of Hungatella species and related genera, using comparative genomics to fully examine species identification and toxin production. Orthogonal phylogenetic analyses using the 16S rRNA gene and entire genome analyses that included genome distance analyses using Genome-to-Genome Distance (GGDC), Average Nucleotide Identity (ANI), and a pan-genome analysis with inclusion of available public genomes determined the speciation to be Hungatella. Two clearly differentiated groups were identified, one including a reference H. hathewayi genome (strain DSM-13,479) and a second group that was determined to be H. effluvii, which included our clinical isolate. Also, some genomes reported as H. hathewayi were found to belong to other genera, including Clostridium and Faecalicatena. We show that the Hungatella species have an open pan-genome reflecting high genomic diversity. This study highlights the importance of correctly assigning taxonomic identification, particularly in disease-associated strains, to better understand virulence and therapeutic options.

RevDate: 2021-07-23

Liu Z, Zhao Y, Sossah FL, et al (2021)

Characterization, Pathogenicity, Phylogeny, and Comparative Genomic Analysis of Pseudomonas tolaasii Strains Isolated from Various Mushrooms in China.

Phytopathology [Epub ahead of print].

Since 2016, devastating bacterial blotch affecting the fruiting bodies of Agaricus bisporus, Cordyceps militaris, Flammulina filiformis, and Pleurotus ostreatus in China has caused severe economic losses. We isolated 102 bacterial strains and characterized them polyphasically. We identified the causal agent as Pseudomonas tolaasii and confirmed the pathogenicity of the strains. A host range test further confirmed the pathogen's ability to infect multiple hosts. This is the first report in China of bacterial blotch in C. militaris caused by P. tolaasii. Whole-genome sequences were generated for three strains: Pt11 (6.48 Mb), Pt51 (6.63 Mb), and Pt53 (6.80 Mb), and pangenome analysis was performed with 13 other publicly accessible P. tolaasii genomes to determine their genetic diversity, virulence, antibiotic resistance, and mobile genetic elements. The pangenome of P. tolaasii is open, and many more gene families are likely to emerge with further genome sequencing. Multilocus sequence analysis using the sequences of four common housekeeping genes (glns, gyrB, rpoB, and rpoD) showed high genetic variability among the P. tolaasii strains, with 115 strains clustered into a monophyletic group. The P. tolaasii strains possess various genes for secretion systems, virulence factors, carbohydrate-active enzymes, toxins, secondary metabolites, and antimicrobial resistance genes that are associated with pathogenesis and adapted to different environments. The myriad of insertion sequences, integrons, prophages, and genome islands encoded in the strains may contribute to genome plasticity, virulence, and antibiotic resistance. These findings advance understanding of the determinants of virulence, which can be targeted for the effective control of bacterial blotch disease.

RevDate: 2021-07-21

Bayer PE, Petereit J, Danilevicz MF, et al (2021)

The application of pangenomics and machine learning in genomic selection in plants.

The plant genome [Epub ahead of print].

Genomic selection approaches have increased the speed of plant breeding, leading to growing crop yields over the last decade. However, climate change is impacting current and future yields, resulting in the need to further accelerate breeding efforts to cope with these changing conditions. Here we present approaches to accelerate plant breeding and incorporate nonadditive effects in genomic selection by applying state-of-the-art machine learning approaches. These approaches are made more powerful by the inclusion of pangenomes, which represent the entire genome content of a species. Understanding the strengths and limitations of machine learning methods, compared with more traditional genomic selection efforts, is paramount to the successful application of these methods in crop breeding. We describe examples of genomic selection and pangenome-based approaches in crop breeding, discuss machine learning-specific challenges, and highlight the potential for the application of machine learning in genomic selection. We believe that careful implementation of machine learning approaches will support crop improvement to help counter the adverse outcomes of climate change on crop production.

RevDate: 2021-09-07

Fiedoruk K, Drewnowska JM, Mahillon J, et al (2021)

Pan-Genome Portrait of Bacillus mycoides Provides Insights into the Species Ecology and Evolution.

Microbiology spectrum, 9(1):e0031121.

Bacillus mycoides is poorly known despite its frequent occurrence in a wide variety of environments. To provide direct insight into its ecology and evolutionary history, a comparative investigation of the species pan-genome and the functional gene categorization of 35 isolates obtained from soil samples from northeastern Poland was performed. The pan-genome of these isolates is composed of 20,175 genes and is characterized by a strong predominance of adaptive genes (∼83%), a significant amount of plasmid genes (∼37%), and a great contribution of prophages and insertion sequences. The pan-genome structure and phylodynamic studies had suggested a wide genomic diversity among the isolates, but no correlation between lineages and the bacillus origin was found. Nevertheless, the two B. mycoides populations, one from Białowieża National Park, the last European natural primeval forest with soil classified as organic, and the second from mineral soil samples taken in a farm in Jasienówka, a place with strong anthropogenic pressure, differ significantly in the frequency of genes encoding proteins enabling bacillus adaptation to specific stress conditions and production of a set of compounds, thus facilitating their colonization of various ecological niches. Furthermore, differences in the prevalence of essential stress sigma factors might be an important trail of this process. Due to these numerous adaptive genes, B. mycoides is able to quickly adapt to changing environmental conditions.
IMPORTANCE This research allows deeper understanding of the genetic organization of natural bacterial populations, specifically, Bacillus mycoides, a psychrotrophic member of the Bacillus cereus group that is widely distributed worldwide, especially in areas with continental cold climates. These thorough analyses made it possible to describe, for the first time, the B. mycoides pan-genome, phylogenetic relationship within this species, and the mechanisms behind the species ecology and evolutionary history. Our study indicates a set of functional properties and adaptive genes, in particular, those encoding sigma factors, associated with B. mycoides acclimatization to specific ecological niches and changing environmental conditions.

RevDate: 2021-07-24
CmpDate: 2021-07-22

Steidele CE, R Stam (2021)

Multi-omics approach highlights differences between RLP classes in Arabidopsis thaliana.

BMC genomics, 22(1):557.

BACKGROUND: The Leucine rich-repeat (LRR) receptor-like protein (RLP) family is a complex gene family with 57 members in Arabidopsis thaliana. Some members of the RLP family are known to be involved in basal developmental processes, whereas others are involved in defence responses. However, functional data is currently only available for a small subset of RLPs, leaving the remaining ones classified as RLPs of unknown function.

RESULTS: Using publicly available datasets, we annotated RLPs of unknown function as either likely defence-related or likely fulfilling a more basal function in plants. Then, using these categories, we can identify important characteristics that differ between the RLP subclasses. We found that the two classes differ in abundance on both transcriptome and proteome level, physical clustering in the genome and putative interaction partners. However, the classes do not differ in the genetic diversity of their individual members in accessible pan-genome data.

CONCLUSIONS: Our work has several implications for work related to functional studies on RLPs as well as for the understanding of RLP gene family evolution. Using our annotations, we can make suggestions on which RLPs can be identified as potential immune receptors using genetics tools and thereby complement disease studies. The lack of differences in nucleotide diversity between the two RLP subclasses further suggests that non-synonymous diversity of gene sequences alone cannot distinguish defence from developmental genes. By contrast, differences in transcript and protein abundance or clustering at genomic loci might also allow for functional annotations and characterisation in other plant species.

RevDate: 2021-08-27

Wu JJ, Chou HP, Huang JW, et al (2021)

Genomic and biochemical characterization of antifungal compounds produced by Bacillus subtilis PMB102 against Alternaria brassicicola.

Microbiological research, 251:126815.

Bacillus subtilis is ubiquitous and capable of producing various metabolites, which make the bacterium a good candidate as a biocontrol agent for managing plant diseases. In this study, a phyllosphere bacterium B. subtilis PMB102 isolated from tomato leaf was found to inhibit the growth of Alternaria brassicicola ABA-31 on PDA and suppress Alternaria leaf spot on Chinese cabbage (Brassica rapa). The genome of PMB102 (Accession no. CP047645) was completely sequenced by Nanopore and Illumina technology to generate a circular chromosome of 4,103,088 bp encoding several gene clusters for synthesizing bioactive compounds. PMB102 and the other B. subtilis strains from different sources were compared in pangenome analysis to identify a suite of conserved genes involved in biocontrol and habitat adaptation. Two predicted gene products, surfactin and fengycin, were extracted from PMB102 culture filtrates and verified by LC-MS/MS. The antifungal activity of fengycin was tested on A. brassicicola ABA-31 in bioautography to inhibit hyphae growth, and in co-culturing assays to elicit the formation of swollen hyphae. Our data revealed that B. subtilis PMB102 suppresses Alternaria leaf spot by the production of antifungal metabolites, and fengycin plays an important role to inhibit the vegetative growth of A. brassicicola ABA-31.

RevDate: 2021-08-06
CmpDate: 2021-08-06

Branford I, Johnson S, Chapwanya A, et al (2021)

Comprehensive Molecular Dissection of Dermatophilus congolensis Genome and First Observation of tet(Z) Tetracycline Resistance.

International journal of molecular sciences, 22(13):.

Dermatophilus congolensis is a bacterial pathogen mostly of ruminant livestock in the tropics/subtropics and certain temperate climate areas. It causes dermatophilosis, a skin disease that threatens food security by lowering animal productivity and compromising animal health and welfare. Since it is a prevalent infection in ruminants, dermatophilosis warrants more research. There is limited understanding of its pathogenicity, and as such, there is no registered vaccine against D. congolensis. To better understand the genomics of D. congolensis, the primary aim of this work was to investigate this bacterium using whole-genome sequencing and bioinformatic analysis. D. congolensis is a high GC member of the Actinobacteria and encodes approximately 2527 genes. It has an open pan-genome, contains many potential virulence factors and secondary metabolites, and encodes at least 23 housekeeping genes associated with antimicrobial susceptibility mechanisms; some isolates have an acquired antimicrobial resistance gene. Our isolates contain a single CRISPR array Cas type IE with classical 8 Cas genes. Although the isolates originate from the same geographical location, there is some genomic diversity among them. In conclusion, we present the first detailed genomic study on D. congolensis, including the first observation of tet(Z), a tetracycline resistance-conferring gene.

RevDate: 2021-09-08

Basharat Z, Jahanzaib M, N Rahman (2021)

Therapeutic target identification via differential genome analysis of antibiotic resistant Shigella sonnei and inhibitor evaluation against a selected drug target.

Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases, 94:105004.

Shigella sonnei has been implicated in bloody diarrhea (accompanied by abdominal pain and fever) and is an emerging pathogen of concern, especially in developing countries. The major means of transmission is the fecal-oral route while sexual transmission has also been reported. In children, the impact might be stunted growth due to life-threatening illness. Resistance has been reported in this species for several types of antibiotics. In this study, we retrieved the antibiotic-resistant labeled whole genome sequences of the species from the PATRIC database and performed a pan-genome analysis to filter out core genes. Antibiotic resistance was studied in the core, accessory and unique genome. Core genes were utilized as seed substance for essentiality analysis and drug candidate assignment. Product of the gene aroG, i.e. chorismate biosynthetic process 3-deoxy-7-phosphoheptulonate synthase enzyme, responsible for aromatic amino acid family biosynthetic process, was taken for further downstream processing. Natural product libraries of flavonoids (n = 178), ZINC database derived inhibitor compounds of the 3-deoxy-7-phosphoheptulonate synthase enzyme (n = 112), and streptomycin compounds (n = 737) were docked to find out potent inhibitors, followed by dynamics simulation of 50 ns each for top compounds. Physicochemical and ADMET profiling of the top compounds was done to analyze their safety for consumption. We propose that the top compounds: Phytoene from Streptomycin library and ZINC000036444158 (synonym: 1,16-bis[(dihydroxyphosphinyl)oxy]hexadecane) from 3-deoxy-7-phosphoheptulonate synthase inhibitor library of ZINC database (and used as a control in this study) should be tested in vitro against Shigella sonnei, to fully determine their efficacy. This could add to the drying pipeline of potent drug molecules against emerging pathogens.

RevDate: 2021-07-18

Bornowski N, Michel KJ, Hamilton JP, et al (2021)

Genomic variation within the maize stiff-stalk heterotic germplasm pool.

The plant genome [Epub ahead of print].

The stiff-stalk heterotic group in Maize (Zea mays L.) is an important source of inbreds used in U.S. commercial hybrid production. Founder inbreds B14, B37, B73, and, to a lesser extent, B84, are found in the pedigrees of a majority of commercial seed parent inbred lines. We created high-quality genome assemblies of B84 and four expired Plant Variety Protection (ex-PVP) lines LH145 representing B14, NKH8431 of mixed descent, PHB47 representing B37, and PHJ40, which is a Pioneer Hi-Bred International (PHI) early stiff-stalk type. Sequence was generated using long-read sequencing achieving highly contiguous assemblies of 2.13-2.18 Gbp with N50 scaffold lengths >200 Mbp. Inbred-specific gene annotations were generated using a core five-tissue gene expression atlas, whereas transposable element (TE) annotation was conducted using de novo and homology-directed methodologies. Compared with the reference inbred B73, synteny analyses revealed extensive collinearity across the five stiff-stalk genomes, although unique components of the maize pangenome were detected. Comparison of this set of stiff-stalk inbreds with the original Iowa Stiff Stalk Synthetic breeding population revealed that these inbreds represent only a proportion of variation in the original stiff-stalk pool and there are highly conserved haplotypes in released public and ex-Plant Variety Protection inbreds. Despite the reduction in variation from the original stiff-stalk population, substantial genetic and genomic variation was identified supporting the potential for continued breeding success in this pool. The assemblies described here represent stiff-stalk inbreds that have historical and commercial relevance and provide further insight into the emerging maize pangenome.
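
The N50 scaffold length quoted in this abstract is a standard contiguity statistic: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly size. A minimal sketch of that definition follows; the function name `n50` and the toy lengths are hypothetical and unrelated to the maize assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running total to half of
        # the assembly size defines the N50.
        if running >= half_total:
            return length


# Toy scaffold lengths (invented for illustration), total 300.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)
```

In the toy case the two longest scaffolds (80 and 70) already cover half of the total of 300, so the N50 is the shorter of the two.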

RevDate: 2021-07-16

Verma DK, Chaudhary C, Singh L, et al (2021)

Corrigendum: Isolation and Taxonomic Characterization of Novel Haloarchaeal Isolates From Indian Solar Saltern: A Brief Review on Distribution of Bacteriorhodopsins and V-Type ATPases in Haloarchaea.

Frontiers in microbiology, 12:713942.

[This corrects the article DOI: 10.3389/fmicb.2020.554927.].

RevDate: 2021-09-16
CmpDate: 2021-09-16

Liao J, Guo X, Weller DL, et al (2021)

Nationwide genomic atlas of soil-dwelling Listeria reveals effects of selection and population ecology on pangenome evolution.

Nature microbiology, 6(8):1021-1030.

Natural bacterial populations can display enormous genomic diversity, primarily in the form of gene content variation caused by the frequent exchange of DNA with the local environment. However, the ecological drivers of genomic variability and the role of selection remain controversial. Here, we address this gap by developing a nationwide atlas of 1,854 Listeria isolates, collected systematically from soils across the contiguous United States. We found that Listeria was present across a wide range of environmental parameters, being mainly controlled by soil moisture, molybdenum and salinity concentrations. Whole-genome data from 594 representative strains allowed us to decompose Listeria diversity into 12 phylogroups, each with large differences in habitat breadth and endemism. 'Cosmopolitan' phylogroups, prevalent across many different habitats, had more open pangenomes and displayed weaker linkage disequilibrium, reflecting higher rates of gene gain and loss, and allele exchange than phylogroups with narrow habitat ranges. Cosmopolitan phylogroups also had a large fraction of genes affected by positive selection. The effect of positive selection was more pronounced in the phylogroup-specific core genome, suggesting that lineage-specific core genes are important drivers of adaptation. These results indicate that genome flexibility and recombination are the consequence of selection to survive in variable environments.

RevDate: 2021-07-14

Norri T, Cazaux B, Dönges S, et al (2021)

Founder Reconstruction Enables Scalable and Seamless Pangenomic Analysis.

Bioinformatics (Oxford, England) pii:6321452 [Epub ahead of print].

MOTIVATION: Variant calling workflows that utilize a single reference sequence are the de facto standard elementary genomic analysis routine for resequencing projects. Various ways to enhance the reference with pangenomic information have been proposed, but scalability combined with seamless integration to existing workflows remains a challenge.

RESULTS: We present PanVC with founder sequences, a scalable and accurate variant calling workflow based on a multiple alignment of reference sequences. Scalability is achieved by removing duplicate parts up to a limit into a founder multiple alignment, that is then indexed using a hybrid scheme that exploits general purpose read aligners. Our implemented workflow uses GATK or BCFtools for variant calling, but the various steps of our workflow (e.g. vcf2multialign tool, founder reconstruction) can be of independent interest as a basis for creating novel pangenome analysis workflows beyond variant calling.

AVAILABILITY: Our open access tools and instructions on how to reproduce our experiments are available at the following address: https://github.com/algbio/panvc-founders.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

RevDate: 2021-07-22
CmpDate: 2021-07-21

Lu TY, Human Genome Structural Variation Consortium, MJP Chaisson (2021)

Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs.

Nature communications, 12(1):4250.

Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein coding sequences and associations with clinical disorders. It has been difficult to incorporate VNTR analysis in disease studies that use short-read sequencing because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. In this work, we solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG), a data structure that encodes both the population diversity and repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We develop software to build a RPGG, and use the RPGG to estimate VNTR composition with short reads. We use this to discover VNTRs with length stratified by continental population, and expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.

RevDate: 2021-08-10
CmpDate: 2021-07-15

Jain C, Tavakoli N, S Aluru (2021)

A variant selection framework for genome graphs.

Bioinformatics (Oxford, England), 37(Suppl_1):i460-i467.

MOTIVATION: Variation graph representations are projected to either replace or supplement conventional single genome references due to their ability to capture population genetic diversity and reduce reference bias. Vast catalogues of genetic variants for many species now exist, and it is natural to ask which among these are crucial to circumvent reference bias during read mapping.

RESULTS: In this work, we propose a novel mathematical framework for variant selection, by casting it in terms of minimizing variation graph size subject to preserving paths of length α with at most δ differences. This framework leads to a rich set of problems based on the types of variants [e.g. single nucleotide polymorphisms (SNPs), indels or structural variants (SVs)], and whether the goal is to minimize the number of positions at which variants are listed or to minimize the total number of variants listed. We classify the computational complexity of these problems and provide efficient algorithms along with their software implementation when feasible. We empirically evaluate the magnitude of graph reduction achieved in human chromosome variation graphs using multiple α and δ parameter values corresponding to short and long-read resequencing characteristics. When our algorithm is run with parameter settings amenable to long-read mapping (α = 10 kbp, δ = 1000), 99.99% SNPs and 73% SVs can be safely excluded from human chromosome 1 variation graph. The graph size reduction can benefit downstream pan-genome analysis.

AVAILABILITY: https://github.com/AT-CG/VF.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

RevDate: 2021-08-02

Pedrós-Alió C (2021)

Time travel in microorganisms.

Systematic and applied microbiology, 44(4):126227.

RevDate: 2021-08-08

Nie S, Wang B, Ding H, et al (2021)

Genome assembly of the Chinese maize elite inbred line RP125 and its EMS mutant collection provide new resources for maize genetics research and crop improvement.

The Plant journal : for cell and molecular biology [Epub ahead of print].

Maize is an important crop worldwide, as well as a valuable model with vast genetic diversity. Accurate genome and annotation information for a wide range of inbred lines would provide valuable resources for crop improvement and pan-genome characterization. In this study, we generated a high-quality de novo genome assembly (contig N50 of 15.43 Mb) of the Chinese elite inbred line RP125 using Nanopore long-read sequencing and Hi-C scaffolding, which yield highly contiguous, chromosome-length scaffolds. Global comparison of the RP125 genome with those of B73, W22, and Mo17 revealed a large number of structural variations. To create new germplasm for maize research and crop improvement, we carried out an EMS mutagenesis screen on RP125. In total, we obtained 5818 independent M2 families, with 946 mutants showing heritable phenotypes. Taking advantage of the high-quality RP125 genome, we successfully cloned 10 mutants from the EMS library, including the novel kernel mutant qk1 (quekou: "missing a small part" in Chinese), which exhibited partial loss of endosperm and a starch accumulation defect. QK1 encodes a predicted metal tolerance protein, which is specifically required for Fe transport. Increased accumulation of Fe and reactive oxygen species as well as ferroptosis-like cell death were detected in qk1 endosperm. Our study provides the community with a high-quality genome sequence and a large collection of mutant germplasm.

RevDate: 2021-07-28

Noroy C, DF Meyer (2021)

The super repertoire of type IV effectors in the pangenome of Ehrlichia spp. provides insights into host-specificity and pathogenesis.

PLoS computational biology, 17(7):e1008788.

The identification of bacterial effectors is essential to understand how obligatory intracellular bacteria such as Ehrlichia spp. manipulate the host cell for survival and replication. Infection of mammals, including humans, by the intracellular pathogenic bacteria Ehrlichia spp. depends largely on the injection of virulence proteins that hijack host cell processes. Several hypothetical virulence proteins have been identified in Ehrlichia spp., but only one so far has been experimentally shown to translocate into host cells via the type IV secretion system. However, the current challenge is to identify most of the type IV effectors (T4Es) to fully understand their role in Ehrlichia spp. virulence and host adaptation. Here, we predict the T4E repertoires of four sequenced Ehrlichia spp. and four other Anaplasmataceae as comparative models (pathogenic Anaplasma spp. and Wolbachia endosymbiont) using previously developed S4TE 2.0 software. This analysis identified 579 predicted T4Es (228 pT4Es for Ehrlichia spp. only). The effector repertoires of Ehrlichia spp. overlapped, thereby defining a conserved core effectome of 92 predicted effectors shared by all strains. In addition, 69 species-specific T4Es were predicted with non-canonical GC% mostly in gene sparse regions of the genomes and we observed a bias in pT4Es according to host-specificity. We also identified new protein domain combinations, suggesting novel effector functions. This work presenting the predicted effector collection of Ehrlichia spp. can serve as a guide for future functional characterisation of effectors and design of alternative control strategies against these bacteria.

RevDate: 2021-07-13

Cao H, Xu H, Ning C, et al (2021)

Multi-Omics Approach Reveals the Potential Core Vaccine Targets for the Emerging Foodborne Pathogen Campylobacter jejuni.

Frontiers in microbiology, 12:665858.

Campylobacter jejuni is a leading cause of bacterial gastroenteritis in humans around the world. The emergence of bacterial resistance is becoming more serious; therefore, development of new vaccines is considered to be an alternative strategy against drug-resistant pathogens. In this study, we investigated the pangenome of 173 C. jejuni strains and analyzed the phylogenesis and the virulence factor genes. In order to acquire a high-quality pangenome, genomic relatedness was first assessed with average nucleotide identity (ANI) analyses, and an open pangenome of 8,041 gene families was obtained with the correct taxonomy genomes. Subsequently, the virulence property of the core genome was analyzed and 145 core virulence factor (VF) genes were obtained. Upon functional genomics and immunological analyses, five core VF proteins with high antigenicity were selected as potential core vaccine targets for humans. Furthermore, functional annotations indicated that these proteins are involved in important molecular functions and biological processes, such as adhesion, regulation, and secretion. In addition, transcriptome analysis in human cells and pig intestinal loop proved that these vaccine target genes are important in the virulence of C. jejuni in different hosts. Comprehensive pangenome and relevant animal experiments will facilitate discovering the potential core vaccine targets with improved efficiency in reverse vaccinology. Likewise, this study provided some insights into the genetic polymorphism and phylogeny of C. jejuni and discovered potential vaccine candidates for humans. Prospective development of new vaccines using the targets will be an alternative to the use of antibiotics and prevent the development of multidrug-resistant C. jejuni in humans and even other animals.

RevDate: 2021-07-13

Banerjee R, Chaudhari NM, Lahiri A, et al (2021)

Interplay of Various Evolutionary Modes in Genome Diversification and Adaptive Evolution of the Family Sulfolobaceae.

Frontiers in microbiology, 12:639995.

The Sulfolobaceae family, comprising diverse thermoacidophilic and aerobic sulfur-metabolizing Archaea from various geographical locations, offers an ideal opportunity to infer the evolutionary dynamics across the members of this family. Comparative pan-genomics coupled with evolutionary analyses has revealed asymmetric genome evolution within the Sulfolobaceae family. The trend of genome streamlining followed by periods of differential gene gains resulted in an overall genome expansion in some species of this family, whereas there was reduction in others. Among the core genes, both Sulfolobus islandicus and Saccharolobus solfataricus showed a considerable fraction of positively selected genes and also higher frequencies of gene acquisition. In contrast, Sulfolobus acidocaldarius genomes experienced a substantial amount of gene loss and strong purifying selection as manifested by relatively lower genome size and higher genome conservation. Central carbohydrate metabolism and sulfur metabolism coevolved with the genome diversification pattern of this archaeal family. The autotrophic CO2 fixation with three significant positively selected enzymes from S. islandicus and S. solfataricus was found to be more imperative than heterotrophic CO2 fixation for Sulfolobaceae. Overall, our analysis provides an insight into the interplay of various genomic adaptation strategies including gene gain-loss, mutation, and selection influencing genome diversification of Sulfolobaceae at various taxonomic levels and geographical locations.

RevDate: 2021-09-24
CmpDate: 2021-09-24

Begrem S, Jérôme M, Leroi F, et al (2021)

Genomic diversity of Serratia proteamaculans and Serratia liquefaciens predominant in seafood products and spoilage potential analyses.

International journal of food microbiology, 354:109326.

Serratia spp. cause food losses and waste due to spoilage; it is noteworthy that they represent a dominant population in seafood. The main spoilage associated species comprise S. liquefaciens, S. grimesii, S. proteamaculans and S. quinivorans, also known as S. liquefaciens-like strains. These species are difficult to discriminate since classical 16S rRNA gene-based sequences do not possess sufficient resolution. In this study, a phylogeny based on the short-length luxS gene was able to speciate 47 Serratia isolates from seafood, with S. proteamaculans being the main species from fresh salmon and tuna, cold-smoked salmon, and cooked shrimp while S. liquefaciens was only found in cold-smoked salmon. The genome of the first S. proteamaculans strain isolated from the seafood matrix (CD3406 strain) was sequenced. Pangenome analyses of S. proteamaculans and S. liquefaciens indicated high adaptation potential. Biosynthetic pathways involved in antimicrobial compounds production and in the main seafood spoilage compounds were also identified. The genetic equipment highlighted in this study provides further insights into the predominance of Serratia in seafood products and their capacity to spoil.

RevDate: 2021-08-01

Wang S, Narsing Rao MP, Wei D, et al (2021)

Complete genome sequencing and comparative genome analysis of the extremely halophilic archaea, Haloterrigena daqingensis.

Biotechnology and applied biochemistry [Epub ahead of print].

In the present study, we report the complete genome sequencing of Haloterrigena daqingensis species. The genome of H. daqingensis JX313T consisted of a circular chromosome with three plasmids. The genome size and G+C content were estimated to be 3835796 bp and 61.7%, respectively. A total of 4158 genes were predicted with six rRNAs and 45 tRNAs. Metabolic pathway analysis suggests that H. daqingensis JX313T codes for all the necessary genes responsible to sustain its life in a saline environment. The pan-genome analysis suggests that the number of singleton genes varied between H. daqingensis and other Haloterrigena species. The study not only helps us understand H. daqingensis strategy for dealing with high stress, but it also provides an overview of its genomic makeup.

RevDate: 2021-07-12

Sanoussi CN, Coscolla M, Ofori-Anyinam B, et al (2021)

Mycobacterium tuberculosis complex lineage 5 exhibits high levels of within-lineage genomic diversity and differing gene content compared to the type strain H37Rv.

Microbial genomics, 7(7):.

Pathogens of the Mycobacterium tuberculosis complex (MTBC) are considered to be monomorphic, with little gene content variation between strains. Nevertheless, several genotypic and phenotypic factors separate strains of the different MTBC lineages (L), especially L5 and L6 (traditionally termed Mycobacterium africanum) strains, from each other. However, this genome variability and gene content, especially of L5 strains, has not been fully explored and may be important for pathobiology and current approaches for genomic analysis of MTBC strains, including transmission studies. By comparing the genomes of 355 L5 clinical strains (including 3 complete genomes and 352 Illumina whole-genome sequenced isolates) to each other and to H37Rv, we identified multiple genes that were differentially present or absent between H37Rv and L5 strains. Additionally, considerable gene content variability was found across L5 strains, including a split in the L5.3 sub-lineage into L5.3.1 and L5.3.2. These gene content differences had a small knock-on effect on transmission cluster estimation, with clustering rates influenced by the selected reference genome, and with potential overestimation of recent transmission when using H37Rv as the reference genome. We conclude that full capture of the gene diversity, especially high-resolution outbreak analysis, requires a variation of the single H37Rv-centric reference genome mapping approach currently used in most whole-genome sequencing data analysis pipelines. Moreover, the high within-lineage gene content variability suggests that the pan-genome of M. tuberculosis is at least several kilobases larger than previously thought, implying that a concatenated or reference-free genome assembly (de novo) approach may be needed for particular questions.

RevDate: 2021-07-12
CmpDate: 2021-07-12

Sinha D, Sun X, Khare M, et al (2021)

Pangenome analysis and virulence profiling of Streptococcus intermedius.

BMC genomics, 22(1):522.

BACKGROUND: Streptococcus intermedius, a member of the S. anginosus group, is a commensal bacterium present in the normal microbiota of human mucosal surfaces of the oral, gastrointestinal, and urogenital tracts. However, it has been associated with various infections such as liver and brain abscesses, bacteremia, osteo-articular infections, and endocarditis. Since 2005, high throughput genome sequencing methods enabled understanding the genetic landscape and diversity of bacteria as well as their pathogenic role. Here, in order to determine whether specific virulence genes could be related to specific clinical manifestations, we compared the genomes from 27 S. intermedius strains isolated from patients with various types of infections, including 13 that were sequenced in our institute and 14 available in GenBank.

RESULTS: We estimated the theoretical pangenome size to be 4,020 genes, including 1,355 core genes, 1,054 strain-specific genes and 1,611 accessory genes shared by 2 or more strains. The pangenome analysis demonstrated that the genomic diversity of S. intermedius represents an "open" pangenome model. We identified a core virulome of 70 genes and 78 unique virulence markers. The phylogenetic clusters based upon core-genome sequences and SNPs were independent from disease types and sample sources. However, using principal component analysis based on presence/absence of virulence genes, we identified the sda histidine kinase, adhesion protein LAP and capsular polysaccharide biosynthesis protein cps4E as being associated with brain abscess or broncho-pulmonary infection. In contrast, liver and abdominal abscess were associated with presence of the fibronectin binding protein fbp54 and capsular polysaccharide biosynthesis protein cap8D and cpsB.

CONCLUSIONS: Based on the virulence gene content of 27 S. intermedius strains causing various diseases, we identified putative disease-specific genetic profiles discriminating those causing brain abscess or broncho-pulmonary infection from those causing liver and abdominal abscess. These results provide an insight into S. intermedius pathogenesis and highlight putative targets in a diagnostic perspective.
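
The RESULTS above split the pangenome into core genes (present in every strain), strain-specific genes (present in exactly one strain) and accessory genes (shared by two or more strains but not all); the three reported counts, 1,355 + 1,054 + 1,611, sum to the 4,020-gene pangenome. The following is a minimal Python sketch of that three-way partition from per-strain gene sets. It is an illustration only: the function name, strain labels and gene labels are hypothetical, and real pipelines first cluster genes into orthologous families, which is not shown here.

```python
from collections import Counter


def partition_pangenome(genomes):
    """Split genes into (core, accessory, strain_specific) sets.

    genomes: dict mapping a strain name to the set of gene families it carries.
    """
    n_strains = len(genomes)
    if n_strains < 2:
        raise ValueError("need at least two strains to partition a pangenome")
    # Count in how many strains each gene family occurs.
    counts = Counter(gene for genes in genomes.values() for gene in set(genes))
    core = {g for g, c in counts.items() if c == n_strains}
    strain_specific = {g for g, c in counts.items() if c == 1}
    accessory = set(counts) - core - strain_specific
    return core, accessory, strain_specific


# Toy example with hypothetical strains and gene families.
toy = {
    "strain1": {"geneA", "geneB", "geneC"},
    "strain2": {"geneA", "geneB", "geneD"},
    "strain3": {"geneA", "geneE"},
}
core, accessory, strain_specific = partition_pangenome(toy)
print(sorted(core), sorted(accessory), sorted(strain_specific))
```

By construction the three sets are disjoint and their union is the full pangenome, which matches the way the counts in the abstract add up.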

RevDate: 2021-08-30

Liu C, Peng P, Li W, et al (2021)

Deciphering variation of 239 elite japonica rice genomes for whole genome sequences-enabled breeding.

Genomics, 113(5):3083-3091.

Revealing genomic variation of representative and diverse germplasm is the cornerstone of deploying genomics information into genetic improvement programs of species of agricultural importance. Here we report the re-sequencing of 239 japonica rice elites representing the genetic diversity of japonica germplasm in China, Japan and Korea. A total of 4.8 million SNPs and PAV of 35,634 genes were identified. The elites from Japan and Korea are closely related and relatively less diverse than those from China. A japonica rice pan-genome was constructed, and 35 Mb non-redundant novel sequences were identified, from which 1131 novel genes were predicted. Strong selection signals of genomic regions were detected on most of the chromosomes. The heading date genes Hd1 and Hd3a have been artificially selected during the breeding process. The results from this study lay the foundation for future whole genome sequences-enabled breeding in rice and provide a paradigm for other species.

RevDate: 2021-07-06

Rijzaani H, Bayer PE, Rouard M, et al (2021)

The pangenome of banana highlights differences between genera and genomes.

The plant genome [Epub ahead of print].

Banana (Musaceae family) has a complex genetic history and includes a genus Musa with a variety of cultivated clones with edible fruits, Ensete species that are grown for their edible corm, and monospecific Musella whose generic status has been questioned. The most commonly exported banana cultivars belong to Cavendish, a subgroup of Musa triploid cultivars, which is under threat by fungal pathogens, though there are also related species M. balbisiana Colla (B genome), M. textilis Née (T genome), and M. schizocarpa N. W. Simmonds (S genome), along with hybrids of these genomes, which potentially host genes of agronomic interest. Here we present the first cross-genus pangenome of banana, which contains representatives of the Musa and Ensete genera. Clusters based on gene presence-absence variation (PAV) clearly separate Musa and Ensete, while Musa is split further based on species. These results present the first pangenome study across genus boundaries and identify genes that differentiate between Musaceae species, information that may support breeding programs in these crops.

RevDate: 2021-07-26
CmpDate: 2021-07-26

Lovell JT, Bentley NB, Bhattarai G, et al (2021)

Four chromosome scale genomes and a pan-genome annotation to accelerate pecan tree breeding.

Nature communications, 12(1):4125.

Genome-enabled biotechnologies have the potential to accelerate breeding efforts in long-lived perennial crop species. Despite the transformative potential of molecular tools in pecan and other outcrossing tree species, highly heterozygous genomes, significant presence-absence gene content variation, and histories of interspecific hybridization have constrained breeding efforts. To overcome these challenges, here, we present diploid genome assemblies and annotations of four outbred pecan genotypes, including a PacBio HiFi chromosome-scale assembly of both haplotypes of the 'Pawnee' cultivar. Comparative analysis and pan-genome integration reveal substantial and likely adaptive interspecific genomic introgressions, including an over-retained haplotype introgressed from bitternut hickory into pecan breeding pedigrees. Further, by leveraging our pan-genome presence-absence and functional annotation database among genomes and within the two outbred haplotypes of the 'Lakota' genome, we identify candidate genes for pest and pathogen resistance. Combined, these analyses and resources highlight significant progress towards functional and quantitative genomics in highly diverse and outbred crops.

RevDate: 2021-07-06

Hendrickx APA, Debast S, Pérez-Vázquez M, et al (2021)

A genetic cluster of MDR Enterobacter cloacae complex ST78 harbouring a plasmid containing bla VIM-1 and mcr-9 in the Netherlands.

JAC-antimicrobial resistance, 3(2):dlab046.

Background: Carbapenemases produced by Enterobacterales are often encoded by genes on transferable plasmids and represent a major healthcare problem, especially if the plasmids contain additional antibiotic resistance genes. As part of Dutch national surveillance, 50 medical microbiological laboratories submit their Enterobacterales isolates suspected of carbapenemase production to the National Institute for Public Health and the Environment for characterization. All isolates for which carbapenemase production is confirmed are subjected to next-generation sequencing.

Objectives: To study the molecular characteristics of a genetic cluster of Enterobacter cloacae complex isolates collected in Dutch national surveillance in the period 2015-20 in the Netherlands.

Methods: Short- and long-read genome sequencing was used in combination with MLST and pan-genome MLST (pgMLST) analyses. Automated antimicrobial susceptibility testing (AST), the Etest for meropenem and the broth microdilution test for colistin were performed. The carbapenem inactivation method was used to assess carbapenemase production.

Results: pgMLST revealed that nine E. cloacae complex isolates from three different hospitals in the Netherlands differed by <20 alleles and grouped in a genetic cluster termed EclCluster-013. Seven isolates were submitted by one hospital in 2016-20. EclCluster-013 isolates produced carbapenemase and were from ST78, a globally disseminated lineage. EclCluster-013 isolates harboured a 316 078 bp IncH12 plasmid carrying the bla VIM-1 carbapenemase and the novel mcr-9 colistin resistance gene along with genes encoding resistance to different antibiotic classes. AST showed that EclCluster-013 isolates were MDR, but susceptible to meropenem (<2 mg/L) and colistin (<2 mg/L).

Conclusions: The EclCluster-013 reported here represents an MDR E. cloacae complex ST78 strain containing an IncH12 plasmid carrying both the bla VIM-1 carbapenemase and the mcr-9 colistin resistance gene.
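
The Results above group isolates into a genetic cluster when their pgMLST profiles differ by fewer than 20 alleles. The abstract does not say how the allele differences were counted or how missing loci were treated, so the following Python sketch rests on assumptions: profiles are dictionaries of locus name to allele number, loci missing in either isolate are ignored, and a group counts as a cluster only if every pair is below the threshold. All names and toy values are hypothetical.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Count shared loci whose allele numbers differ (missing calls are skipped)."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared_loci
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


def all_pairs_within(profiles, threshold):
    """Return True if every pair of isolates differs by fewer than `threshold` alleles."""
    return all(
        allele_distance(profiles[a], profiles[b]) < threshold
        for a, b in combinations(profiles, 2)
    )


# Toy profiles over three hypothetical loci; None marks a missing allele call.
toy_profiles = {
    "iso1": {"locA": 1, "locB": 2, "locC": 3},
    "iso2": {"locA": 1, "locB": 5, "locC": 3},
    "iso3": {"locA": 1, "locB": 2, "locC": None},
}
print(allele_distance(toy_profiles["iso1"], toy_profiles["iso2"]))
print(all_pairs_within(toy_profiles, threshold=20))
```

Requiring every pair to fall below the threshold is the strictest reading; a single-linkage rule, where each isolate only needs one close neighbour in the cluster, would be an equally plausible alternative.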

RevDate: 2021-07-16
CmpDate: 2021-07-16

Cheng C, Zhou W, Dong X, et al (2021)

Genomic Analysis of Delftia tsuruhatensis Strain TR1180 Isolated From A Patient From China With In4-Like Integron-Associated Antimicrobial Resistance.

Frontiers in cellular and infection microbiology, 11:663933.

Delftia tsuruhatensis has become an emerging pathogen in humans. There is scant information on the genomic characteristics of this microorganism. In this study, we determined the complete genome sequence of a clinical D. tsuruhatensis strain, TR1180, isolated from a sputum specimen of a female patient in China in 2019. Phylogenetic and average nucleotide identity analysis demonstrated that TR1180 is a member of D. tsuruhatensis. TR1180 exhibited resistance to β-lactam, aminoglycoside, tetracycline and sulphonamide antibiotics, but was susceptible to phenicols, fluoroquinolones and macrolides. Its genome is a single, circular chromosome measuring 6,711,018 bp in size. Whole-genome analysis identified 17 antibiotic resistance-related genes, which match the antimicrobial susceptibility profile of this strain, as well as 24 potential virulence factors and a number of metal resistance genes. Our data showed that Delftia possessed an open pan-genome and the genes in the core genome contributed to the pathogenicity and resistance of Delftia strains. Comparative genomics analysis of TR1180 with other publicly available genomes of Delftia showed diverse genomic features among these strains. D. tsuruhatensis TR1180 harbored a unique 38-kb genomic island flanked by a pair of 29-bp direct repeats with the insertion of a novel In4-like integron containing most of the specific antibiotic resistance genes within the genome. This study reports the findings of a fully sequenced genome from clinical D. tsuruhatensis, which provide researchers and clinicians with valuable insights into this uncommon species.

RevDate: 2021-07-06

Koeksoy E, Bezuidt OM, Bayer T, et al (2021)

Zetaproteobacteria Pan-Genome Reveals Candidate Gene Cluster for Twisted Stalk Biosynthesis and Export.

Frontiers in microbiology, 12:679409.

Twisted stalks are morphologically unique bacterial extracellular organo-metallic structures containing Fe(III) oxyhydroxides that are produced by microaerophilic Fe(II)-oxidizers belonging to the Betaproteobacteria and Zetaproteobacteria. Understanding the underlying genetic and physiological mechanisms of stalk formation is of great interest based on their potential as novel biogenic nanomaterials and their relevance as putative biomarkers for microbial Fe(II) oxidation on ancient Earth. Despite the recognition of these special biominerals for over 150 years, the genetic foundation for the stalk phenotype has remained unresolved. Here we present a candidate gene cluster for the biosynthesis and secretion of the stalk organic matrix that we identified with a trait-based analysis of a pan-genome comprising 16 Zetaproteobacteria isolate genomes. The "stalk formation in Zetaproteobacteria" (sfz) cluster comprises six genes (sfz1-sfz6), of which sfz1 and sfz2 were predicted with functions in exopolysaccharide synthesis, regulation, and export, sfz4 and sfz6 with functions in cell wall synthesis manipulation and carbohydrate hydrolysis, and sfz3 and sfz5 with unknown functions. The stalk-forming Betaproteobacteria Ferriphaselus R-1 and OYT-1, as well as dread-forming Zetaproteobacteria Mariprofundus aestuarium CP-5 and Mariprofundus ferrinatatus CP-8 contain distant sfz gene homologs, whereas stalk-less Zetaproteobacteria and Betaproteobacteria lack the entire gene cluster. Our pan-genome analysis further revealed a significant enrichment of clusters of orthologous groups (COGs) across all Zetaproteobacteria isolate genomes that are associated with the regulation of a switch between sessile and motile growth controlled by the intracellular signaling molecule c-di-GMP. Potential interactions between stalk-former unique transcription factor genes, sfz genes, and c-di-GMP point toward a c-di-GMP regulated surface attachment function of stalks during sessile growth.

RevDate: 2021-07-06

Farace PD, Irazoqui JM, Morsella CG, et al (2021)

Phylogenomic analysis for Campylobacter fetus ocurring in Argentina.

Veterinary world, 14(5):1165-1179.

Background and Aim: Campylobacter fetus is one of the most important pathogens that severely affects the livestock industry worldwide. C. fetus-mediated bovine genital campylobacteriosis infection in cattle has been associated with significant economic losses in livestock production in the Pampas region, the most productive area of Argentina. The present study aimed to establish the genomic relationships between C. fetus strains, isolated from the Pampas region, at local and global levels. The study also explored the utility of multi-locus sequence typing (MLST) as a typing technique for C. fetus.

Materials and Methods: For pangenome and phylogenetic analysis, whole genome sequences for 34 C. fetus strains, isolated from cattle in Argentina were downloaded from GenBank. A local maximum likelihood (ML) tree was constructed and linked to a Microreact project. In silico analysis based on MLST was used to obtain information regarding sequence type (ST) for each strain. For global phylogenetic analysis, a core genome ML-tree was constructed using genomic dataset for 265 C. fetus strains, isolated from various sources obtained from 20 countries.

Results: The local core genome phylogenetic tree analysis described the presence of two major clusters (A and B) and one minor cluster (C). The occurrence of 82% of the strains in these three clusters suggested a clonal population structure for C. fetus. The MLST analysis for the local strains revealed that 31 strains were ST4 type and one strain was ST5 type. In addition, a new variant was identified that was assigned a novel ST, ST70. In the present case, ST4 was homogenously distributed across all the regions and clusters. The global analysis showed that most of the local strains clustered in the phylogenetic groups that comprised exclusively of the strains isolated from Argentina. Interestingly, three strains showed a close genetic relationship with bovine strains obtained from Uruguay and Brazil. The ST5 strain grouped in a distant cluster, with strains obtained from different sources from various geographic locations worldwide. Two local strains clustered in a phylogenetic group comprising intercontinental Campylobacter fetus venerealis strains.

Conclusion: The results of the study suggested active movement of animals, probably due to economic trade between different regions of the country as well as with neighboring countries. MLST results were partially concordant with phylogenetic analysis. Thus, this method did not qualify as a reliable subtyping method to assess C. fetus diversity in Argentina. The present study provided a basic platform to conduct future research on C. fetus, both at local and international levels.

RevDate: 2021-07-24

Carpi FM, Coman MM, Silvi S, et al (2021)

Comprehensive pan-genome analysis of Lactiplantibacillus plantarum complete genomes.

Journal of applied microbiology [Epub ahead of print].

AIMS: The aim of this work was to refine the taxonomy and the functional characterization of publicly available Lactiplantibacillus plantarum complete genomes through a pan-genome analysis. Particular attention was paid to depicting the probiotic potential of each strain.

METHODS AND RESULTS: Complete genome sequences of 127 L. plantarum strains, without detected anomalies, were downloaded from NCBI. Roary analysis of the L. plantarum pan-genome identified 1436 core, 414 soft core, 1858 shell and 13,203 cloud genes, highlighting the 'open' nature of the L. plantarum pan-genome. Identification and characterization of plasmid content, mobile genetic elements, adaptive immune system and probiotic marker genes (PMGs) revealed unique features across all the L. plantarum strains included in the present study. Considering our updated list of PMGs, we determined that approximately 70% of the PMGs belong to the core/soft-core genome.

CONCLUSIONS: The comparative genomic analysis conducted in this study provides new insights into the genomic content and variability of L. plantarum.

This study provides a comprehensive pan-genome analysis of L. plantarum, including the largest number (N = 127) of complete L. plantarum genomes retrieved from publicly available repositories. Our effort aimed to determine a solid reference panel for the future characterization of newly sequenced L. plantarum strains useful as probiotic supplements.
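
For readers unfamiliar with the core / soft core / shell / cloud breakdown reported in the entry above, the following is a minimal sketch of how a gene can be assigned to one of these categories from the number of genomes that carry it. The cut-offs (99%, 95%, 15%) are Roary's documented default category definitions; the function names and the example presence counts are hypothetical and are not taken from the paper.

```python
from collections import Counter


def classify_gene(n_present: int, n_genomes: int) -> str:
    """Assign a pan-genome category from the number of genomes carrying a gene.

    Thresholds follow Roary's default summary categories:
      core      : 99% <= strains <= 100%
      soft core : 95% <= strains <  99%
      shell     : 15% <= strains <  95%
      cloud     :  0% <= strains <  15%
    Integer arithmetic is used so boundary cases are not affected by rounding.
    """
    if not 0 <= n_present <= n_genomes:
        raise ValueError("n_present must lie between 0 and n_genomes")
    if n_present * 100 >= 99 * n_genomes:
        return "core"
    if n_present * 100 >= 95 * n_genomes:
        return "soft core"
    if n_present * 100 >= 15 * n_genomes:
        return "shell"
    return "cloud"


def summarize(presence_counts: list[int], n_genomes: int) -> Counter:
    """Count how many genes fall into each category."""
    return Counter(classify_gene(n, n_genomes) for n in presence_counts)


if __name__ == "__main__":
    # Hypothetical presence counts for six gene families across 127 genomes.
    example_counts = [127, 126, 125, 120, 20, 1]
    print(summarize(example_counts, n_genomes=127))
```

A pan-genome is usually described as 'open' when the cloud and shell categories keep growing as genomes are added, which is consistent with the large cloud count reported above.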

RevDate: 2021-07-02

Ge T, Jiang H, Tan EH, et al (2021)

Pangenomic Analysis of Dickeya dianthicola Strains Related to the Outbreak of Blackleg and Soft Rot of Potato in USA.

Plant disease [Epub ahead of print].

Dickeya dianthicola has caused an outbreak of blackleg and soft rot of potato in the eastern half of the USA since 2015. To investigate genetic diversity of the pathogen, a comparative analysis was conducted on genomes of D. dianthicola strains. Whole genomes of 16 strains from the USA outbreak were assembled and compared to 16 previously sequenced genomes of D. dianthicola isolated from potato or carnation. Among the 32 strains, eight distinct clades were distinguished based on phylogenomic analysis. The outbreak strains were grouped into three clades, with the majority of the strains in clade I. Clade I strains were unique and homogeneous, suggesting a recent incursion of this strain into potato production from alternative hosts or environmental sources. The pangenome of the 32 strains contained 6693 genes, 3377 of which were core genes. By screening primary protein subunits associated with virulence from all USA strains, we found that many virulence-related gene clusters, such as plant cell wall degrading enzyme genes, flagellar and chemotaxis related genes, two-component regulatory genes, and type I/II/III secretion system genes, were highly conserved, but type IV and type VI secretion system genes varied. The virulent clade I strains encoded two clusters of type IV secretion systems, while clade II and III strains encoded only one cluster. Clade I and II strains encoded one more VgrG/PAAR spike protein than clade III. Thus, we predicted that the presence of additional virulence-related genes may have enabled the unique clade I strain to become the predominant source in the USA outbreak.

RevDate: 2021-07-27

Pintado A, Pérez-Martínez I, Aragón IM, et al (2021)

The Rhizobacterium Pseudomonas alcaligenes AVO110 Induces the Expression of Biofilm-Related Genes in Response to Rosellinia necatrix Exudates.

Microorganisms, 9(7):.

The rhizobacterium Pseudomonas alcaligenes AVO110 exhibits antagonism toward the phytopathogenic fungus Rosellinia necatrix. This strain efficiently colonizes R. necatrix hyphae and is able to feed on their exudates. Here, we report the complete genome sequence of P. alcaligenes AVO110. The phylogeny of all available P. alcaligenes genomes separates environmental isolates, including AVO110, from those obtained from infected human blood and oyster tissues, which cluster together with Pseudomonas otitidis. Core and pan-genome analyses showed that P. alcaligenes strains encode highly heterogenic gene pools, with the AVO110 genome encoding the largest and most exclusive variable region (~1.6 Mb, 1795 genes). The AVO110 singletons include a wide repertoire of genes related to biofilm formation, several of which are transcriptionally modulated by R. necatrix exudates. One of these genes (cmpA) encodes a GGDEF/EAL domain protein specific to Pseudomonas spp. strains isolated primarily from the rhizosphere of diverse plants, but also from soil and water samples. We also show that CmpA has a role in biofilm formation and that the integrity of its EAL domain is involved in this function. This study contributes to a better understanding of the niche-specific adaptations and lifestyles of P. alcaligenes, including the mycophagous behavior of strain AVO110.

RevDate: 2021-07-26
CmpDate: 2021-07-26

Alouane T, Rimbert H, Bormann J, et al (2021)

Comparative Genomics of Eight Fusarium graminearum Strains with Contrasting Aggressiveness Reveals an Expanded Open Pangenome and Extended Effector Content Signatures.

International journal of molecular sciences, 22(12):.

Fusarium graminearum, the primary cause of Fusarium head blight (FHB) in small-grain cereals, demonstrates remarkably variable levels of aggressiveness in its host, producing different infection dynamics and contrasting symptom severity. While the secreted proteins, including effectors, are thought to be one of the essential components of aggressiveness, our knowledge of the intra-species genomic diversity of F. graminearum is still limited. In this work, we sequenced eight European F. graminearum strains of contrasting aggressiveness to characterize their respective genome structure and gene content and to delineate their specificities. By combining the available sequences of 12 other F. graminearum strains, we outlined a reference pangenome that expands the repertoire of the known genes in the reference PH-1 genome by 32%, including nearly 21,000 non-redundant sequences and gathering a common base of 9250 conserved core-genes. More than 1000 genes with high non-synonymous mutation rates may be under diverse selection, especially regarding the trichothecene biosynthesis gene cluster. About 900 secreted protein clusters (SPCs) have been described. Mostly localized in the fast sub-genome of F. graminearum, which is thought to evolve rapidly to promote adaptation and rapid responses to the host's infection, these SPCs gather a range of putative proteinaceous effectors systematically found in the core secretome, with the chloroplast and the plant nucleus as the main predicted targets in the host cell. This work describes new knowledge on the intra-species diversity in F. graminearum and emphasizes putative determinants of aggressiveness, providing a wealth of new candidate genes potentially involved in the Fusarium head blight disease.

RevDate: 2021-09-07

Ahmed O, Rossi M, Kovaka S, et al (2021)

Pan-genomic matching statistics for targeted nanopore sequencing.

iScience, 24(6):102696.

Nanopore sequencing is an increasingly powerful tool for genomics. Recently, computational advances have allowed nanopores to sequence in a targeted fashion; as the sequencer emits data, software can analyze the data in real time and signal the sequencer to eject "nontarget" DNA molecules. We present a novel method called SPUMONI, which enables rapid and accurate targeted sequencing using efficient pan-genome indexes. SPUMONI uses a compressed index to rapidly generate exact or approximate matching statistics in a streaming fashion. When used to target a specific strain in a mock community, SPUMONI has similar accuracy to minimap2 when both are run against an index containing many strains per species. However, SPUMONI is 12 times faster than minimap2. SPUMONI's index and peak memory footprint are also 16 and 4 times smaller than those of minimap2, respectively. This could enable accurate targeted sequencing even when the targeted strains have not necessarily been sequenced or assembled previously.
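
As background to the "matching statistics" mentioned in the entry above: for each position i of a query, the matching statistic is the length of the longest substring starting at i that occurs somewhere in the reference collection. The sketch below computes this by brute force over a small, made-up set of reference strings. It only illustrates the definition; it is not SPUMONI's algorithm, which the abstract describes as using a compressed index, and the function name and sequences are hypothetical.

```python
def matching_statistics(query: str, references: list[str]) -> list[int]:
    """Brute-force matching statistics of `query` against a set of references.

    ms[i] is the length of the longest prefix of query[i:] that occurs as a
    substring of at least one reference. This naive version repeatedly uses
    Python's substring test, so it is only suitable for tiny inputs.
    """
    ms = []
    for i in range(len(query)):
        length = 0
        # Extend the match one character at a time while it still occurs.
        while i + length < len(query) and any(
            query[i : i + length + 1] in ref for ref in references
        ):
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    # Hypothetical toy "pan-genome" of two short sequences.
    pan_genome = ["ACGTACGT", "TTGACA"]
    print(matching_statistics("CGTTAC", pan_genome))
```

Long matching statistics along a read suggest that it resembles something in the indexed collection, while consistently short values suggest it does not; how such values are turned into an eject decision is specific to the tool and is not shown here.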

RevDate: 2021-07-02

Li Y, Wang M, Sun ZZ, et al (2021)

Comparative Genomic Insights Into the Taxonomic Classification, Diversity, and Secondary Metabolic Potentials of Kitasatospora, a Genus Closely Related to Streptomyces.

Frontiers in microbiology, 12:683814.

While the genus Streptomyces (family Streptomycetaceae) has been studied as a model for bacterial secondary metabolism and genetics, its close relatives have been less studied. The genus Kitasatospora is the second largest genus in the family Streptomycetaceae. However, its taxonomic position within the family remains under debate and the secondary metabolic potential remains largely unclear. Here, we performed systematic comparative genomic and phylogenomic analyses of Kitasatospora. Firstly, the three genera within the family Streptomycetaceae (Kitasatospora, Streptomyces, and Streptacidiphilus) showed common genomic features, including high G + C contents, high secondary metabolic potentials, and high recombination frequencies. Secondly, phylogenomic and comparative genomic analyses revealed phylogenetic distinctions and genome content differences among these three genera, supporting Kitasatospora as a separate genus within the family. Lastly, the pan-genome analysis revealed extensive genetic diversity within the genus Kitasatospora, while functional annotation and genome content comparison suggested genomic differentiation among lineages. This study provided new insights into genomic characteristics of the genus Kitasatospora, and also uncovered its previously underestimated and complex secondary metabolism.
