31 Jul 2021
Bibliography on: Pangenome


Created: 31 Jul 2021 


Although the enforced stability of genomic content is ubiquitous among multicellular eukaryotes (MCEs), the opposite is proving to be the case among prokaryotes, which exhibit remarkable and adaptive plasticity of genomic content. Early bacterial whole-genome sequencing efforts discovered that whenever a particular "species" was re-sequenced, new genes were found that had not been detected earlier: entirely new genes, not merely new alleles. This led to the concepts of the bacterial core-genome, the set of genes found in all members of a particular "species", and the flex-genome, the set of genes found in some, but not all, members of the "species". Together these make up the species' pan-genome.
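The three definitions above map directly onto set operations. The following is a minimal sketch, assuming each genome has already been reduced to a set of gene-family identifiers; the strain names and gene labels are hypothetical toy data, and real pipelines must first cluster genes into families, which is not shown here.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    genomes: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (core, flex, pan) gene sets for a collection of genomes.

    core: genes present in every genome
    flex: genes present in some, but not all, genomes
    pan:  union of all genes (core plus flex)
    """
    if not genomes:
        return set(), set(), set()
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    flex = pan - core
    return core, flex, pan


# Hypothetical toy data: three strains of one "species".
strains = {
    "strain_A": {"dnaA", "gyrB", "rpoB", "tetZ"},
    "strain_B": {"dnaA", "gyrB", "rpoB", "phageX"},
    "strain_C": {"dnaA", "gyrB", "rpoB", "tetZ", "isY"},
}

core, flex, pan = partition_pangenome(strains)
print("core:", sorted(core))
print("flex:", sorted(flex))
print("pan size:", len(pan))
```

In this toy case the three housekeeping-style genes form the core, while the remaining three genes make up the flex-genome.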

Created with PubMed® Query: pangenome or "pan-genome" or "pan genome"

The Papers (from PubMed®)


RevDate: 2021-07-30

Wang K, Hu H, Tian Y, et al (2021)

The chicken pan-genome reveals gene content variation and a promoter region deletion in IGF2BP1 affecting body size.

Molecular biology and evolution pii:6332014 [Epub ahead of print].

Domestication and breeding have reshaped the genomic architecture of chicken, but the retention and loss of genomic elements during these evolutionary processes remain unclear. We present the first chicken pan-genome constructed using 664 individuals, which identified an additional ∼66.5 Mb sequences that are absent from the reference genome (GRCg6a). The constructed pan-genome encoded 20,491 predicted protein-coding genes, with higher expression levels observed in conserved genes than in dispensable genes. Presence/absence variation (PAV) analyses demonstrated that gene PAV in chicken was shaped by selection, genetic drift, and hybridization. PAV-based GWAS identified numerous candidate mutations related to growth, carcass composition, meat quality, or physiological traits. Among them, a deletion in the promoter region of IGF2BP1 affecting chicken body size is reported, which is supported by functional studies and extra samples. This is the first report of the causal variant underlying the repeatedly reported chicken body size QTL on chromosome 27. Therefore, the chicken pan-genome is a useful resource for biological discovery and breeding. It improves our understanding of chicken genome diversity and provides materials to unveil the evolutionary history of chicken domestication.

RevDate: 2021-07-30

Hu H, Scheben A, Verpaalen B, et al (2021)

Amborella gene presence/absence variation is associated with abiotic stress responses that may contribute to environmental adaptation.

Amborella trichopoda (Amborellaceae) is the single living sister species of all other extant flowering plants and only occurs in rain forest habitats on the remote island of New Caledonia. These features make Amborella an important species in which to study genetic variation, including gene presence/absence variants (PAVs). Here, we apply the reference genome based iterative mapping and assembly strategy (Bayer et al., 2020) to assess gene diversity across ten diverse individuals.

RevDate: 2021-07-29

Davidson RM, Benoit JB, Kammlade SM, et al (2021)

Genomic characterization of sporadic isolates of the dominant clone of Mycobacterium abscessus subspecies massiliense.

Scientific reports, 11(1):15336.

Recent studies have characterized a dominant clone (Clone 1) of Mycobacterium abscessus subspecies massiliense (M. massiliense) associated with high prevalence in cystic fibrosis (CF) patients, pulmonary outbreaks in the United States (US) and United Kingdom (UK), and a Brazilian epidemic of skin infections. The prevalence of Clone 1 in non-CF patients in the US and the relationship of sporadic US isolates to outbreak clones are not known. We surveyed a reference US Mycobacteria Laboratory and a US biorepository of CF-associated Mycobacteria isolates for Clone 1. We then compared genomic variation and antimicrobial resistance (AMR) mutations between sporadic non-CF, CF, and outbreak Clone 1 isolates. Among reference lab samples, 57/147 (39%) of patients with M. massiliense had Clone 1, including pulmonary and extrapulmonary infections, compared to 11/64 (17%) in the CF isolate biorepository. Core and pan genome analyses revealed that outbreak isolates had similar numbers of single nucleotide polymorphisms (SNPs) and accessory genes as sporadic US Clone 1 isolates. However, pulmonary outbreak isolates were more likely to have AMR mutations compared to sporadic isolates. Clone 1 isolates are present among non-CF and CF patients across the US, but additional studies will be needed to resolve potential routes of transmission and spread.

RevDate: 2021-07-23

Liu Z, Zhao Y, Sossah FL, et al (2021)

Characterization, Pathogenicity, Phylogeny, and Comparative Genomic Analysis of Pseudomonas tolaasii Strains Isolated from Various Mushrooms in China.

Phytopathology [Epub ahead of print].

Since 2016, devastating bacterial blotch affecting the fruiting bodies of Agaricus bisporus, Cordyceps militaris, Flammulina filiformis, and Pleurotus ostreatus in China has caused severe economic losses. We isolated 102 bacterial strains and characterized them polyphasically. We identified the causal agent as Pseudomonas tolaasii and confirmed the pathogenicity of the strains. A host range test further confirmed the pathogen's ability to infect multiple hosts. This is the first report in China of bacterial blotch in C. militaris caused by P. tolaasii. Whole-genome sequences were generated for three strains: Pt11 (6.48 Mb), Pt51 (6.63 Mb), and Pt53 (6.80 Mb), and pangenome analysis was performed with 13 other publicly accessible P. tolaasii genomes to determine their genetic diversity, virulence, antibiotic resistance, and mobile genetic elements. The pangenome of P. tolaasii is open, and many more gene families are likely to emerge with further genome sequencing. Multilocus sequence analysis using the sequences of four common housekeeping genes (glns, gyrB, rpoB, and rpoD) showed high genetic variability among the P. tolaasii strains, with 115 strains clustered into a monophyletic group. The P. tolaasii strains possess various genes for secretion systems, virulence factors, carbohydrate-active enzymes, toxins, secondary metabolites, and antimicrobial resistance genes that are associated with pathogenesis and adapted to different environments. The myriad of insertion sequences, integrons, prophages, and genome islands encoded in the strains may contribute to genome plasticity, virulence, and antibiotic resistance. These findings advance understanding of the determinants of virulence, which can be targeted for the effective control of bacterial blotch disease.
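The notion of an "open" pangenome in the abstract above can be illustrated with a gene accumulation curve: if the number of distinct gene families keeps rising as genomes are added, the pangenome is considered open. The following is a minimal sketch, not the method used in the paper; the strain names and gene labels are hypothetical, and the exhaustive averaging over all orderings is only practical for a handful of genomes (real analyses sample random orderings and fit a power law instead).

```python
from itertools import permutations
from typing import Dict, List, Set


def mean_accumulation(genomes: Dict[str, Set[str]]) -> List[float]:
    """Mean pan-genome size after adding 1..N genomes.

    The mean is taken over every possible order in which the genomes
    could be added, so the curve does not depend on input order.
    """
    names = list(genomes)
    totals = [0] * len(names)
    n_orders = 0
    for order in permutations(names):
        seen: Set[str] = set()
        for i, name in enumerate(order):
            seen |= genomes[name]
            totals[i] += len(seen)
        n_orders += 1
    return [t / n_orders for t in totals]


# Hypothetical toy data: gene families per strain.
toy_strains = {
    "strain_A": {"dnaA", "gyrB", "rpoB", "tetZ"},
    "strain_B": {"dnaA", "gyrB", "rpoB", "phageX"},
    "strain_C": {"dnaA", "gyrB", "rpoB", "tetZ", "isY"},
}

curve = mean_accumulation(toy_strains)
print([round(x, 3) for x in curve])
```

A curve that is still climbing at the last genome, as in this toy case, is the qualitative signature of an open pangenome; a curve that flattens suggests a closed one.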

RevDate: 2021-07-26

Bayer PE, Scheben A, Golicz AA, et al (2021)

Modelling of gene loss propensity in the pangenomes of three Brassica species suggests different mechanisms between polyploids and diploids.

Plant biotechnology journal [Epub ahead of print].

Plant genomes demonstrate significant presence/absence variation (PAV) within a species, however the factors that lead to this variation have not been studied systematically in Brassica across diploids and polyploids. Here, we developed pangenomes of polyploid Brassica napus and its two diploid progenitor genomes B. rapa and B. oleracea to infer how PAV may differ between diploids and polyploids. Modelling of gene loss suggests that loss propensity is primarily associated with transposable elements in the diploids while in B. napus, gene loss propensity is associated with homoeologous recombination. We use these results to gain insights into the different causes of gene loss, both in diploids and following polyploidisation, and pave the way for the application of machine learning methods to understanding the underlying biological and physical causes of gene presence/absence.

RevDate: 2021-07-26

Hernández-Juárez LE, Camorlinga M, Méndez-Tenorio A, et al (2021)

Analyses of publicly available Hungatella hathewayi genomes revealed genetic distances indicating they belong to more than one species.

Virulence, 12(1):1950-1964.

Hungatella hathewayi has been observed to be a member of the gut microbiome. Unfortunately, little is known about this organism in spite of its association with human fatalities; it is important to understand its virulence mechanisms and epidemiological prospective to cause disease. In this study, a patient with chronic neurologic symptoms presented to the clinic with subsequent isolation of a strain with phenotypic characteristics suggestive of Clostridium difficile. However, whole-genome sequencing found the organism to be H. hathewayi. Analysis including publicly available Hungatella genomes found substantial genomic differences as compared to the type strain, indicating this isolate was not C. difficile. We examined the whole genomes of Hungatella species and related genera, using comparative genomics to fully examine species identification and toxin production. Orthogonal phylogenetic analyses using the 16S rRNA gene and whole-genome analyses, which included genome distance analyses using Genome-to-Genome Distance (GGDC), Average Nucleotide Identity (ANI), and a pan-genome analysis with inclusion of available public genomes, determined the genus to be Hungatella. Two clearly differentiated groups were identified, one including a reference H. hathewayi genome (strain DSM 13479) and a second group that was determined to be H. effluvii, which included our clinical isolate. Also, some genomes reported as H. hathewayi were found to belong to other genera, including Clostridium and Faecalicatena. We show that the Hungatella species have an open pan-genome reflecting high genomic diversity. This study highlights the importance of correctly assigning taxonomic identification, particularly in disease-associated strains, to better understand virulence and therapeutic options.

RevDate: 2021-07-21

Bayer PE, Petereit J, Danilevicz MF, et al (2021)

The application of pangenomics and machine learning in genomic selection in plants.

The plant genome [Epub ahead of print].

Genomic selection approaches have increased the speed of plant breeding, leading to growing crop yields over the last decade. However, climate change is impacting current and future yields, resulting in the need to further accelerate breeding efforts to cope with these changing conditions. Here we present approaches to accelerate plant breeding and incorporate nonadditive effects in genomic selection by applying state-of-the-art machine learning approaches. These approaches are made more powerful by the inclusion of pangenomes, which represent the entire genome content of a species. Understanding the strengths and limitations of machine learning methods, compared with more traditional genomic selection efforts, is paramount to the successful application of these methods in crop breeding. We describe examples of genomic selection and pangenome-based approaches in crop breeding, discuss machine learning-specific challenges, and highlight the potential for the application of machine learning in genomic selection. We believe that careful implementation of machine learning approaches will support crop improvement to help counter the adverse outcomes of climate change on crop production.

RevDate: 2021-07-21

Fiedoruk K, Drewnowska JM, Mahillon J, et al (2021)

Pan-Genome Portrait of Bacillus mycoides Provides Insights into the Species Ecology and Evolution.

Microbiology spectrum [Epub ahead of print].

Bacillus mycoides is poorly known despite its frequent occurrence in a wide variety of environments. To provide direct insight into its ecology and evolutionary history, a comparative investigation of the species pan-genome and the functional gene categorization of 35 isolates obtained from soil samples from northeastern Poland was performed. The pan-genome of these isolates is composed of 20,175 genes and is characterized by a strong predominance of adaptive genes (∼83%), a significant amount of plasmid genes (∼37%), and a great contribution of prophages and insertion sequences. The pan-genome structure and phylodynamic studies had suggested a wide genomic diversity among the isolates, but no correlation between lineages and the bacillus origin was found. Nevertheless, the two B. mycoides populations, one from Białowieża National Park, the last European natural primeval forest with soil classified as organic, and the second from mineral soil samples taken in a farm in Jasienówka, a place with strong anthropogenic pressure, differ significantly in the frequency of genes encoding proteins enabling bacillus adaptation to specific stress conditions and production of a set of compounds, thus facilitating their colonization of various ecological niches. Furthermore, differences in the prevalence of essential stress sigma factors might be an important trail of this process. Due to these numerous adaptive genes, B. mycoides is able to quickly adapt to changing environmental conditions. IMPORTANCE This research allows deeper understanding of the genetic organization of natural bacterial populations, specifically, Bacillus mycoides, a psychrotrophic member of the Bacillus cereus group that is widely distributed worldwide, especially in areas with continental cold climates. These thorough analyses made it possible to describe, for the first time, the B. mycoides pan-genome, phylogenetic relationship within this species, and the mechanisms behind the species ecology and evolutionary history. Our study indicates a set of functional properties and adaptive genes, in particular, those encoding sigma factors, associated with B. mycoides acclimatization to specific ecological niches and changing environmental conditions.

RevDate: 2021-07-21

Steidele CE, R Stam (2021)

Multi-omics approach highlights differences between RLP classes in Arabidopsis thaliana.

BMC genomics, 22(1):557.

BACKGROUND: The Leucine rich-repeat (LRR) receptor-like protein (RLP) family is a complex gene family with 57 members in Arabidopsis thaliana. Some members of the RLP family are known to be involved in basal developmental processes, whereas others are involved in defence responses. However, functional data is currently only available for a small subset of RLPs, leaving the remaining ones classified as RLPs of unknown function.

RESULTS: Using publicly available datasets, we annotated RLPs of unknown function as either likely defence-related or likely fulfilling a more basal function in plants. Then, using these categories, we can identify important characteristics that differ between the RLP subclasses. We found that the two classes differ in abundance on both transcriptome and proteome level, physical clustering in the genome and putative interaction partners. However, the classes do not differ in the genetic diversity of their individual members in accessible pan-genome data.

CONCLUSIONS: Our work has several implications for work related to functional studies on RLPs as well as for the understanding of RLP gene family evolution. Using our annotations, we can make suggestions on which RLPs can be identified as potential immune receptors using genetics tools and thereby complement disease studies. The lack of differences in nucleotide diversity between the two RLP subclasses further suggests that non-synonymous diversity of gene sequences alone cannot distinguish defence from developmental genes. By contrast, differences in transcript and protein abundance or clustering at genomic loci might also allow for functional annotations and characterisation in other plant species.

RevDate: 2021-07-20

Wu JJ, Chou HP, Huang JW, et al (2021)

Genomic and biochemical characterization of antifungal compounds produced by Bacillus subtilis PMB102 against Alternaria brassicicola.

Microbiological research, 251:126815 pii:S0944-5013(21)00121-X [Epub ahead of print].

Bacillus subtilis is ubiquitous and capable of producing various metabolites, which make the bacterium a good candidate as a biocontrol agent for managing plant diseases. In this study, a phyllosphere bacterium B. subtilis PMB102 isolated from tomato leaf was found to inhibit the growth of Alternaria brassicicola ABA-31 on PDA and suppress Alternaria leaf spot on Chinese cabbage (Brassica rapa). The genome of PMB102 (Accession no. CP047645) was completely sequenced by Nanopore and Illumina technology to generate a circular chromosome of 4,103,088 bp encoding several gene clusters for synthesizing bioactive compounds. PMB102 and the other B. subtilis strains from different sources were compared in pangenome analysis to identify a suite of conserved genes involved in biocontrol and habitat adaptation. Two predicted gene products, surfactin and fengycin, were extracted from PMB102 culture filtrates and verified by LC-MS/MS. The antifungal activity of fengycin was tested on A. brassicicola ABA-31 in bioautography to inhibit hyphae growth, and in co-culturing assays to elicit the formation of swollen hyphae. Our data revealed that B. subtilis PMB102 suppresses Alternaria leaf spot by the production of antifungal metabolites, and fengycin plays an important role to inhibit the vegetative growth of A. brassicicola ABA-31.

RevDate: 2021-07-20

Branford I, Johnson S, Chapwanya A, et al (2021)

Comprehensive Molecular Dissection of Dermatophilus congolensis Genome and First Observation of tet(Z) Tetracycline Resistance.

International journal of molecular sciences, 22(13): pii:ijms22137128.

Dermatophilus congolensis is a bacterial pathogen mostly of ruminant livestock in the tropics/subtropics and certain temperate climate areas. It causes dermatophilosis, a skin disease that threatens food security by lowering animal productivity and compromising animal health and welfare. Since it is a prevalent infection in ruminants, dermatophilosis warrants more research. There is limited understanding of its pathogenicity, and as such, there is no registered vaccine against D. congolensis. To better understand the genomics of D. congolensis, the primary aim of this work was to investigate this bacterium using whole-genome sequencing and bioinformatic analysis. D. congolensis is a high GC member of the Actinobacteria and encodes approximately 2527 genes. It has an open pan-genome, contains many potential virulence factors and secondary metabolites, and encodes at least 23 housekeeping genes associated with antimicrobial susceptibility mechanisms; some isolates also have an acquired antimicrobial resistance gene. Our isolates contain a single CRISPR array Cas type IE with classical 8 Cas genes. Although the isolates originate from the same geographical location, there is some genomic diversity among them. In conclusion, we present the first detailed genomic study on D. congolensis, including the first observation of tet(Z), a tetracycline resistance-conferring gene.

RevDate: 2021-07-19

Basharat Z, Jahanzaib M, N Rahman (2021)

Therapeutic target identification via differential genome analysis of antibiotic resistant Shigella sonneii and inhibitor evaluation against a selected drug target.

Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases pii:S1567-1348(21)00302-6 [Epub ahead of print].

Shigella sonnei has been implicated in bloody diarrhea (accompanied by abdominal pain and fever) and is an emerging pathogen of concern, especially in developing countries. The major means of transmission is the fecal-oral route while sexual transmission has also been reported. In children, the impact might be stunted growth due to life-threatening illness. Resistance has been reported in this species for several types of antibiotics. In this study, we retrieved the antibiotic-resistant labeled whole genome sequences of the species from the PATRIC database and performed a pan-genome analysis to filter out core genes. Antibiotic resistance was studied in the core, accessory and unique genome. Core genes were utilized as seed substance for essentiality analysis and drug candidate assignment. Product of the gene aroG, i.e. chorismate biosynthetic process 3-deoxy-7-phosphoheptulonate synthase enzyme, responsible for aromatic amino acid family biosynthetic process, was taken for further downstream processing. Natural product libraries of flavonoids (n = 178), ZINC database derived inhibitor compounds of the 3-deoxy-7-phosphoheptulonate synthase enzyme (n = 112), and streptomycin compounds (n = 737) were docked to find potent inhibitors, followed by dynamics simulation of 50 ns each for top compounds. Physicochemical and ADMET profiling of the top compounds was done to analyze their safety for consumption. We propose that the top compounds: Phytoene from Streptomycin library, Hesperidin methylchalcone from flavonoid library, and ZINC000036444158 (synonym:1,16-bis[(dihydroxyphosphinyl)oxy]hexadecane) from 3-deoxy-7-phosphoheptulonate synthase inhibitor library of ZINC database (and used as a control in this study) should be tested in vitro against Shigella sonnei, to fully determine their efficacy. This could add to the drying pipeline of potent drug molecules against emerging pathogens.

RevDate: 2021-07-18

Bornowski N, Michel KJ, Hamilton JP, et al (2021)

Genomic variation within the maize stiff-stalk heterotic germplasm pool.

The plant genome [Epub ahead of print].

The stiff-stalk heterotic group in Maize (Zea mays L.) is an important source of inbreds used in U.S. commercial hybrid production. Founder inbreds B14, B37, B73, and, to a lesser extent, B84, are found in the pedigrees of a majority of commercial seed parent inbred lines. We created high-quality genome assemblies of B84 and four expired Plant Variety Protection (ex-PVP) lines LH145 representing B14, NKH8431 of mixed descent, PHB47 representing B37, and PHJ40, which is a Pioneer Hi-Bred International (PHI) early stiff-stalk type. Sequence was generated using long-read sequencing achieving highly contiguous assemblies of 2.13-2.18 Gbp with N50 scaffold lengths >200 Mbp. Inbred-specific gene annotations were generated using a core five-tissue gene expression atlas, whereas transposable element (TE) annotation was conducted using de novo and homology-directed methodologies. Compared with the reference inbred B73, synteny analyses revealed extensive collinearity across the five stiff-stalk genomes, although unique components of the maize pangenome were detected. Comparison of this set of stiff-stalk inbreds with the original Iowa Stiff Stalk Synthetic breeding population revealed that these inbreds represent only a proportion of variation in the original stiff-stalk pool and there are highly conserved haplotypes in released public and ex-Plant Variety Protection inbreds. Despite the reduction in variation from the original stiff-stalk population, substantial genetic and genomic variation was identified supporting the potential for continued breeding success in this pool. The assemblies described here represent stiff-stalk inbreds that have historical and commercial relevance and provide further insight into the emerging maize pangenome.
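The N50 statistic quoted in the abstract above is a standard contiguity measure: it is the length L such that sequences of length L or longer together contain at least half of the assembled bases. The following is a minimal sketch of that calculation; the scaffold lengths are hypothetical and are not taken from the paper.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the
    length of the sequence at which the running total first reaches
    half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() requires at least one length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return ordered[-1]  # unreachable for positive lengths; kept for safety


# Hypothetical scaffold lengths (arbitrary units).
scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(scaffolds))
```

Here the total is 290, so the running sum first reaches half (145) at the second-longest scaffold, giving an N50 of 70.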

RevDate: 2021-07-16

Verma DK, Chaudhary C, Singh L, et al (2021)

Corrigendum: Isolation and Taxonomic Characterization of Novel Haloarchaeal Isolates From Indian Solar Saltern: A Brief Review on Distribution of Bacteriorhodopsins and V-Type ATPases in Haloarchaea.

Frontiers in microbiology, 12:713942.

[This corrects the article DOI: 10.3389/fmicb.2020.554927.].

RevDate: 2021-07-16

Liao J, Guo X, Weller DL, et al (2021)

Nationwide genomic atlas of soil-dwelling Listeria reveals effects of selection and population ecology on pangenome evolution.

Nature microbiology [Epub ahead of print].

Natural bacterial populations can display enormous genomic diversity, primarily in the form of gene content variation caused by the frequent exchange of DNA with the local environment. However, the ecological drivers of genomic variability and the role of selection remain controversial. Here, we address this gap by developing a nationwide atlas of 1,854 Listeria isolates, collected systematically from soils across the contiguous United States. We found that Listeria was present across a wide range of environmental parameters, being mainly controlled by soil moisture, molybdenum and salinity concentrations. Whole-genome data from 594 representative strains allowed us to decompose Listeria diversity into 12 phylogroups, each with large differences in habitat breadth and endemism. 'Cosmopolitan' phylogroups, prevalent across many different habitats, had more open pangenomes and displayed weaker linkage disequilibrium, reflecting higher rates of gene gain and loss, and allele exchange than phylogroups with narrow habitat ranges. Cosmopolitan phylogroups also had a large fraction of genes affected by positive selection. The effect of positive selection was more pronounced in the phylogroup-specific core genome, suggesting that lineage-specific core genes are important drivers of adaptation. These results indicate that genome flexibility and recombination are the consequence of selection to survive in variable environments.

RevDate: 2021-07-14

Norri T, Cazaux B, Dönges S, et al (2021)

Founder Reconstruction Enables Scalable and Seamless Pangenomic Analysis.

Bioinformatics (Oxford, England) pii:6321452 [Epub ahead of print].

MOTIVATION: Variant calling workflows that utilize a single reference sequence are the de facto standard elementary genomic analysis routine for resequencing projects. Various ways to enhance the reference with pangenomic information have been proposed, but scalability combined with seamless integration to existing workflows remains a challenge.

RESULTS: We present PanVC with founder sequences, a scalable and accurate variant calling workflow based on a multiple alignment of reference sequences. Scalability is achieved by removing duplicate parts up to a limit into a founder multiple alignment, that is then indexed using a hybrid scheme that exploits general purpose read aligners. Our implemented workflow uses GATK or BCFtools for variant calling, but the various steps of our workflow (e.g. vcf2multialign tool, founder reconstruction) can be of independent interest as a basis for creating novel pangenome analysis workflows beyond variant calling.

AVAILABILITY: Our open access tools and instructions how to reproduce our experiments are available at the following address: https://github.com/algbio/panvc-founders.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

RevDate: 2021-07-13

Lu TY, Human Genome Structural Variation Consortium, MJP Chaisson (2021)

Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs.

Nature communications, 12(1):4250.

Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein coding sequences and associations with clinical disorders. It has been difficult to incorporate VNTR analysis in disease studies that use short-read sequencing because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. In this work, we solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG), a data structure that encodes both the population diversity and repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We develop software to build a RPGG, and use the RPGG to estimate VNTR composition with short reads. We use this to discover VNTRs with length stratified by continental population, and expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.

RevDate: 2021-07-13

Jain C, Tavakoli N, S Aluru (2021)

A variant selection framework for genome graphs.

Bioinformatics (Oxford, England), 37(Suppl_1):i460-i467.

MOTIVATION: Variation graph representations are projected to either replace or supplement conventional single genome references due to their ability to capture population genetic diversity and reduce reference bias. Vast catalogues of genetic variants for many species now exist, and it is natural to ask which among these are crucial to circumvent reference bias during read mapping.

RESULTS: In this work, we propose a novel mathematical framework for variant selection, by casting it in terms of minimizing variation graph size subject to preserving paths of length α with at most δ differences. This framework leads to a rich set of problems based on the types of variants [e.g. single nucleotide polymorphisms (SNPs), indels or structural variants (SVs)], and whether the goal is to minimize the number of positions at which variants are listed or to minimize the total number of variants listed. We classify the computational complexity of these problems and provide efficient algorithms along with their software implementation when feasible. We empirically evaluate the magnitude of graph reduction achieved in human chromosome variation graphs using multiple α and δ parameter values corresponding to short and long-read resequencing characteristics. When our algorithm is run with parameter settings amenable to long-read mapping (α = 10 kbp, δ = 1000), 99.99% SNPs and 73% SVs can be safely excluded from human chromosome 1 variation graph. The graph size reduction can benefit downstream pan-genome analysis.

AVAILABILITY: https://github.com/AT-CG/VF.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
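The constraint described in the abstract above (paths of length α may differ by at most δ) can be illustrated for the simplest case. The following is a simplified sketch, not the authors' VF implementation: it assumes a linear reference with SNPs only, treats each dropped SNP position as one potential difference, and greedily drops positions from left to right while ensuring that no window of α consecutive bases contains more than δ dropped positions. The function name and the example positions are hypothetical.

```python
from collections import deque
from typing import Iterable, List


def select_snps_to_drop(
    snp_positions: Iterable[int], alpha: int, delta: int
) -> List[int]:
    """Greedily choose SNP positions that can be left out of the graph.

    Invariant: any window of `alpha` consecutive bases contains at most
    `delta` dropped positions, so a read-length path crossing the window
    would differ from the reduced graph in at most `delta` places.
    """
    if delta <= 0:
        return []  # no differences tolerated, so every SNP is kept
    dropped: List[int] = []
    recent: deque = deque()  # the most recent `delta` dropped positions
    for pos in sorted(set(snp_positions)):
        # Dropping `pos` is safe if fewer than `delta` positions have been
        # dropped so far, or if the delta-th most recent dropped position
        # is too far away to share an alpha-length window with `pos`.
        if len(recent) < delta or pos - recent[0] + 1 > alpha:
            dropped.append(pos)
            recent.append(pos)
            if len(recent) > delta:
                recent.popleft()
    return dropped


# Hypothetical SNP coordinates on a linear reference.
snps = [10, 12, 15, 40, 41, 43, 90]
dropped = select_snps_to_drop(snps, alpha=20, delta=2)
kept = [p for p in snps if p not in dropped]
print("dropped:", dropped)
print("kept:", kept)
```

The SNPs at positions 15 and 43 are retained because dropping either would put three dropped positions inside a single 20-base window.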

RevDate: 2021-07-12

Pedrós-Alió C (2021)

Time travel in microorganisms.

Systematic and applied microbiology, 44(4):126227 pii:S0723-2020(21)00050-3 [Epub ahead of print].

RevDate: 2021-07-12

Nie S, Wang B, Ding H, et al (2021)

Genome assembly of the Chinese maize elite inbred line RP125 and its EMS mutant collection provide new resources for maize genetics research and crop improvement.

The Plant journal : for cell and molecular biology [Epub ahead of print].

Maize is an important crop worldwide, as well as a valuable model with vast genetic diversity. Accurate genome and annotation information for a wide range of inbred lines would provide valuable resources for crop improvement and pan-genome characterization. In this study, we generated a high-quality de novo genome assembly (contig N50 of 15.43 megabases) of the Chinese elite inbred line RP125 using Nanopore long-read sequencing and Hi-C scaffolding, which yield highly contiguous, chromosome-length scaffolds. Global comparison of the RP125 genome with those of B73, W22, and Mo17 revealed a large number of structural variations. To create new germplasm for maize research and crop improvement, we carried out an EMS mutagenesis screen on RP125. We obtained a total of 5,818 independent M2 families, with 946 mutants showing heritable phenotypes. Taking advantage of the high-quality RP125 genome, we successfully cloned 10 mutants from the EMS library, including the novel kernel mutant qk1 (quekou: 'missing a small part' in Chinese), which exhibited partial loss of endosperm and a starch accumulation defect. QK1 encodes a predicted metal tolerance protein that is specifically required for iron transport. Increased accumulation of iron and ROS as well as ferroptosis-like cell death were detected in endosperm of qk1. Our study provides the community with a high-quality genome sequence and a large collection of mutant germplasm.

RevDate: 2021-07-12

Noroy C, DF Meyer (2021)

The super repertoire of type IV effectors in the pangenome of Ehrlichia spp. provides insights into host-specificity and pathogenesis.

PLoS computational biology, 17(7):e1008788 pii:PCOMPBIOL-D-21-00300.

The identification of bacterial effectors is essential to understand how obligatory intracellular bacteria such as Ehrlichia spp. manipulate the host cell for survival and replication. Infection of mammals, including humans, by the intracellular pathogenic bacteria Ehrlichia spp. depends largely on the injection of virulence proteins that hijack host cell processes. Several hypothetical virulence proteins have been identified in Ehrlichia spp., but only one so far has been experimentally shown to translocate into host cells via the type IV secretion system. However, the current challenge is to identify most of the type IV effectors (T4Es) to fully understand their role in Ehrlichia spp. virulence and host adaptation. Here, we predict the T4E repertoires of four sequenced Ehrlichia spp. and four other Anaplasmataceae as comparative models (pathogenic Anaplasma spp. and Wolbachia endosymbiont) using previously developed S4TE 2.0 software. This analysis identified 579 predicted T4Es (228 pT4Es for Ehrlichia spp. only). The effector repertoires of Ehrlichia spp. overlapped, thereby defining a conserved core effectome of 92 predicted effectors shared by all strains. In addition, 69 species-specific T4Es were predicted with non-canonical GC% mostly in gene sparse regions of the genomes and we observed a bias in pT4Es according to host-specificity. We also identified new protein domain combinations, suggesting novel effector functions. This work presenting the predicted effector collection of Ehrlichia spp. can serve as a guide for future functional characterisation of effectors and design of alternative control strategies against these bacteria.

RevDate: 2021-07-13

Cao H, Xu H, Ning C, et al (2021)

Multi-Omics Approach Reveals the Potential Core Vaccine Targets for the Emerging Foodborne Pathogen Campylobacter jejuni.

Frontiers in microbiology, 12:665858.

Campylobacter jejuni is a leading cause of bacterial gastroenteritis in humans around the world. The emergence of bacterial resistance is becoming more serious; therefore, development of new vaccines is considered to be an alternative strategy against drug-resistant pathogens. In this study, we investigated the pangenome of 173 C. jejuni strains and analyzed the phylogenesis and the virulence factor genes. In order to acquire a high-quality pangenome, genomic relatedness was first assessed with average nucleotide identity (ANI) analyses, and an open pangenome of 8,041 gene families was obtained from the genomes with correct taxonomy. Subsequently, the virulence property of the core genome was analyzed and 145 core virulence factor (VF) genes were obtained. Upon functional genomics and immunological analyses, five core VF proteins with high antigenicity were selected as potential core vaccine targets for humans. Furthermore, functional annotations indicated that these proteins are involved in important molecular functions and biological processes, such as adhesion, regulation, and secretion. In addition, transcriptome analysis in human cells and pig intestinal loop proved that these vaccine target genes are important in the virulence of C. jejuni in different hosts. Comprehensive pangenome and relevant animal experiments will facilitate discovering the potential core vaccine targets with improved efficiency in reverse vaccinology. Likewise, this study provided some insights into the genetic polymorphism and phylogeny of C. jejuni and discovered potential vaccine candidates for humans. Prospective development of new vaccines using the targets will be an alternative to the use of antibiotics and prevent the development of multidrug-resistant C. jejuni in humans and even other animals.

RevDate: 2021-07-13

Banerjee R, Chaudhari NM, Lahiri A, et al (2021)

Interplay of Various Evolutionary Modes in Genome Diversification and Adaptive Evolution of the Family Sulfolobaceae.

Frontiers in microbiology, 12:639995.

The Sulfolobaceae family, comprising diverse thermoacidophilic and aerobic sulfur-metabolizing Archaea from various geographical locations, offers an ideal opportunity to infer the evolutionary dynamics across the members of this family. Comparative pan-genomics coupled with evolutionary analyses has revealed asymmetric genome evolution within the Sulfolobaceae family. The trend of genome streamlining followed by periods of differential gene gains resulted in an overall genome expansion in some species of this family, whereas there was reduction in others. Among the core genes, both Sulfolobus islandicus and Saccharolobus solfataricus showed a considerable fraction of positively selected genes and also higher frequencies of gene acquisition. In contrast, Sulfolobus acidocaldarius genomes experienced a substantial amount of gene loss and strong purifying selection, as manifested by relatively smaller genome size and higher genome conservation. Central carbohydrate metabolism and sulfur metabolism coevolved with the genome diversification pattern of this archaeal family. The autotrophic CO2 fixation with three significant positively selected enzymes from S. islandicus and S. solfataricus was found to be more imperative than heterotrophic CO2 fixation for Sulfolobaceae. Overall, our analysis provides an insight into the interplay of various genomic adaptation strategies including gene gain-loss, mutation, and selection influencing genome diversification of Sulfolobaceae at various taxonomic levels and geographical locations.

RevDate: 2021-07-11

Begrem S, Jérôme M, Leroi F, et al (2021)

Genomic diversity of Serratia proteamaculans and Serratia liquefaciens predominant in seafood products and spoilage potential analyses.

International journal of food microbiology, 354:109326 pii:S0168-1605(21)00285-3 [Epub ahead of print].

Serratia spp. cause food losses and waste due to spoilage; it is noteworthy that they represent a dominant population in seafood. The main spoilage-associated species comprise S. liquefaciens, S. grimesii, S. proteamaculans and S. quinivorans, also known as S. liquefaciens-like strains. These species are difficult to discriminate since classical 16S rRNA gene-based sequences do not possess sufficient resolution. In this study, a phylogeny based on the short-length luxS gene was able to speciate 47 Serratia isolates from seafood, with S. proteamaculans being the main species from fresh salmon and tuna, cold-smoked salmon, and cooked shrimp while S. liquefaciens was only found in cold-smoked salmon. The genome of the first S. proteamaculans strain isolated from the seafood matrix (CD3406 strain) was sequenced. Pangenome analyses of S. proteamaculans and S. liquefaciens indicated high adaptation potential. Biosynthetic pathways involved in antimicrobial compounds production and in the main seafood spoilage compounds were also identified. The genetic equipment highlighted in this study provides further insight into the predominance of Serratia in seafood products and their capacity to spoil.

RevDate: 2021-07-10

Wang S, Rao MPN, Wei D, et al (2021)

Complete genome sequencing and comparative genome analysis of the extremely halophilic archaea, Haloterrigena daqingensis.

Biotechnology and applied biochemistry [Epub ahead of print].

In the present study, we report the complete genome sequencing of Haloterrigena daqingensis type species. The genome of H. daqingensis JX313T consisted of a circular chromosome with three plasmids. The genome size and G+C content were estimated to be 3835796 bp and 61.7%, respectively. A total of 4158 genes were predicted with six rRNAs and 45 tRNAs. Metabolic pathway analysis suggests that H. daqingensis JX313T codes for all the necessary genes responsible to sustain its life in saline environments. The pan-genome analysis suggests that the number of singleton genes varied between H. daqingensis and other Haloterrigena species. The study not only helps us understand H. daqingensis strategy for dealing with high stress, but it also provides an overview of its genomic makeup.

RevDate: 2021-07-12

Sanoussi CN, Coscolla M, Ofori-Anyinam B, et al (2021)

Mycobacterium tuberculosis complex lineage 5 exhibits high levels of within-lineage genomic diversity and differing gene content compared to the type strain H37Rv.

Microbial genomics, 7(7):.

Pathogens of the Mycobacterium tuberculosis complex (MTBC) are considered to be monomorphic, with little gene content variation between strains. Nevertheless, several genotypic and phenotypic factors separate strains of the different MTBC lineages (L), especially L5 and L6 (traditionally termed Mycobacterium africanum) strains, from each other. However, this genome variability and gene content, especially of L5 strains, has not been fully explored and may be important for pathobiology and current approaches for genomic analysis of MTBC strains, including transmission studies. By comparing the genomes of 355 L5 clinical strains (including 3 complete genomes and 352 Illumina whole-genome sequenced isolates) to each other and to H37Rv, we identified multiple genes that were differentially present or absent between H37Rv and L5 strains. Additionally, considerable gene content variability was found across L5 strains, including a split in the L5.3 sub-lineage into L5.3.1 and L5.3.2. These gene content differences had a small knock-on effect on transmission cluster estimation, with clustering rates influenced by the selected reference genome, and with potential overestimation of recent transmission when using H37Rv as the reference genome. We conclude that full capture of the gene diversity, especially high-resolution outbreak analysis, requires a variation of the single H37Rv-centric reference genome mapping approach currently used in most whole-genome sequencing data analysis pipelines. Moreover, the high within-lineage gene content variability suggests that the pan-genome of M. tuberculosis is at least several kilobases larger than previously thought, implying that a concatenated or reference-free genome assembly (de novo) approach may be needed for particular questions.

RevDate: 2021-07-12
CmpDate: 2021-07-12

Sinha D, Sun X, Khare M, et al (2021)

Pangenome analysis and virulence profiling of Streptococcus intermedius.

BMC genomics, 22(1):522.

BACKGROUND: Streptococcus intermedius, a member of the S. anginosus group, is a commensal bacterium present in the normal microbiota of human mucosal surfaces of the oral, gastrointestinal, and urogenital tracts. However, it has been associated with various infections such as liver and brain abscesses, bacteremia, osteo-articular infections, and endocarditis. Since 2005, high throughput genome sequencing methods enabled understanding the genetic landscape and diversity of bacteria as well as their pathogenic role. Here, in order to determine whether specific virulence genes could be related to specific clinical manifestations, we compared the genomes from 27 S. intermedius strains isolated from patients with various types of infections, including 13 that were sequenced in our institute and 14 available in GenBank.

RESULTS: We estimated the theoretical pangenome size to be 4,020 genes, including 1,355 core genes, 1,054 strain-specific genes and 1,611 accessory genes shared by 2 or more strains. The pangenome analysis demonstrated that the genomic diversity of S. intermedius represents an "open" pangenome model. We identified a core virulome of 70 genes and 78 unique virulence markers. The phylogenetic clusters based upon core-genome sequences and SNPs were independent of disease types and sample sources. However, using principal component analysis based on presence/absence of virulence genes, we identified the sda histidine kinase, adhesion protein LAP and capsular polysaccharide biosynthesis protein cps4E as being associated with brain abscess or broncho-pulmonary infection. In contrast, liver and abdominal abscess were associated with the presence of the fibronectin binding protein fbp54 and capsular polysaccharide biosynthesis proteins cap8D and cpsB.

CONCLUSIONS: Based on the virulence gene content of 27 S. intermedius strains causing various diseases, we identified putative disease-specific genetic profiles discriminating those causing brain abscess or broncho-pulmonary infection from those causing liver and abdominal abscess. These results provide insight into S. intermedius pathogenesis and highlight putative targets from a diagnostic perspective.
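
The core, strain-specific and accessory counts reported in the entry above follow from a gene presence/absence table. The sketch below shows one simple way such a split could be made; the function name, the toy gene families and the strain labels are hypothetical, and the authors' actual pipeline is not described in the abstract.

```python
def partition_pangenome(presence):
    """Split gene families into core, accessory and strain-specific sets.

    presence maps each gene family to the set of strains carrying it.
    """
    strains = set().union(*presence.values())
    core, accessory, unique = set(), set(), set()
    for gene, carriers in presence.items():
        if carriers == strains:
            core.add(gene)        # present in every strain
        elif len(carriers) == 1:
            unique.add(gene)      # present in exactly one strain
        else:
            accessory.add(gene)   # present in 2 or more, but not all, strains
    return core, accessory, unique


# Toy presence/absence data (hypothetical gene families and strains).
toy = {
    "geneA": {"s1", "s2", "s3"},
    "geneB": {"s1", "s2"},
    "geneC": {"s3"},
    "geneD": {"s1", "s2", "s3"},
}
core, accessory, unique = partition_pangenome(toy)
print(sorted(core), sorted(accessory), sorted(unique))

# The three categories reported in the abstract add up to the pangenome size.
reported = {"core": 1355, "strain_specific": 1054, "accessory": 1611}
print(sum(reported.values()))
```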

RevDate: 2021-07-08

Liu C, Peng P, Li W, et al (2021)

Deciphering variation of 239 elite japonica rice genomes for whole genome sequences-enabled breeding.

Genomics pii:S0888-7543(21)00280-9 [Epub ahead of print].

Revealing genomic variation of representative and diverse germplasm is the cornerstone of deploying genomics information into genetic improvement programs of species of agricultural importance. Here we report the re-sequencing of 239 japonica rice elites representing the genetic diversity of japonica germplasm in China, Japan and Korea. A total of 4.8 million SNPs and PAV of 35,634 genes were identified. The elites from Japan and Korea are closely related and relatively less diverse than those from China. A japonica rice pan-genome was constructed, and 35 Mb non-redundant novel sequences were identified, from which 1131 novel genes were predicted. Strong selection signals of genomic regions were detected on most of the chromosomes. The heading date genes Hd1 and Hd3a have been artificially selected during the breeding process. The results from this study lay the foundation for future whole genome sequences-enabled breeding in rice and provide a paradigm for other species.

RevDate: 2021-07-06

Rijzaani H, Bayer PE, Rouard M, et al (2021)

The pangenome of banana highlights differences between genera and genomes.

The plant genome [Epub ahead of print].

Banana (Musaceae family) has a complex genetic history and includes a genus Musa with a variety of cultivated clones with edible fruits, Ensete species that are grown for their edible corm, and monospecific Musella whose generic status has been questioned. The most commonly exported banana cultivars belong to Cavendish, a subgroup of Musa triploid cultivars, which is under threat by fungal pathogens, though there are also related species M. balbisiana Colla (B genome), M. textilis Née (T genome), and M. schizocarpa N. W. Simmonds (S genome), along with hybrids of these genomes, which potentially host genes of agronomic interest. Here we present the first cross-genus pangenome of banana, which contains representatives of the Musa and Ensete genera. Clusters based on gene presence-absence variation (PAV) clearly separate Musa and Ensete, while Musa is split further based on species. These results present the first pangenome study across genus boundaries and identify genes that differentiate between Musaceae species, information that may support breeding programs in these crops.

RevDate: 2021-07-09

Lovell JT, Bentley NB, Bhattarai G, et al (2021)

Four chromosome scale genomes and a pan-genome annotation to accelerate pecan tree breeding.

Nature communications, 12(1):4125.

Genome-enabled biotechnologies have the potential to accelerate breeding efforts in long-lived perennial crop species. Despite the transformative potential of molecular tools in pecan and other outcrossing tree species, highly heterozygous genomes, significant presence-absence gene content variation, and histories of interspecific hybridization have constrained breeding efforts. To overcome these challenges, here, we present diploid genome assemblies and annotations of four outbred pecan genotypes, including a PacBio HiFi chromosome-scale assembly of both haplotypes of the 'Pawnee' cultivar. Comparative analysis and pan-genome integration reveal substantial and likely adaptive interspecific genomic introgressions, including an over-retained haplotype introgressed from bitternut hickory into pecan breeding pedigrees. Further, by leveraging our pan-genome presence-absence and functional annotation database among genomes and within the two outbred haplotypes of the 'Lakota' genome, we identify candidate genes for pest and pathogen resistance. Combined, these analyses and resources highlight significant progress towards functional and quantitative genomics in highly diverse and outbred crops.

RevDate: 2021-07-06

Hendrickx APA, Debast S, Pérez-Vázquez M, et al (2021)

A genetic cluster of MDR Enterobacter cloacae complex ST78 harbouring a plasmid containing bla VIM-1 and mcr-9 in the Netherlands.

JAC-antimicrobial resistance, 3(2):dlab046.

Background: Carbapenemases produced by Enterobacterales are often encoded by genes on transferable plasmids and represent a major healthcare problem, especially if the plasmids contain additional antibiotic resistance genes. As part of Dutch national surveillance, 50 medical microbiological laboratories submit their Enterobacterales isolates suspected of carbapenemase production to the National Institute for Public Health and the Environment for characterization. All isolates for which carbapenemase production is confirmed are subjected to next-generation sequencing.

Objectives: To study the molecular characteristics of a genetic cluster of Enterobacter cloacae complex isolates collected in Dutch national surveillance in the period 2015-20 in the Netherlands.

Methods: Short- and long-read genome sequencing was used in combination with MLST and pan-genome MLST (pgMLST) analyses. Automated antimicrobial susceptibility testing (AST), the Etest for meropenem and the broth microdilution test for colistin were performed. The carbapenem inactivation method was used to assess carbapenemase production.

Results: pgMLST revealed that nine E. cloacae complex isolates from three different hospitals in the Netherlands differed by <20 alleles and grouped in a genetic cluster termed EclCluster-013. Seven isolates were submitted by one hospital in 2016-20. EclCluster-013 isolates produced carbapenemase and were from ST78, a globally disseminated lineage. EclCluster-013 isolates harboured a 316 078 bp IncHI2 plasmid carrying the bla VIM-1 carbapenemase and the novel mcr-9 colistin resistance gene along with genes encoding resistance to different antibiotic classes. AST showed that EclCluster-013 isolates were MDR, but susceptible to meropenem (<2 mg/L) and colistin (<2 mg/L).

Conclusions: The EclCluster-013 reported here represents an MDR E. cloacae complex ST78 strain containing an IncHI2 plasmid carrying both the bla VIM-1 carbapenemase and the mcr-9 colistin resistance gene.
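
The "<20 alleles" criterion in the entry above is a pairwise distance between allelic profiles. A minimal sketch of such a distance is given below. The locus names and allele numbers are hypothetical, and skipping loci with a missing call is an assumption of this sketch, not something stated in the abstract.

```python
def allele_distance(profile_a, profile_b):
    """Count loci whose allele numbers differ between two profiles.

    Profiles map locus name -> allele number, or None when the locus
    could not be called. Loci missing in either profile are skipped
    (an assumption of this sketch).
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


# Toy profiles (hypothetical loci and allele numbers).
isolate_1 = {"loc1": 1, "loc2": 4, "loc3": 2, "loc4": None}
isolate_2 = {"loc1": 1, "loc2": 5, "loc3": 2, "loc4": 7}

distance = allele_distance(isolate_1, isolate_2)
# Threshold taken from the abstract: isolates differing by fewer than 20 alleles.
same_cluster = distance < 20
print(distance, same_cluster)
```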

RevDate: 2021-07-06

Cheng C, Zhou W, Dong X, et al (2021)

Genomic Analysis of Delftia tsuruhatensis Strain TR1180 Isolated From A Patient From China With In4-Like Integron-Associated Antimicrobial Resistance.

Frontiers in cellular and infection microbiology, 11:663933.

Delftia tsuruhatensis has become an emerging pathogen in humans. There is scant information on the genomic characteristics of this microorganism. In this study, we determined the complete genome sequence of a clinical D. tsuruhatensis strain, TR1180, isolated from a sputum specimen of a female patient in China in 2019. Phylogenetic and average nucleotide identity analysis demonstrated that TR1180 is a member of D. tsuruhatensis. TR1180 exhibited resistance to β-lactam, aminoglycoside, tetracycline and sulphonamide antibiotics, but was susceptible to phenicols, fluoroquinolones and macrolides. Its genome is a single, circular chromosome measuring 6,711,018 bp in size. Whole-genome analysis identified 17 antibiotic resistance-related genes, which match the antimicrobial susceptibility profile of this strain, as well as 24 potential virulence factors and a number of metal resistance genes. Our data showed that Delftia possessed an open pan-genome and the genes in the core genome contributed to the pathogenicity and resistance of Delftia strains. Comparative genomics analysis of TR1180 with other publicly available genomes of Delftia showed diverse genomic features among these strains. D. tsuruhatensis TR1180 harbored a unique 38-kb genomic island flanked by a pair of 29-bp direct repeats with the insertion of a novel In4-like integron containing most of the specific antibiotic resistance genes within the genome. This study reports the findings of a fully sequenced genome from clinical D. tsuruhatensis, which provide researchers and clinicians with valuable insights into this uncommon species.

RevDate: 2021-07-06

Koeksoy E, Bezuidt OM, Bayer T, et al (2021)

Zetaproteobacteria Pan-Genome Reveals Candidate Gene Cluster for Twisted Stalk Biosynthesis and Export.

Frontiers in microbiology, 12:679409.

Twisted stalks are morphologically unique bacterial extracellular organo-metallic structures containing Fe(III) oxyhydroxides that are produced by microaerophilic Fe(II)-oxidizers belonging to the Betaproteobacteria and Zetaproteobacteria. Understanding the underlying genetic and physiological mechanisms of stalk formation is of great interest based on their potential as novel biogenic nanomaterials and their relevance as putative biomarkers for microbial Fe(II) oxidation on ancient Earth. Despite the recognition of these special biominerals for over 150 years, the genetic foundation for the stalk phenotype has remained unresolved. Here we present a candidate gene cluster for the biosynthesis and secretion of the stalk organic matrix that we identified with a trait-based analysis of a pan-genome comprising 16 Zetaproteobacteria isolate genomes. The "stalk formation in Zetaproteobacteria" (sfz) cluster comprises six genes (sfz1-sfz6), of which sfz1 and sfz2 were predicted with functions in exopolysaccharide synthesis, regulation, and export, sfz4 and sfz6 with functions in cell wall synthesis manipulation and carbohydrate hydrolysis, and sfz3 and sfz5 with unknown functions. The stalk-forming Betaproteobacteria Ferriphaselus R-1 and OYT-1, as well as dread-forming Zetaproteobacteria Mariprofundus aestuarium CP-5 and Mariprofundus ferrinatatus CP-8 contain distant sfz gene homologs, whereas stalk-less Zetaproteobacteria and Betaproteobacteria lack the entire gene cluster. Our pan-genome analysis further revealed a significant enrichment of clusters of orthologous groups (COGs) across all Zetaproteobacteria isolate genomes that are associated with the regulation of a switch between sessile and motile growth controlled by the intracellular signaling molecule c-di-GMP. Potential interactions between stalk-former unique transcription factor genes, sfz genes, and c-di-GMP point toward a c-di-GMP regulated surface attachment function of stalks during sessile growth.

RevDate: 2021-07-06

Farace PD, Irazoqui JM, Morsella CG, et al (2021)

Phylogenomic analysis for Campylobacter fetus ocurring in Argentina.

Veterinary world, 14(5):1165-1179.

Background and Aim: Campylobacter fetus is one of the most important pathogens that severely affect the livestock industry worldwide. C. fetus-mediated bovine genital campylobacteriosis in cattle has been associated with significant economic losses in livestock production in the Pampas region, the most productive area of Argentina. The present study aimed to establish the genomic relationships between C. fetus strains, isolated from the Pampas region, at local and global levels. The study also explored the utility of multi-locus sequence typing (MLST) as a typing technique for C. fetus.

Materials and Methods: For pangenome and phylogenetic analysis, whole genome sequences for 34 C. fetus strains, isolated from cattle in Argentina were downloaded from GenBank. A local maximum likelihood (ML) tree was constructed and linked to a Microreact project. In silico analysis based on MLST was used to obtain information regarding sequence type (ST) for each strain. For global phylogenetic analysis, a core genome ML-tree was constructed using genomic dataset for 265 C. fetus strains, isolated from various sources obtained from 20 countries.

Results: The local core genome phylogenetic tree analysis described the presence of two major clusters (A and B) and one minor cluster (C). The occurrence of 82% of the strains in these three clusters suggested a clonal population structure for C. fetus. The MLST analysis for the local strains revealed that 31 strains were ST4 type and one strain was ST5 type. In addition, a new variant was identified that was assigned a novel ST, ST70. In the present case, ST4 was homogeneously distributed across all the regions and clusters. The global analysis showed that most of the local strains clustered in phylogenetic groups composed exclusively of strains isolated from Argentina. Interestingly, three strains showed a close genetic relationship with bovine strains obtained from Uruguay and Brazil. The ST5 strain grouped in a distant cluster, with strains obtained from different sources from various geographic locations worldwide. Two local strains clustered in a phylogenetic group comprising intercontinental Campylobacter fetus venerealis strains.

Conclusion: The results of the study suggested active movement of animals, probably due to economic trade between different regions of the country as well as with neighboring countries. MLST results were partially concordant with phylogenetic analysis. Thus, this method did not qualify as a reliable subtyping method to assess C. fetus diversity in Argentina. The present study provided a basic platform to conduct future research on C. fetus, both at local and international levels.

RevDate: 2021-07-03

Carpi FM, Coman MM, Silvi S, et al (2021)

Comprehensive Pan-Genome Analysis of Lactiplantibacillus plantarum Complete Genomes.

Journal of applied microbiology [Epub ahead of print].

AIMS: The aim of this work was to refine the taxonomy and the functional characterization of publicly available Lactiplantibacillus plantarum complete genomes through a pan-genome analysis. Particular attention was paid in depicting the probiotic potential of each strain.

METHODS AND RESULTS: Complete genome sequences of 127 L. plantarum strains, without detected anomalies, were downloaded from NCBI. Roary analysis of the L. plantarum pan-genome identified 1,436 core, 414 soft core, 1,858 shell and 13,203 cloud genes, highlighting the "open" nature of the L. plantarum pan-genome. Identification and characterization of plasmid content, mobile genetic elements, adaptive immune system and probiotic marker genes (PMGs) revealed unique features across all the L. plantarum strains included in the present study. Considering our updated list of PMGs, we determined that approximately 70% of the PMGs belong to the core/soft-core genome.

CONCLUSIONS: The comparative genomic analysis conducted in this study provides new insights into the genomic content and variability of L. plantarum.

This study provides a comprehensive pan-genome analysis of L. plantarum, including the largest number (N=127) of complete L. plantarum genomes retrieved from publicly available repositories. Our effort aimed to determine a solid reference panel for the future characterization of newly sequenced L. plantarum strains useful as probiotic supplements.
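
The core, soft core, shell and cloud categories in the entry above are frequency bins over the genomes analysed. The sketch below uses the cut-offs that Roary reports by default in its summary statistics (99%, 95% and 15% of genomes). Whether the authors kept those defaults is an assumption, and the function name is hypothetical.

```python
def roary_category(n_carriers, n_genomes):
    """Bin a gene by the fraction of genomes that carry it.

    Cut-offs follow Roary's default summary categories; treating them
    as the ones used in the study is an assumption of this sketch.
    """
    frac = n_carriers / n_genomes
    if frac >= 0.99:
        return "core"
    if frac >= 0.95:
        return "soft core"
    if frac >= 0.15:
        return "shell"
    return "cloud"


n_genomes = 127  # number of complete genomes stated in the abstract
for n_carriers in (127, 125, 60, 3):
    print(n_carriers, roary_category(n_carriers, n_genomes))

# Category sizes reported in the abstract and their total.
reported = {"core": 1436, "soft core": 414, "shell": 1858, "cloud": 13203}
total = sum(reported.values())
print(total)
```

With 127 genomes these cut-offs place a gene in the core only if it is found in at least 126 of them, which is why a single missing annotation can move a gene into the soft core.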

RevDate: 2021-07-02

Ge T, Jiang H, Tan EH, et al (2021)

Pangenomic Analysis of Dickeya dianthicola Strains Related to the Outbreak of Blackleg and Soft Rot of Potato in USA.

Plant disease [Epub ahead of print].

Dickeya dianthicola has caused an outbreak of blackleg and soft rot of potato in the eastern half of the USA since 2015. To investigate genetic diversity of the pathogen, a comparative analysis was conducted on genomes of D. dianthicola strains. Whole genomes of 16 strains from the USA outbreak were assembled and compared to 16 previously sequenced genomes of D. dianthicola isolated from potato or carnation. Among the 32 strains, eight distinct clades were distinguished based on phylogenomic analysis. The outbreak strains were grouped into three clades, with the majority of the strains in clade I. Clade I strains were unique and homogeneous, suggesting a recent incursion of this strain into potato production from alternative hosts or environmental sources. The pangenome of the 32 strains contained 6693 genes, 3377 of which were core genes. By screening primary protein subunits associated with virulence from all USA strains, we found that many virulence-related gene clusters, such as plant cell wall degrading enzyme genes, flagellar and chemotaxis related genes, two-component regulatory genes, and type I/II/III secretion system genes, were highly conserved, whereas type IV and type VI secretion system genes varied. The virulent clade I strains encoded two clusters of type IV secretion systems, while clade II and III strains encoded only one cluster. Clade I and II strains encoded one more VgrG/PAAR spike protein than clade III. Thus, we predicted that the presence of additional virulence-related genes may have enabled the unique clade I strain to become the predominant source in the USA outbreak.

RevDate: 2021-07-02

Pintado A, Pérez-Martínez I, Aragón IM, et al (2021)

The Rhizobacterium Pseudomonas alcaligenes AVO110 Induces the Expression of Biofilm-Related Genes in Response to Rosellinia necatrix Exudates.

Microorganisms, 9(7): pii:microorganisms9071388.

The rhizobacterium Pseudomonas alcaligenes AVO110 exhibits antagonism toward the phytopathogenic fungus Rosellinia necatrix. This strain efficiently colonizes R. necatrix hyphae and is able to feed on their exudates. Here, we report the complete genome sequence of P. alcaligenes AVO110. The phylogeny of all available P. alcaligenes genomes separates environmental isolates, including AVO110, from those obtained from infected human blood and oyster tissues, which cluster together with Pseudomonas otitidis. Core and pan-genome analyses showed that P. alcaligenes strains encode highly heterogenic gene pools, with the AVO110 genome encoding the largest and most exclusive variable region (~1.6 Mb, 1795 genes). The AVO110 singletons include a wide repertoire of genes related to biofilm formation, several of which are transcriptionally modulated by R. necatrix exudates. One of these genes (cmpA) encodes a GGDEF/EAL domain protein specific to Pseudomonas spp. strains isolated primarily from the rhizosphere of diverse plants, but also from soil and water samples. We also show that CmpA has a role in biofilm formation and that the integrity of its EAL domain is involved in this function. This study contributes to a better understanding of the niche-specific adaptations and lifestyles of P. alcaligenes, including the mycophagous behavior of strain AVO110.

RevDate: 2021-07-06

Alouane T, Rimbert H, Bormann J, et al (2021)

Comparative Genomics of Eight Fusarium graminearum Strains with Contrasting Aggressiveness Reveals an Expanded Open Pangenome and Extended Effector Content Signatures.

International journal of molecular sciences, 22(12):.

Fusarium graminearum, the primary cause of Fusarium head blight (FHB) in small-grain cereals, demonstrates remarkably variable levels of aggressiveness in its host, producing different infection dynamics and contrasted symptom severity. While the secreted proteins, including effectors, are thought to be one of the essential components of aggressiveness, our knowledge of the intra-species genomic diversity of F. graminearum is still limited. In this work, we sequenced eight European F. graminearum strains of contrasting aggressiveness to characterize their respective genome structure, their gene content and to delineate their specificities. By combining the available sequences of 12 other F. graminearum strains, we outlined a reference pangenome that expands the repertoire of the known genes in the reference PH-1 genome by 32%, including nearly 21,000 non-redundant sequences and gathering a common base of 9250 conserved core-genes. More than 1000 genes with high non-synonymous mutation rates may be under diverse selection, especially regarding the trichothecene biosynthesis gene cluster. About 900 secreted protein clusters (SPCs) have been described. Mostly localized in the fast sub-genome of F. graminearum supposed to evolve rapidly to promote adaptation and rapid responses to the host's infection, these SPCs gather a range of putative proteinaceous effectors systematically found in the core secretome, with the chloroplast and the plant nucleus as the main predicted targets in the host cell. This work describes new knowledge on the intra-species diversity in F. graminearum and emphasizes putative determinants of aggressiveness, providing a wealth of new candidate genes potentially involved in the Fusarium head blight disease.

RevDate: 2021-07-02

Ahmed O, Rossi M, Kovaka S, et al (2021)

Pan-genomic matching statistics for targeted nanopore sequencing.

iScience, 24(6):102696.

Nanopore sequencing is an increasingly powerful tool for genomics. Recently, computational advances have allowed nanopores to sequence in a targeted fashion; as the sequencer emits data, software can analyze the data in real time and signal the sequencer to eject "nontarget" DNA molecules. We present a novel method called SPUMONI, which enables rapid and accurate targeted sequencing using efficient pan-genome indexes. SPUMONI uses a compressed index to rapidly generate exact or approximate matching statistics in a streaming fashion. When used to target a specific strain in a mock community, SPUMONI has accuracy similar to that of minimap2 when both are run against an index containing many strains per species. However, SPUMONI is 12 times faster than minimap2. SPUMONI's index and peak memory footprint are also 16 and 4 times smaller than those of minimap2, respectively. This could enable accurate targeted sequencing even when the targeted strains have not necessarily been sequenced or assembled previously.
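
Matching statistics, mentioned in the entry above, record for each position of a read the length of the longest substring starting there that also occurs in the reference. The sketch below computes them by brute force on toy strings purely to illustrate the definition. It is not SPUMONI's algorithm, which the abstract says relies on a compressed index, and the function name and sequences are hypothetical.

```python
def matching_statistics(pattern, text):
    """Return, for each position i of pattern, the length of the longest
    prefix of pattern[i:] that occurs somewhere in text.

    Brute-force illustration of the definition, not an indexed method.
    """
    ms = []
    for i in range(len(pattern)):
        length = 0
        # Extend the match while the longer substring still occurs in text.
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


# Toy reference and read (hypothetical sequences).
reference = "GATTACA"
read = "TTAGA"
print(matching_statistics(read, reference))
```

Long matching statistics along a read suggest that it resembles the indexed sequences, while consistently short values suggest that it does not; how such values are turned into an eject decision is not covered by this sketch.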

RevDate: 2021-07-02

Li Y, Wang M, Sun ZZ, et al (2021)

Comparative Genomic Insights Into the Taxonomic Classification, Diversity, and Secondary Metabolic Potentials of Kitasatospora, a Genus Closely Related to Streptomyces.

Frontiers in microbiology, 12:683814.

While the genus Streptomyces (family Streptomycetaceae) has been studied as a model for bacterial secondary metabolism and genetics, its close relatives have been less studied. The genus Kitasatospora is the second largest genus in the family Streptomycetaceae. However, its taxonomic position within the family remains under debate and the secondary metabolic potential remains largely unclear. Here, we performed systematic comparative genomic and phylogenomic analyses of Kitasatospora. Firstly, the three genera within the family Streptomycetaceae (Kitasatospora, Streptomyces, and Streptacidiphilus) showed common genomic features, including high G + C contents, high secondary metabolic potentials, and high recombination frequencies. Secondly, phylogenomic and comparative genomic analyses revealed phylogenetic distinctions and genome content differences among these three genera, supporting Kitasatospora as a separate genus within the family. Lastly, the pan-genome analysis revealed extensive genetic diversity within the genus Kitasatospora, while functional annotation and genome content comparison suggested genomic differentiation among lineages. This study provided new insights into genomic characteristics of the genus Kitasatospora, and also uncovered its previously underestimated and complex secondary metabolism.

RevDate: 2021-07-09

Köstlbacher S, Collingro A, Halter T, et al (2021)

Pangenomics reveals alternative environmental lifestyles among chlamydiae.

Nature communications, 12(1):4021.

Chlamydiae are highly successful strictly intracellular bacteria associated with diverse eukaryotic hosts. Here we analyzed metagenome-assembled genomes of the "Genomes from Earth's Microbiomes" initiative from diverse environmental samples, which almost double the known phylogenetic diversity of the phylum and facilitate a highly resolved view of the chlamydial pangenome. Chlamydiae are defined by a relatively large core genome indicative of an intracellular lifestyle, and a highly dynamic accessory genome of environmental lineages. We observe chlamydial lineages that encode enzymes of the reductive tricarboxylic acid cycle and for light-driven ATP synthesis. We show a widespread potential for anaerobic energy generation through pyruvate fermentation or the arginine deiminase pathway, and we add lineages capable of molecular hydrogen production. Genome-informed analysis of environmental distribution revealed lineage-specific niches and a high abundance of chlamydiae in some habitats. Together, our data provide an extended perspective of the variability of chlamydial biology and the ecology of this phylum of intracellular microbes.

RevDate: 2021-07-01

Zhou Q, Mai K, Yang D, et al (2021)

Comparative genomic analysis of Mycoplasma anatis strains.

Genes & genomics [Epub ahead of print].

BACKGROUND: The Gram-negative intracellular bacterium Mycoplasma anatis is a pathogen of respiratory infectious diseases in ducks and has caused significant economic losses in the poultry industry.

OBJECTIVE: This study, as the first report of the structure and function of the pan-genome of Mycoplasma anatis, may provide a valuable genetic basis for many aspects of future research on the pathogens of waterfowl.

METHODS: We sequenced the whole genomes of 15 Mycoplasma anatis isolated from ducks in China. Draft genome sequencing was carried out and whole-genome sequencing was performed by the sequencers of the PacBio Sequel and an IonTorrent Personal Genome Machine (PGM). Then the common genic elements of protein-coding genes, tRNAs, and rRNAs of Mycoplasma anatis genomes were predicted by using the pipeline Prokka v1.13.7. To investigate homologous protein clusters across Mycoplasma anatis genomes, we adopted Roary v3.13.0 to cluster orthologous genes (OGs) based on the following criteria.

RESULTS: We obtained one complete genome and 14 genome sketches. Microbial mobile genetic element analysis revealed the distribution of insertion sequences (IS30, IS3, and IS1634), prophage regions, and CRISPR arrays in the genome of Mycoplasma anatis. Comparative genomic analysis decoded the genetic components and functional classification of the pan-genome of Mycoplasma anatis, which comprised 646 core genes and 231 dispensable genes, of which 110 were strain-specific. Virulence-related gene profiles of Mycoplasma anatis were systematically identified, and the products of these genes included bacterial ABC transporter systems, iron transport proteins, toxins, and secretion systems.

CONCLUSION: A complete virulence-related gene profile of Mycoplasma anatis has been identified; most of the genes are highly conserved in all strains. Sequencing results are relevant to the molecular mechanisms of drug resistance, adaptive evolution of pathogens, population structure, and vaccine development.

RevDate: 2021-06-29

Tláskal V, Pylro VS, Žifčáková L, et al (2021)

Ecological Divergence Within the Enterobacterial Genus Sodalis: From Insect Symbionts to Inhabitants of Decomposing Deadwood.

Frontiers in microbiology, 12:668644.

The bacterial genus Sodalis is represented by insect endosymbionts as well as free-living species. While the former have been studied frequently, the distribution of the latter is not yet clear. Here, we present a description of a free-living strain, Sodalis ligni sp. nov., originating from decomposing deadwood. The favored occurrence of S. ligni in deadwood is confirmed by both 16S rRNA gene distribution and metagenome data. Pangenome analysis of available Sodalis genomes shows at least three groups within the Sodalis genus: deadwood-associated strains, tsetse fly endosymbionts and endosymbionts of other insects. This differentiation is consistent in terms of the gene frequency level, genome similarity and carbohydrate-active enzyme composition of the genomes. Deadwood-associated strains contain genes for active decomposition of biopolymers of plant and fungal origin and can utilize more diverse carbon sources than their symbiotic relatives. Deadwood-associated strains, but not other Sodalis strains, have the genetic potential to fix N2, and the corresponding genes are expressed in deadwood. Nitrogenase genes are located within the genomes of Sodalis, including S. ligni, at multiple loci represented by more gene variants. We show decomposing wood to be a previously undescribed habitat of the genus Sodalis that appears to show striking ecological divergence.

RevDate: 2021-07-05
CmpDate: 2021-07-05

Zhao Y, Chen X, Hu X, et al (2021)

Characterization of a carbapenem-resistant Citrobacter amalonaticus coharbouring bla IMP-4 and qnrs1 genes.

Journal of medical microbiology, 70(6):.

Introduction. Members of the genus Citrobacter are facultative anaerobic Gram-negative bacilli belonging to the Enterobacterales [Janda J Clin Microbiol 1994; 32(8):1850-1854; Arens Clin Microbiol Infect 1997;3(1):53-57]. Formerly, Citrobacter species were occasionally reported as nosocomial pathogens with low virulence [Pepperell Antimicrob Agents Chemother 2002;46(11):3555-60]. Now, they are consistently reported to cause nosocomial infections of the urinary tract, respiratory tract, bone, peritoneum, endocardium, meninges, intestines, bloodstream and central nervous system. Among Citrobacter species, the most common isolates are C. koseri and C. freundii, while C. amalonaticus has seldom been isolated [Janda J Clin Microbiol 1994; 32(8):1850-1854; Marak Infect Dis (Lond) 2017;49(7):532-9]. Further, Citrobacter spp. are usually susceptible to carbapenems, aminoglycosides, tetracyclines and colistin [Marak Infect Dis (Lond) 2017;49(7):532-9].

Hypothesis/Gap Statement. As C. amalonaticus is rare, only one clinical isolate, coharbouring carbapenem resistance gene bla IMP-4 and quinolone resistance gene qnrs1, has been reported.

Aim. To characterize a carbapenem-resistant C. amalonaticus strain from PR China coharbouring bla IMP-4 and qnrs1.

Methodology. Three hundred and forty nonrepetitive carbapenem-resistant Enterobacterales (CRE) strains were collected during 2011-2018. A carbapenem-resistant C. amalonaticus strain was detected and confirmed using a VITEK mass spectrometry-based microbial identification system and 16S rRNA sequencing. Minimum inhibitory concentrations (MICs) for clinical antimicrobials were obtained by the broth microdilution method. Whole-genome sequencing (WGS) was performed for antibiotic resistance gene analysis, and a phylogenetic tree of C. amalonaticus strains was constructed using the Bacterial Pan Genome Analysis (BPGA) tool. The transferability of the resistance plasmid was verified by conjugal transfer.

Results. A rare carbapenem-resistant C. amalonaticus strain (CA71) was recovered from a patient with cerebral obstruction and the sequences of 16S rRNA gene shared more than 99 % similarity with C. amalonaticus CITRO86, FDAARGOS 165. CA71 is resistant to β-lactam, quinolone and aminoglycoside antibiotics, and even imipenem and meropenem (MICs of 2 and 4 mg l⁻¹ respectively), and is only sensitive to polymyxin B and tigecycline. Six antibiotic resistance genes were detected via WGS, including the β-lactam genes bla IMP-4, bla CTX-M-18 and bla Sed1, the quinolone gene qnrs1, and the aminoglycoside genes AAC(3)-VIIIa, AadA24. Interestingly, bla IMP-4 and qnrs1 coexist on an IncN1-type plasmid (pCA71-IMP) and successfully transferred to Escherichia coli J53 via conjugal transfer. Phylogenetic analysis showed that CA71 is most similar to C. amalonaticus strain CJ25 and belongs to the same evolutionary cluster along with seven other strains.

Conclusion. To the best of our knowledge, this is the first report of a carbapenem-resistant C. amalonaticus isolate coharbouring bla IMP-4 and qnrs1.

RevDate: 2021-06-25

Bayer PE, Valliyodan B, Hu H, et al (2021)

Sequencing the USDA core soybean collection reveals gene loss during domestication and breeding.

The plant genome [Epub ahead of print].

The gene content of plants varies between individuals of the same species due to gene presence/absence variation, and selection can alter the frequency of specific genes in a population. Selection during domestication and breeding will modify the genomic landscape, though the nature of these modifications is only understood for specific genes or on a more general level (e.g., by a loss of genetic diversity). Here we have assembled and analyzed a soybean (Glycine spp.) pangenome representing more than 1,000 soybean accessions derived from the USDA Soybean Germplasm Collection, including both wild and cultivated lineages, to assess genomewide changes in gene and allele frequency during domestication and breeding. We identified 3,765 genes that are absent from the Lee reference genome assembly and assessed the presence/absence of all genes across this population. In addition to a loss of genetic diversity, we found a significant reduction in the average number of protein-coding genes per individual during domestication and subsequent breeding, though with some genes and allelic variants increasing in frequency associated with selection for agronomic traits. This analysis provides a genomic perspective of domestication and breeding in this important oilseed crop.

RevDate: 2021-07-03

Shahid F, Zaheer T, Ashraf ST, et al (2021)

Chimeric vaccine designs against Acinetobacter baumannii using pan genome and reverse vaccinology approaches.

Scientific reports, 11(1):13213.

Acinetobacter baumannii (A. baumannii), an opportunistic, gram-negative pathogen, has evoked the interest of the medical community throughout the world because of its ability to cause nosocomial infections, mainly infecting those in intensive care units. It has also drawn the attention of researchers due to its evolving immune evasion strategies and increased drug resistance. The emergence of multidrug-resistant strains has made it urgent to explore novel therapeutic options as an alternative to antibiotics. Due to the upsurge in antibiotic resistance mechanisms exhibited by A. baumannii, the current therapeutic strategies are rendered less effective. The aim of this study is to explore novel therapeutic alternatives against A. baumannii to control the infection. In this study, a computational framework is employed involving pan genomics, subtractive proteomics and reverse vaccinology strategies to identify core promiscuous vaccine candidates. Two chimeric vaccine constructs having B-cell derived T-cell epitopes from prioritized vaccine candidates (APN, AdeK and AdeI) have been designed and checked for their possible interactions with host BCR, TLRs and HLA Class I and II Superfamily alleles. These vaccine candidates can be experimentally validated and thus contribute to vaccine development against A. baumannii infections.

RevDate: 2021-06-25

Tenea GN, P Hurtado (2021)

Next-Generation Sequencing for Whole-Genome Characterization of Weissella cibaria UTNGt21O Strain Originated From Wild Solanum quitoense Lam. Fruits: An Atlas of Metabolites With Biotechnological Significance.

Frontiers in microbiology, 12:675002.

The whole genome of Weissella cibaria strain UTNGt21O isolated from wild fruits of Solanum quitoense (naranjilla) shrub was sequenced and annotated. The similarity proportions based on the genus level, as a result of the best hits for the entire contig, were 54.84% with Weissella, 6.45% with Leuconostoc, 3.23% with Lactococcus, and 35.48% no match. The closest genome was W. cibaria SP7 (GCF_004521965.1) with 86.21% average nucleotide identity (ANI) and 3.2% alignment coverage. The genome contains 1,867 protein-coding genes, among which 1,620 were assigned with the EggNOG database. On the basis of the results, 438 proteins were classified with unknown function from which 247 new hypothetical proteins have no match in the nucleotide Basic Local Alignment Search Tool (BLASTN) database. It also contains 78 tRNAs, six copies of 5S rRNA, one copy of 16S rRNA, one copy of 23S rRNA, and one copy of tmRNA. The W. cibaria UTNGt21O strain harbors several genes responsible for carbohydrate metabolism, cellular process, general stress responses, cofactors, and vitamins, conferring probiotic features. A pangenome analysis indicated the presence of various strain-specific genes encoding proteins responsible for the defense mechanisms as well as genes encoding enzymes with biotechnological value, such as penicillin acylase and folates; thus, W. cibaria exhibited high genetic diversity. The genome characterization indicated the presence of a putative CRISPR-Cas array and five prophage regions and the absence of acquired antibiotic resistance genes, virulence, and pathogenic factors; thus, UTNGt21O might be considered a safe strain. Besides, the interaction between the peptide extracts from UTNGt21O and Staphylococcus aureus results in cell death caused by the target cell integrity loss and the release of aromatic molecules from the cytoplasm. The results indicated that W. cibaria UTNGt21O can be considered a beneficial strain to be further exploited for developing novel antimicrobials and probiotic products with improved technological characteristics.

RevDate: 2021-06-25

Lawal OU, Barata M, Fraqueza MJ, et al (2021)

Staphylococcus saprophyticus From Clinical and Environmental Origins Have Distinct Biofilm Composition.

Frontiers in microbiology, 12:663768.

Biofilm formation has been shown to be critical to the success of uropathogens. Although Staphylococcus saprophyticus is a common cause of urinary tract infections, its biofilm production capacity, composition, genetic basis, and origin are poorly understood. We investigated biofilm formation in a large and diverse collection of S. saprophyticus (n = 422). Biofilm matrix composition was assessed in representative strains (n = 63) belonging to two main S. saprophyticus lineages (G and S) recovered from human infection, colonization, and food-related environment using biofilm detachment approach. To identify factors that could be associated with biofilm formation and structure variation, we used a pangenome-wide association study approach. Almost all the isolates (91%; n = 384/422) produced biofilm. Among the 63 representative strains, we identified eight biofilm matrix phenotypes, but the most common were composed of protein or protein-extracellular DNA (eDNA)-polysaccharides (38%, 24/63 each). Biofilms containing protein-eDNA-polysaccharides were linked to lineage G and environmental isolates, whereas protein-based biofilms were produced by lineage S and infection isolates (p < 0.05). Putative biofilm-associated genes, namely, aas, atl, ebpS, uafA, sasF, sasD, sdrH, splE, sdrE, sdrC, sraP, and ica genes, were found with different frequencies (3-100%), but there was no correlation between their presence and biofilm production or matrix types. Notably, icaC_1 was ubiquitous in the collection, while icaR was lineage G-associated, and only four strains carried a complete ica gene cluster (icaADBCR) except one that was without icaR. We provided evidence, using a comparative genomic approach, that the complete icaADBCR cluster was acquired multiple times by S. saprophyticus and originated from other coagulase-negative staphylococci. Overall, the composition of S. saprophyticus biofilms was distinct in environmental and clinical isolates, suggesting that modulation of biofilm structure could be a key step in the pathogenicity of these bacteria. Moreover, biofilm production in S. saprophyticus is ica-independent, and the complete icaADBCR was acquired from other staphylococci.

RevDate: 2021-06-24

Zhang S, Amanze C, Sun C, et al (2021)

Evolutionary, genomic, and biogeographic characterization of two novel xenobiotics-degrading strains affiliated with Dechloromonas.

Heliyon, 7(6):e07181.

Xenobiotics are generally known as man-made refractory organic pollutants widely distributed in various environments. For exploring the bioremediation possibility of xenobiotics, two novel xenobiotics-degrading strains affiliated with Azonexaceae were isolated. We report here the phylogenetics, genome, and geo-distribution of a novel and ubiquitous Azonexaceae species that primarily joins in the cometabolic process of some xenobiotics in natural communities. Strains s22 and t15 could be proposed as a novel species within Dechloromonas based on genomic and multi-phylogenetic analysis. Pan-genome analysis showed that the 63 core genes in Dechloromonas include genes for dozens of metabolisms such as nitrogen fixation protein (nifU), nitrogen regulatory protein (glnK), dCTP deaminase, C4-dicarboxylate transporter, and fructose-bisphosphate aldolase. Strains s22 and t15 have the ability to metabolize nitrogen, including nitrogen fixation, NirS-dependent denitrification, and dissimilatory nitrate reduction. Moreover, the novel species possesses the EnvZ-OmpR two-component system for controlling osmotic stress and QseC-QseB system for quorum sensing to rapidly sense environmental changes. It is intriguing that this new species has a series of genes for the biodegradation of some xenobiotics such as azathioprine, 6-Mercaptopurine, trinitrotoluene, chloroalkane, and chloroalkene. Specifically, glutathione S-transferase (GST) and 4-oxalocrotonate tautomerase (praC) in this novel species play important roles in the detoxification metabolism of some xenobiotics like dioxin, trichloroethene, chloroacetyl chloride, benzo[a]pyrene, and aflatoxin B1. Using data from GenBank, DDBJ and EMBL databases, we also demonstrated that members of this novel species were found globally in plants (e.g. rice), guts (e.g. insect), pristine and contaminated regions. Given these data, Dechloromonas sp. strains s22 and t15 take part in the biodegradation of some xenobiotics through key enzymes.

RevDate: 2021-06-22

Sahmi-Bounsiar D, Rolland C, Aherfi S, et al (2021)

Marseilleviruses: An Update in 2021.

Frontiers in microbiology, 12:648731.

The family Marseilleviridae was the second family of giant viruses that was described in 2013, after the family Mimiviridae. Marseillevirus marseillevirus, isolated in 2007 by coculture on Acanthamoeba polyphaga, is the prototype member of this family. Afterward, the worldwide distribution of marseilleviruses was revealed through their isolation from samples of various types and sources. Thus, 62 were isolated from environmental water, one from soil, one from a dipteran, one from mussels, and two from asymptomatic humans, which led to the description of 67 marseillevirus isolates, including 21 by the IHU Méditerranée Infection in France. Recently, five marseillevirus genomes were assembled from deep sea sediment in Norway. Isolated marseilleviruses have ≈250 nm long icosahedral capsids and 348-404 kilobase long mosaic genomes that encode 386-545 predicted proteins. Comparative genomic analyses indicate that the family Marseilleviridae includes five lineages and possesses a pangenome composed of 3,082 clusters of genes. The detection of marseilleviruses in both symptomatic and asymptomatic humans in stool, blood, and lymph nodes, and an up-to-30-day persistence of marseillevirus in rats and mice, raise questions concerning their possible clinical significance that are still under investigation.

RevDate: 2021-06-19

Ruperao P, Thirunavukkarasu N, Gandham P, et al (2021)

Sorghum Pan-Genome Explores the Functional Utility for Genomic-Assisted Breeding to Accelerate the Genetic Gain.

Frontiers in plant science, 12:666342.

Sorghum (Sorghum bicolor L.) is a staple food crop in the arid and rainfed production ecologies. Sorghum plays a critical role in resilient farming and is projected as a smart crop to overcome the food and nutritional insecurity in the developing world. The development and characterisation of the sorghum pan-genome will provide insight into genome diversity and functionality, supporting sorghum improvement. We built a sorghum pan-genome using reference genomes as well as 354 genetically diverse sorghum accessions belonging to different races. We explored the structural and functional characteristics of the pan-genome and explain its utility in supporting genetic gain. The newly developed pan-genome has a total of 35,719 genes, a core genome of 16,821 genes and an average of 32,795 genes in each cultivar. The variable genes are enriched with environment-responsive genes and classify the sorghum accessions according to their race. We show that 53% of genes display presence-absence variation, and some of these variable genes are predicted to be functionally associated with drought adaptation traits. Using more than two million SNPs from the pan-genome, association analysis identified 398 SNPs significantly associated with important agronomic traits, of which 92 were in genes. Drought gene expression analysis identified 1,788 genes that are functionally linked to different conditions, of which 79 were absent from the reference genome assembly. This study provides comprehensive genomic diversity resources in sorghum which can be used in genome-assisted crop improvement.

RevDate: 2021-06-19

Zheng L, Zhu LW, Jing J, et al (2021)

Pan-Genome Analysis of Vibrio cholerae and Vibrio metschnikovii Strains Isolated From Migratory Birds at Dali Nouer Lake in Chifeng, China.

Frontiers in veterinary science, 8:638820.

Migratory birds have recently been recognized as Vibrio disease vectors, but may be widespread transporters of Vibrio strains. We isolated Vibrio cholerae (V. cholerae) and Vibrio metschnikovii (V. metschnikovii) strains from migratory bird epidemic samples from 2017 to 2018 and isolated V. metschnikovii from migratory bird feces in 2019 from bird samples taken from the Inner Mongolia autonomous region of China. To investigate the evolution of these two Vibrio species, we sequenced the genomes of 40 V. cholerae strains and 34 V. metschnikovii strains isolated from the bird samples and compared these genomes with reference strain genomes. The pan-genome of all V. cholerae and V. metschnikovii genomes was large, with strains exhibiting considerable individual differences. A total of 2,130 and 1,352 core genes were identified in the V. cholerae and V. metschnikovii genomes, respectively, while dispensable genes accounted for 16,180 and 9,178 of all genes for the two species, respectively. All V. cholerae strains isolated from the migratory birds that encoded T6SS and hlyA were non-O1/O139 serotypes without the ability to produce CTX. These strains also lacked the ability to produce the TCP fimbriae or the extracellular matrix protein RbmA and could not metabolize trimethylamine oxide (TMAO). Thus, these characteristics render them unlikely to be pandemic-inducing strains. However, a V. metschnikovii isolate encoding the complete T6SS system was isolated for the first time. These data provide new molecular insights into the diversity of V. cholerae and V. metschnikovii isolates recovered from migratory birds.

RevDate: 2021-06-16

N'Guessan A, Brito IL, Serohijos AWR, et al (2021)

Mobile gene sequence evolution within individual human gut microbiomes is better explained by gene-specific than host-specific selective pressures.

Genome biology and evolution pii:6300526 [Epub ahead of print].

Pangenomes (the cumulative set of genes encoded by a population or species) arise from the interplay of horizontal gene transfer, drift, and selection. The balance of these forces in shaping pangenomes has been debated, and studies to date focused on ancient evolutionary time scales have suggested that pangenomes generally confer niche adaptation to their bacterial hosts. To shed light on pangenome evolution on shorter evolutionary time scales, we inferred the selective pressures acting on mobile genes within individual human microbiomes from 176 Fiji islanders. We mapped metagenomic sequence reads to a set of known mobile genes to identify single nucleotide variants (SNVs) and calculated population genetic metrics to infer deviations from a neutral evolutionary model. We found that mobile gene sequence evolution varied more by gene family than by human social attributes, such as household or village. Patterns of mobile gene sequence evolution could be qualitatively recapitulated with a simple evolutionary simulation without the need to invoke adaptive value of mobile genes to either bacterial or human hosts. These results stand in contrast with the apparent adaptive value of pangenomes over longer evolutionary time scales. In general, the most highly mobile genes (i.e. those present in more distinct bacterial host genomes) tend to have higher metagenomic read coverage and an excess of low-frequency SNVs, consistent with their rapid spread across multiple bacterial species in the gut. However, a subset of mobile genes, including those involved in defense mechanisms and secondary metabolism, showed a contrasting signature of intermediate-frequency SNVs, indicating species-specific selective pressures or negative frequency-dependent selection on these genes. Together, our evolutionary models and population genetic data show that gene-specific selective pressures predominate over human or bacterial host-specific pressures during the relatively short time scales of a human lifetime.

RevDate: 2021-06-15

Huang X, Yang X, Shi X, et al (2021)

Whole-genome sequencing analysis of uncommon Shiga toxin-producing Escherichia coli from cattle: Virulence gene profiles, antimicrobial resistance predictions, and identification of novel O-serogroups.

Food microbiology, 99:103821.

Shiga toxin-producing E. coli (STEC) are major foodborne pathogens. While many studies have focused on the "top-7 STEC", little is known about minor serogroups. A total of 284 non-top-7 STEC strains isolated from cattle feces were subjected to whole-genome sequencing (WGS) to determine the serotypes, the presence of virulence genes and antimicrobial resistance (AMR) determinants. Nineteen typeable and three non-typeable serotypes with novel O-antigen loci were identified. Twenty-one AMR genes and point mutations in another six genes that conferred resistance to 10 antimicrobial classes were detected, as well as 46 virulence genes. The distribution of 33 virulence genes and 15 AMR determinants exhibited significant differences among serotypes (p < 0.05). Among all strains, 81.7% (n = 232) and 14.1% (n = 40) carried stx2 and stx1 only, respectively; only 4.2% (n = 12) carried both. Subtypes stx1a, stx1c, stx2a, stx2c, stx2d, and stx2g were identified. Forty-six strains carried eae and stx2a and therefore had the potential to cause severe diseases; 47 strains were genetically related to human clinical strains inferred from a pan-genome phylogenetic tree. We were able to demonstrate the utility of WGS as a surveillance tool to characterize the novel serotypes, as well as AMR and virulence profiles of uncommon STEC that could potentially cause human illness.

RevDate: 2021-06-12

Sutton G, Fogel GB, Abramson B, et al (2021)

A pan-genome method to determine core regions of the Bacillus subtilis and Escherichia coli genomes.

F1000Research, 10:286.

Background: Synthetic engineering of bacteria to produce industrial products is a burgeoning field of research and application. In order to optimize genome design, designers need to understand which genes are essential, which are optimal for growth, and locations in the genome that will be tolerated by the organism when inserting engineered cassettes. Methods: We present a pan-genome based method for the identification of core regions in a genome that are strongly conserved at the species level. Results: We show that the core regions determined by our method contain all or almost all essential genes. This demonstrates the accuracy of our method as essential genes should be core genes. We show that we outperform previous methods by this measure. We also explain why there are exceptions to this rule for our method. Conclusions: We assert that synthetic engineers should avoid deleting or inserting into these core regions unless they understand and are manipulating the function of the genes in that region. Similarly, if the designer wishes to streamline the genome, non-core regions and in particular low penetrance genes would be good targets for deletion. Care should be taken to remove entire cassettes with similar penetrance of the genes within cassettes as they may harbor toxin/antitoxin genes which need to be removed in tandem. The bioinformatic approach introduced here saves considerable time and effort relative to knockout studies on single isolates of a given species and captures a broad understanding of the conservation of genes that are core to a species.

RevDate: 2021-07-07

Panibe JP, Wang L, Li J, et al (2021)

Chromosomal-level genome assembly of the semi-dwarf rice Taichung Native 1, an initiator of Green Revolution.

Genomics, 113(4):2656-2674.

Here we report the 409.5 Mb chromosome-level assembly of the first bred semi-dwarf rice, the Taichung Native 1 (TN1), which served as the template for the development of the Green Revolution (GR) cultivar IR8 "miracle rice". We sequenced the TN1 genome utilizing multiple platforms and produced PacBio long reads, Illumina paired-end reads, Illumina mate-pair reads and 10x Genomics linked reads. We used a hybrid approach to assemble the 226× coverage of sequences by a combination of de novo and reference-guided approaches. The assembled TN1 genome has an N50 scaffold size of 33.1 Mb with the longest measuring 45.5 Mb. We annotated 37,526 genes, in which 24,102 (64.23%) were assigned Blast2GO annotations. The genome has 4672 or 95.4% complete BUSCOs and a repeat content of 51.52%. We developed our own method of creating a GR pangenome using the orthologous relationships of the proteins of TN1, IR8, MH63 and IR64, identifying 16,999 core orthologue groups of Green Revolution. From the pangenome, we identified a set of shared and unique gene ontology terms for the accessory clusters, characterizing TN1, IR8, MH63 and IR64. This TN1 genome assembly and GR pangenome will be a resource for new genomic discoveries about Green Revolution, and for improving the disease and insect resistances and the yield of rice.

RevDate: 2021-06-11

Sserwadda I, G Mboowa (2021)

rMAP: the Rapid Microbial Analysis Pipeline for ESKAPE bacterial group whole-genome sequence data.

Microbial genomics, 7(6):.

The recent re-emergence of multidrug-resistant pathogens has exacerbated their threat to worldwide public health. The evolution of the genomics era has led to the generation of huge volumes of sequencing data at an unprecedented rate due to the ever-reducing costs of whole-genome sequencing (WGS). We have developed the Rapid Microbial Analysis Pipeline (rMAP), a user-friendly pipeline capable of profiling the resistomes of ESKAPE pathogens (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa and Enterobacter species) using WGS data generated from Illumina's sequencing platforms. rMAP is designed for individuals with little bioinformatics expertise, and automates the steps required for WGS analysis directly from the raw genomic sequence data, including adapter and low-quality sequence read trimming, de novo genome assembly, genome annotation, single-nucleotide polymorphism (SNP) variant calling, phylogenetic inference by maximum likelihood, antimicrobial resistance (AMR) profiling, plasmid profiling, virulence factor determination, multi-locus sequence typing (MLST), pangenome analysis and insertion sequence (IS) characterization. Once the analysis is finished, rMAP generates an interactive web-like HTML report. rMAP is simple to install and can be run using simple commands. It represents a rapid and easy way to perform comprehensive bacterial WGS analysis using a personal laptop in low-income settings where high-performance computing infrastructure is limited.

RevDate: 2021-06-15
CmpDate: 2021-06-15

Zhang J, Hewitt TC, Boshoff WHP, et al (2021)

A recombined Sr26 and Sr61 disease resistance gene stack in wheat encodes unrelated NLR genes.

Nature communications, 12(1):3378.

The re-emergence of stem rust on wheat in Europe and Africa is reinforcing the ongoing need for durable resistance gene deployment. Here, we isolate from wheat, Sr26 and Sr61, with both genes independently introduced as alien chromosome introgressions from tall wheat grass (Thinopyrum ponticum). Mutational genomics and targeted exome capture identify Sr26 and Sr61 as separate single genes that encode unrelated (34.8%) nucleotide binding site leucine rich repeat proteins. Sr26 and Sr61 are each validated by transgenic complementation using endogenous and/or heterologous promoter sequences. Sr61 orthologs are absent from current Thinopyrum elongatum and wheat pan genome sequences, contrasting with Sr26 where homologues are present. Using gene-specific markers, we validate the presence of both genes on a single recombinant alien segment developed in wheat. The co-location of these genes on a small non-recombinogenic segment simplifies their deployment as a gene stack and potentially enhances their resistance durability.

RevDate: 2021-06-03
CmpDate: 2021-06-03

Dall'Agnol B, Webster A, Souza UA, et al (2021)

Genomic analysis on Brazilian strains of Anaplasma marginale.

Revista brasileira de parasitologia veterinaria = Brazilian journal of veterinary parasitology : Orgao Oficial do Colegio Brasileiro de Parasitologia Veterinaria, 30(2):e000421 pii:S1984-29612021000200310.

Anaplasma marginale is a vector-borne pathogen that causes a disease known as anaplasmosis. No sequenced genomes of Brazilian strains are yet available. The aim of this work was to compare whole genomes of Brazilian strains of A. marginale (Palmeira and Jaboticabal) with genomes of strains from other regions (USA and Australia strains). Genome sequencing of Brazilian strains was performed by means of next-generation sequencing. Reads were mapped using the genome of the Florida strain of A. marginale as a reference sequence. Single nucleotide polymorphisms (SNPs) and insertions/deletions (INDELs) were identified. The data showed that the two Brazilian strains grouped together in one particular clade, which grouped in a larger American group together with North American strains. Moreover, some important differences in surface proteins between the two Brazilian isolates can be discerned. These results shed light on the evolutionary history of A. marginale and provide the first genome information on South American isolates. Assessing the genome sequences of strains from different regions is essential for increasing knowledge of the pan-genome of this bacterium.

RevDate: 2021-07-10

Xiao Y, Wang C, Zhao J, et al (2021)

Quantitative Detection of Bifidobacterium longum Strains in Feces Using Strain-Specific Primers.

Microorganisms, 9(6):.

We adopted a bioinformatics-based technique to identify strain-specific markers, which were then used to quantify the abundances of three distinct B. longum sup. longum strains in fecal samples of humans and mice. A pangenome analysis of 205 B. longum sup. longum genomes revealed the accumulation of considerable strain-specific genes within this species; specifically, 28.7% of the total identified genes were strain-specific. We identified 32, 14, and 49 genes specific to B. longum sup. longum RG4-1, B. longum sup. longum M1-20-R01-3, and B. longum sup. longum FGSZY6M4, respectively. After performing an in silico validation of these strain-specific markers using a nucleotide BLAST against both the B. longum sup. longum genome database and an NR/NT database, RG4-1_01874 (1331 bp), M1-20-R01-3_00324 (1745 bp), and FGSZY6M4_01477 (1691 bp) were chosen as target genes for strain-specific quantification. The specificities of the qPCR primers were validated against 47 non-target microorganisms and fecal baseline microbiota to ensure that they produced no PCR amplification products. The performance of the qPCR primer-based analysis was further assessed using fecal samples. After oral administration, the target B. longum strains appeared to efficiently colonize both the human and mouse guts, with average population levels of >10^8 CFU/g feces. The bioinformatics pipeline proposed here can be applied to the quantification of various bacterial species.
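
The first step described above, finding genes that occur in exactly one genome of the pangenome, can be sketched as follows. This is a simplified illustration under the assumption that genes have already been clustered into families and each genome is represented as a set of family identifiers; the strain and gene names are placeholders, and the BLAST validation and primer design steps from the abstract are not modelled.

```python
from collections import Counter
from typing import Dict, Set


def strain_specific_genes(genomes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each strain to the gene families found in that strain and no other."""
    family_counts = Counter(g for genes in genomes.values() for g in genes)
    return {
        strain: {g for g in genes if family_counts[g] == 1}
        for strain, genes in genomes.items()
    }


def strain_specific_fraction(genomes: Dict[str, Set[str]]) -> float:
    """Fraction of all gene families in the pangenome that are strain-specific."""
    all_families = set().union(*genomes.values())
    if not all_families:
        raise ValueError("the pangenome contains no gene families")
    specific = set().union(*strain_specific_genes(genomes).values())
    return len(specific) / len(all_families)


# Toy pangenome with placeholder strain and gene-family names.
toy_pangenome = {
    "strain_A": {"core1", "core2", "a_only"},
    "strain_B": {"core1", "core2", "shared_bc"},
    "strain_C": {"core1", "core2", "shared_bc", "c_only"},
}
toy_specific = strain_specific_genes(toy_pangenome)
toy_fraction = strain_specific_fraction(toy_pangenome)
```

Candidate markers found this way would still need the specificity checks the abstract describes before being used as qPCR targets.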

RevDate: 2021-06-15

Rodrigues DLN, Morais-Rodrigues F, Hurtado R, et al (2021)

Pan-Resistome Insights into the Multidrug Resistance of Acinetobacter baumannii.

Antibiotics (Basel, Switzerland), 10(5):.

Acinetobacter baumannii is an important Gram-negative opportunistic pathogen that is responsible for many nosocomial infections. This etiologic agent has acquired, over the years, multiple mechanisms of resistance to a wide range of antimicrobials and the ability to survive in different environments. In this context, our study aims to elucidate the resistome from the A. baumannii strains based on phylogenetic, phylogenomic, and comparative genomics analyses. In silico analysis of the complete genomes of A. baumannii strains was carried out to identify genes involved in the resistance mechanisms and the phylogenetic relationships and grouping of the strains based on the sequence type. The presence of genomic islands containing most of the resistance gene repertoire indicated high genomic plasticity, which probably enabled the acquisition of resistance genes and the formation of a robust resistome. A. baumannii displayed an open pan-genome and revealed a still constant genetic permutation among their strains. Furthermore, the resistance genes suggest a specific profile within the species throughout its evolutionary history. Moreover, the current study performed screening and characterization of the main genes present in the resistome, which can be used in applied research to develop new therapeutic methods to control this important bacterial pathogen.

RevDate: 2021-06-15

Reyes-Cortes JL, Azaola-Espinosa A, Lozano-Aguirre L, et al (2021)

Physiological and Genomic Analysis of Bacillus pumilus UAMX Isolated from the Gastrointestinal Tract of Overweight Individuals.

Microorganisms, 9(5):.

The study aimed to evaluate the metabolism and resistance to the gastrointestinal tract conditions of Bacillus pumilus UAMX (BP-UAMX) isolated from overweight individuals using genomic tools. Specifically, we assessed its ability to metabolize various carbon sources, its resistance to low pH exposure, and its growth in the presence of bile salts. The genomic and bioinformatic analyses included the prediction of gene and protein metabolic functions, a pan-genome and phylogenomic analysis. BP-UAMX survived at pH 3, while bile salts (0.2-0.3% w/v) increased its growth rate. Moreover, it showed the ability to metabolize simple and complex carbon sources (glucose, starch, carboxymethyl-cellulose, inulin, and tributyrin), showing a differentiated electrophoretic profile. The genome was assembled into a single contig, with a high percentage of genes and proteins associated with the metabolism of amino acids, carbohydrates, and lipids. Antibiotic resistance genes were detected, but only one beta-lactam resistance protein related to the inhibition of peptidoglycan biosynthesis was identified. The pan-genome of BP-UAMX is still open with phylogenetic similarities with other Bacillus of human origin. Therefore, BP-UAMX seems to be adapted to the intestinal environment: physiological and genomic analyses demonstrate its ability to metabolize complex carbon sources, and the strain has an open pan-genome with continuous evolution and adaptation.

RevDate: 2021-07-10

Lee HH, Park J, Jung H, et al (2021)

Pan-Genome Analysis Reveals Host-Specific Functional Divergences in Burkholderia gladioli.

Microorganisms, 9(6):.

Burkholderia gladioli has high versatility and adaptability to various ecological niches. Here, we constructed a pan-genome using 14 genome sequences of B. gladioli, which originate from different niches, including gladiolus, rice, humans, and nature. Functional roles of core and niche-associated genomes were investigated by pathway enrichment analyses. Consequently, we inferred the uniquely important role of niche-associated genomes in (1) selenium availability during competition with the gladiolus host; (2) aromatic compound degradation in seed-borne and crude oil-accumulated environments; and (3) stress-induced DNA repair system/recombination in the cystic fibrosis niche. We also identified the conservation of the rhizomide biosynthetic gene cluster in all the B. gladioli strains and the concentrated distribution of this cluster in human isolates. We confirmed the absence of a complete CRISPR/Cas system in both plant and human pathogenic B. gladioli and the presence of the system in B. gladioli living in nature, possibly reflecting an inverse relationship between the CRISPR/Cas system and virulence.

RevDate: 2021-06-15

Muslu T, Biyiklioglu-Kaya S, Akpinar BA, et al (2021)

Pan-Genome miRNomics in Brachypodium.

Plants (Basel, Switzerland), 10(5):.

Pan-genomes are efficient tools for the identification of conserved and varying genomic sequences within lineages of a species. Investigating genetic variations might lead to the discovery of genes present in a subset of lineages, which might contribute to beneficial agronomic traits such as stress resistance or yield. The content of varying genomic regions in the pan-genome could include protein-coding genes as well as microRNAs (miRNAs), small non-coding RNAs playing key roles in the regulation of gene expression. In this study, we performed in silico miRNA identification from the genomic sequences of 54 lineages of Brachypodium distachyon, aiming to explore varying miRNA contents and their functional interactions. A total of 115 miRNA families were identified in 54 lineages, 56 of which were found to be present in all lineages. The miRNA families were classified based on their conservation among lineages and potential mRNA targets were identified. Obtaining information about regulatory mechanisms stemming from these miRNAs offers strong potential to provide a better insight into the complex traits that were potentially present in some lineages. Future work could lead us to introduce these traits to different lineages or other economically important plant species in order to promote their survival in different environmental conditions.

RevDate: 2021-06-04

Cai X, Chang L, Zhang T, et al (2021)

Impacts of allopolyploidization and structural variation on intraspecific diversification in Brassica rapa.

Genome biology, 22(1):166.

BACKGROUND: Despite the prevalence and recurrence of polyploidization in the speciation of flowering plants, its impacts on crop intraspecific genome diversification are largely unknown. Brassica rapa is a mesopolyploid species that is domesticated into many subspecies with distinctive morphotypes.

RESULTS: Herein, we report the consequences of the whole-genome triplication (WGT) on intraspecific diversification using a pan-genome analysis of 16 de novo assembled and two reported genomes. Among the genes that derive from WGT, 13.42% of polyploidy-derived genes accumulate more transposable elements and non-synonymous mutations than other genes during individual genome evolution. We denote such genes as being "flexible." We construct the Brassica rapa ancestral genome and observe the continuing influence of the dominant subgenome on intraspecific diversification in B. rapa. The gene flexibility is biased to the more fractionated subgenomes (MFs), in contrast to the more intact gene content of the dominant LF (least fractionated) subgenome. Furthermore, polyploidy-derived flexible syntenic genes are implicated in the response to stimulus and the phytohormone auxin; this may reflect adaptation to the environment. Using an integrated graph-based genome, we investigate the structural variation (SV) landscapes in 524 B. rapa genomes. We observe that SVs track morphotype domestication. Four out of 266 candidate genes for Chinese cabbage domestication are speculated to be involved in the leafy head formation.

CONCLUSIONS: This pan-genome uncovers the possible contributions of allopolyploidization on intraspecific diversification and the possible and underexplored role of SVs in favorable trait domestication. Collectively, our work serves as a rich resource for genome-based B. rapa improvement.

RevDate: 2021-06-01

Lomsadze A, Bonny C, Strozzi F, et al (2021)

GeneMark-HM: improving gene prediction in DNA sequences of human microbiome.

NAR genomics and bioinformatics, 3(2):lqab047.

Computational reconstruction of nearly complete genomes from metagenomic reads may identify thousands of new uncultured candidate bacterial species. We have shown that reconstructed prokaryotic genomes along with genomes of sequenced microbial isolates can be used to support more accurate gene prediction in novel metagenomic sequences. We have proposed an approach that used three types of gene prediction algorithms and found for all contigs in a metagenome nearly optimal models of protein-coding regions either in libraries of pre-computed models or constructed de novo. The model selection process and gene annotation were done by the new GeneMark-HM pipeline. We have created a database of the species level pan-genomes for the human microbiome. To create a library of models representing each pan-genome we used a self-training algorithm GeneMarkS-2. Genes initially predicted in each contig served as queries for a fast similarity search through the pan-genome database. The best matches led to selection of the model for gene prediction. Contigs not assigned to pan-genomes were analyzed by crude, but still accurate models designed for sequences with particular GC compositions. Tests of GeneMark-HM on simulated metagenomes demonstrated improvement in gene annotation of human metagenomic sequences in comparison with the current state-of-the-art gene prediction tools.

RevDate: 2021-06-01

Pavlovikj N, Gomes-Neto JC, Deogun JS, et al (2021)

ProkEvo: an automated, reproducible, and scalable framework for high-throughput bacterial population genomics analyses.

PeerJ, 9:e11376.

Whole Genome Sequence (WGS) data from bacterial species is used for a variety of applications ranging from basic microbiological research, diagnostics, and epidemiological surveillance. The availability of WGS data from hundreds of thousands of individual isolates of individual microbial species poses a tremendous opportunity for discovery and hypothesis-generating research into ecology and evolution of these microorganisms. Flexibility, scalability, and user-friendliness of existing pipelines for population-scale inquiry, however, limit applications of systematic, population-scale approaches. Here, we present ProkEvo, an automated, scalable, reproducible, and open-source framework for bacterial population genomics analyses using WGS data. ProkEvo was specifically developed to achieve the following goals: (1) Automation and scaling of complex combinations of computational analyses for many thousands of bacterial genomes from inputs of raw Illumina paired-end sequence reads; (2) Use of workflow management systems (WMS) such as Pegasus WMS to ensure reproducibility, scalability, modularity, fault-tolerance, and robust file management throughout the process; (3) Use of high-performance and high-throughput computational platforms; (4) Generation of hierarchical-based population structure analysis based on combinations of multi-locus and Bayesian statistical approaches for classification for ecological and epidemiological inquiries; (5) Association of antimicrobial resistance (AMR) genes, putative virulence factors, and plasmids from curated databases with the hierarchically-related genotypic classifications; and (6) Production of pan-genome annotations and data compilation that can be utilized for downstream analysis such as identification of population-specific genomic signatures. The scalability of ProkEvo was measured with two datasets comprising significantly different numbers of input genomes (one with ~2,400 genomes, and the second with ~23,000 genomes). 
Depending on the dataset and the computational platform used, the running time of ProkEvo varied from ~3-26 days. ProkEvo can be used with virtually any bacterial species, and the Pegasus WMS uniquely facilitates addition or removal of programs from the workflow or modification of options within them. To demonstrate versatility of the ProkEvo platform, we performed a hierarchical-based population structure analyses from available genomes of three distinct pathogenic bacterial species as individual case studies. The specific case studies illustrate how hierarchical analyses of population structures, genotype frequencies, and distribution of specific gene functions can be integrated into an analysis. Collectively, our study shows that ProkEvo presents a practical viable option for scalable, automated analyses of bacterial populations with direct applications for basic microbiology research, clinical microbiological diagnostics, and epidemiological surveillance.

RevDate: 2021-06-28

Qin P, Lu H, Du H, et al (2021)

Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations.

Cell, 184(13):3542-3558.e16.

Structural variations (SVs) and gene copy number variations (gCNVs) have contributed to crop evolution, domestication, and improvement. Here, we assembled 31 high-quality genomes of genetically diverse rice accessions. Coupling with two existing assemblies, we developed pan-genome-scale genomic resources including a graph-based genome, providing access to rice genomic variations. Specifically, we discovered 171,072 SVs and 25,549 gCNVs and used an Oryza glaberrima assembly to infer the derived states of SVs in the Oryza sativa population. Our analyses of SV formation mechanisms, impacts on gene expression, and distributions among subpopulations illustrate the utility of these resources for understanding how SVs and gCNVs shaped rice environmental adaptation and domestication. Our graph-based genome enabled genome-wide association study (GWAS)-based identification of phenotype-associated genetic variations undetectable when using only SNPs and a single reference assembly. Our work provides rich population-scale resources paired with easy-to-access tools to facilitate rice breeding as well as plant functional genomics and evolutionary biology research.

RevDate: 2021-06-11

Silva de Oliveira M, Thyeska Castro Alves J, Henrique Caracciolo Gomes de Sá P, et al (2021)

PAN2HGENE-tool for comparative analysis and identifying new gene products.

PloS one, 16(5):e0252414.

Advances in next-generation sequencing (NGS) platforms have had a positive impact on biological research, leading to the development of numerous omics approaches, including genomics, transcriptomics, metagenomics, and pangenomics. These analyses provide insights into the gene contents of various organisms. However, to understand the evolutionary processes of these genes, comparative analysis, which is an important tool for annotation, is required. Using comparative analysis, it is possible to infer the functions of gene contents and identify orthologous and paralogous genes via their homology. Although several comparative analysis tools currently exist, most of them are limited to complete genomes. PAN2HGENE, a computational tool that allows identification of gene products missing from the original genome sequence, with automated comparative analysis for both complete and draft genomes, can be used to address this limitation. In this study, PAN2HGENE was used to identify new products, which altered the alpha value behavior in the pangenome without altering the original genomic sequence. Our findings indicate that this tool represents an efficient alternative for comparative analysis, with a simple and intuitive graphical interface. PAN2HGENE has been uploaded to SourceForge and is available at: https://sourceforge.net/projects/pan2hgene-software.
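
The abstract above refers to the "alpha value" of the pangenome without defining it. Assuming it means the exponent of the commonly used power-law model, in which the number of new gene families found when the N-th genome is added follows n(N) = k * N^(-alpha) and alpha <= 1 is read as an open pangenome, the fit can be sketched with a log-log least-squares regression. This is an illustration of that general model only, not PAN2HGENE's implementation; the function names and the synthetic data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(points: Sequence[Tuple[int, float]]) -> Tuple[float, float]:
    """Fit n = k * N**(-alpha) to (N, new_genes) pairs; return (k, alpha).

    Uses ordinary least squares on log(n) = log(k) - alpha * log(N).
    Pairs with non-positive values are skipped because their log is undefined.
    """
    logs = [(math.log(n_genomes), math.log(new)) for n_genomes, new in points
            if n_genomes > 0 and new > 0]
    if len(logs) < 2:
        raise ValueError("need at least two usable points")
    mean_x = sum(x for x, _ in logs) / len(logs)
    mean_y = sum(y for _, y in logs) / len(logs)
    sxx = sum((x - mean_x) ** 2 for x, _ in logs)
    if sxx == 0:
        raise ValueError("need at least two distinct genome counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in logs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def is_open_pangenome(alpha: float) -> bool:
    """Under the assumed model, alpha <= 1 indicates an open pangenome."""
    return alpha <= 1.0


# Synthetic data generated from k = 100, alpha = 0.5 (not real measurements).
synthetic = [(n, 100.0 * n ** -0.5) for n in range(2, 11)]
k_fit, alpha_fit = fit_power_law(synthetic)
```

Under this model, adding newly identified gene products to some genomes changes the new-gene counts and can therefore shift the fitted alpha, which is one way to read the abstract's remark about altered alpha behavior.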

RevDate: 2021-05-31

Dar HA, Ismail S, Waheed Y, et al (2021)

Designing a multi-epitope vaccine against Mycobacteroides abscessus by pangenome-reverse vaccinology.

Scientific reports, 11(1):11197.

Mycobacteroides abscessus (previously Mycobacterium abscessus) is an emerging microorganism of the newly defined genus Mycobacteroides that causes mainly skin and tissue diseases in humans. The recent availability of a total of 34 fully sequenced genomes of different strains belonging to this species has provided an opportunity to utilize these genomic data to gain novel insights and guide the development of specific antimicrobial therapies. In the present study, we collected 34 complete genome sequences of M. abscessus from the NCBI GenBank database. Pangenome analysis was conducted on these genomes to understand the genetic diversity and to obtain proteins associated with its core genome. These core proteins were then subjected to various subtractive filters to identify potential antigenic targets that were subjected to multi-epitope vaccine design. Our analysis projected the open pangenome of M. abscessus containing 3443 core genes. After applying various stepwise filtration steps on the core proteins, a total of four potential antigenic targets were identified. Utilizing their constituent CD4 and CD8 T-cell epitopes, a multi-epitope based subunit vaccine was computationally designed. Sequence-based analysis as well as structural characterization revealed the immunological effectiveness of this designed vaccine. Further molecular docking, molecular dynamics simulation and binding free energy estimation with Toll-like receptor 2 indicated strong structural associations of the vaccine with the immune receptor. The promising results are encouraging and need to be validated by additional wet laboratory studies for confirmation.

RevDate: 2021-06-02
CmpDate: 2021-05-31

Guo J, Pang E, Song H, et al (2021)

A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes.

BMC bioinformatics, 22(1):282.

BACKGROUND: With the rapid development of accurate sequencing and assembly technologies, an increasing number of high-quality chromosome-level and haplotype-resolved assemblies of genomic sequences have been derived, from which there will be great opportunities for computational pangenomics. Although genome graphs are among the most useful models for pangenome representation, their structural complexity makes it difficult to present genome information intuitively, such as the linear reference genome. Thus, efficiently and accurately analyzing the genome graph spatial structure and coordinating the information remains a substantial challenge.

RESULTS: We developed a new method, a colored superbubble (cSupB), that can overcome the complexity of graphs and organize a set of species- or population-specific haplotype sequences of interest. Based on this model, we propose a tri-tuple coordinate system that combines an offset value, topological structure and sample information. Additionally, cSupB provides a novel method that utilizes complete topological information and efficiently detects small indels (< 50 bp) for highly similar samples, which can be validated by simulated datasets. Moreover, we demonstrated that cSupB can adapt to the complex cycle structure.

CONCLUSIONS: Although the solution is made suitable for increasingly complex genome graphs by relaxing the constraint, the directed acyclic graph, the motif cSupB and the cSupB method can be extended to any colored directed acyclic graph. We anticipate that our method will facilitate the analysis of individual haplotype variants and population genomic diversity. We have developed a C++ program for implementing our method that is available at https://github.com/eggleader/cSupB.

RevDate: 2021-07-06

Simar SR, Hanson BM, CA Arias (2021)

Techniques in bacterial strain typing: past, present, and future.

Current opinion in infectious diseases, 34(4):339-345.

PURPOSE OF REVIEW: The advancement of molecular techniques such as whole-genome sequencing (WGS) has revolutionized the field of bacterial strain typing, with important implications for epidemiological surveillance and outbreak investigations. This review summarizes state-of-the-art techniques in strain typing and examines barriers faced by clinical and public health laboratories in implementing these new methodologies.

RECENT FINDINGS: WGS-based methodologies are on track to become the new 'gold standards' in bacterial strain typing, replacing traditional methods like pulsed-field gel electrophoresis and multilocus sequence typing. These new techniques have an improved ability to identify genetic relationships among organisms of interest. Further, advances in long-read sequencing approaches will likely provide a highly discriminatory tool to perform pangenome analyses and characterize relevant accessory genome elements, including mobile genetic elements carrying antibiotic resistance determinants in real time. Barriers to widespread integration of these approaches include a lack of standardized workflows and technical training.

SUMMARY: Genomic bacterial strain typing has facilitated a paradigm shift in clinical and molecular epidemiology. The increased resolution that these new techniques provide, along with epidemiological data, will facilitate the rapid identification of transmission routes with high confidence, leading to timely and effective deployment of infection control and public health interventions in outbreak settings.

RevDate: 2021-07-06

Schörner MA, Passarelli-Araujo H, Scheffer MC, et al (2021)

Genomic analysis of Neisseria elongata isolate from a patient with infective endocarditis.

FEBS open bio, 11(7):1987-1996.

Neisseria elongata is part of the commensal microbiota of the oropharynx. Although it is not considered pathogenic to humans, N. elongata has been implicated in several cases of infective endocarditis (IE). Here, we report a case of IE caused by N. elongata subsp. nitroreducens (Nel_M001) and compare its genome with 17 N. elongata genomes available in GenBank. We also evaluated resistance and virulence profiles with Comprehensive Antibiotic Resistance and Virulence Finder databases. The results showed a wide diversity among N. elongata isolates. Based on the pangenome cumulative curve, we demonstrate that N. elongata has an open pangenome. We found several different resistance genes, mainly associated with antibiotic efflux pumps. A wide range of virulence genes was observed, predominantly pilus formation genes. Nel_M001 was the only isolate to present two copies of some pilus genes and not present nspA gene. Together, our results provide insights into how this commensal microorganism can cause IE and may assist further biological investigations on nonpathogenic Neisseria spp. Case reporting and pangenome analyses are critical for enhancing our understanding of IE pathogenesis, as well as for alerting physicians and microbiologists to enable rapid identification and treatment to avoid unfavorable outcomes.

RevDate: 2021-05-29
CmpDate: 2021-05-27

Bertazzoni S, Jones DAB, Phan HT, et al (2021)

Chromosome-level genome assembly and manually-curated proteome of model necrotroph Parastagonospora nodorum Sn15 reveals a genome-wide trove of candidate effector homologs, and redundancy of virulence-related functions within an accessory chromosome.

BMC genomics, 22(1):382.

BACKGROUND: The fungus Parastagonospora nodorum causes septoria nodorum blotch (SNB) of wheat (Triticum aestivum) and is a model species for necrotrophic plant pathogens. The genome assembly of reference isolate Sn15 was first reported in 2007. P. nodorum infection is promoted by its production of proteinaceous necrotrophic effectors, three of which are characterised - ToxA, Tox1 and Tox3.

RESULTS: A chromosome-scale genome assembly of P. nodorum Australian reference isolate Sn15, which combined long read sequencing, optical mapping and manual curation, produced 23 chromosomes with 21 chromosomes possessing both telomeres. New transcriptome data were combined with fungal-specific gene prediction techniques and manual curation to produce a high-quality predicted gene annotation dataset, which comprises 13,869 high confidence genes, and an additional 2534 lower confidence genes retained to assist pathogenicity effector discovery. Comparison to a panel of 31 internationally-sourced isolates identified multiple hotspots within the Sn15 genome for mutation or presence-absence variation, which was used to enhance subsequent effector prediction. Effector prediction resulted in 257 candidates, of which 98 higher-ranked candidates were selected for in-depth analysis and revealed a wealth of functions related to pathogenicity. Additionally, 11 out of the 98 candidates also exhibited orthology conservation patterns that suggested lateral gene transfer with other cereal-pathogenic fungal species. Analysis of the pan-genome indicated the smallest chromosome of 0.4 Mbp length to be an accessory chromosome (AC23). AC23 was notably absent from an avirulent isolate and is predominated by mutation hotspots with an increase in non-synonymous mutations relative to other chromosomes. Surprisingly, AC23 was deficient in effector candidates, but contained several predicted genes with redundant pathogenicity-related functions.

CONCLUSIONS: We present an updated series of genomic resources for P. nodorum Sn15 - an important reference isolate and model necrotroph - with a comprehensive survey of its predicted pathogenicity content.

RevDate: 2021-05-26

Maguvu TE, Oladipo AO, CC Bezuidenhout (2021)

Analysis of Genome Sequences of Coagulase-Negative Staphylococci Isolates from South Africa and Nigeria Highlighted Environmentally Driven Heterogeneity.

Journal of genomics, 9:26-37.

Here, we report high-quality annotated draft genomes of eight coagulase-negative staphylococci (CoNS) isolates obtained from South Africa and Nigeria. We explored the prevalence of antibiotic resistance and virulence genes and their association with mobile genetic elements. The pan-genomic analysis highlighted the environmentally driven heterogeneity of the isolates. Isolates from Nigeria had at least one gene for cadmium resistance/tolerance; these genes were not detected in isolates from South Africa. In contrast, isolates from South Africa had a tetM gene, which was not detected among the isolates from Nigeria. The observed genomic heterogeneity correlates with anthropogenic activities in the area where the isolates were collected. Moreover, the isolates used in this study possess an open pan-genome, which could easily explain the environmentally driven heterogeneity.

RevDate: 2021-05-26

Fu X, Gong L, Liu Y, et al (2021)

Bacillus pumilus Group Comparative Genomics: Toward Pangenome Features, Diversity, and Marine Environmental Adaptation.

Frontiers in microbiology, 12:571212.

Background: Members of the Bacillus pumilus group (abbreviated as the Bp group) are quite diverse and ubiquitous in marine environments, but little is known about correlation with their terrestrial counterparts. In this study, 16 marine strains that we had isolated before were sequenced and comparative genome analyses were performed with a total of 52 Bp group strains. The analyses covered 20 marine isolates (including the 16 new strains) and 32 terrestrial isolates, and examined their evolutionary relationships, differentiation, and environmental adaptation.

Results: Phylogenomic analysis revealed that the marine Bp group strains were grouped into three species: B. pumilus, B. altitudinis and B. safensis. All the three share a common ancestor. However, members of B. altitudinis were observed to cluster independently, separating from the other two, thus diverging from the others. Consistent with the universal nature of genes involved in the functioning of the translational machinery, the genes related to translation were enriched in the core genome. Functional genomic analyses revealed that the marine-derived and the terrestrial strains showed differences in certain hypothetical proteins, transcriptional regulators, K+ transporter (TrK) and ABC transporters. However, species differences showed the precedence of environmental adaptation discrepancies. In each species, land specific genes were found with possible functions that likely facilitate survival in diverse terrestrial niches, while marine bacteria were enriched with genes of unknown functions and those related to transcription, phage defense, DNA recombination and repair.

Conclusion: Our results indicated that the Bp isolates show distinct genomic features even as they share a common core. The marine and land isolates did not evolve independently; the transition between marine and non-marine habitats might have occurred multiple times. The lineage exhibited a priority effect over the niche in driving their dispersal. Certain intra-species niche-specific genes could be related to a strain's adaptation to its respective marine or terrestrial environment(s). In summary, this report describes the systematic evolution of 52 Bp group strains and will facilitate future studies toward understanding their ecological role and adaptation to marine and/or terrestrial environments.

RevDate: 2021-07-11

Nasim F, Dey A, IA Qureshi (2021)

Comparative genome analysis of Corynebacterium species: The underestimated pathogens with high virulence potential.

Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases, 93:104928 pii:S1567-1348(21)00225-2 [Epub ahead of print].

Non-diphtherial Corynebacterium species or diphtheroids were previously considered mere contaminants of clinical samples. Of late, they have been recognized as formidable infection-causing agents of various diseases. While the scientific database is filled with articles that document whole genome analysis of individual isolates, a comprehensive comparative genomic analysis of diphtheroids alongside Corynebacterium diphtheriae is expected to help us understand their genomic as well as evolutionary divergence. Here, we have analysed the whole genome sequences of forty strains that were selected from a range of eleven Corynebacterium species (pathogenic and non-pathogenic). A statistical analysis of the pan and core genomes revealed that even though the core genome is saturated, the pan genome is still open, leaving scope for newer gene families to be accumulated in the course of evolution that might further change the pathogenic behavior of these species. Every strain had bacteriophage components integrated in its genome; some of these were intact and contained toxin genes. The presence of diversified genomic islands was observed across the dataset and most of them contained genes for virulence and multidrug resistance. Moreover, the phylogenetic analysis showed that a diphtheroid is the last common ancestor of all the Corynebacterium species. The current study is a compilation of genomic features of pathogenic as well as non-pathogenic Corynebacterium species which provides insights into their virulence potential in the times to come.

RevDate: 2021-06-17

Tao Y, Luo H, Xu J, et al (2021)

Extensive variation within the pan-genome of cultivated and wild sorghum.

Nature plants, 7(6):766-773.

Sorghum is a drought-tolerant staple crop for half a billion people in Africa and Asia, an important source of animal feed throughout the world and a biofuel feedstock of growing importance. Cultivated sorghum and its inter-fertile wild relatives constitute the primary gene pool for sorghum. Understanding and characterizing the diversity within this valuable resource is fundamental for its effective utilization in crop improvement. Here, we report analysis of a sorghum pan-genome to explore genetic diversity within the sorghum primary gene pool. We assembled 13 genomes representing cultivated sorghum and its wild relatives, and integrated them with 3 other published genomes to generate a pan-genome of 44,079 gene families with 222.6 Mb of new sequence identified. The pan-genome displays substantial gene-content variation, with 64% of gene families showing presence/absence variation among genomes. Comparisons between core genes and dispensable genes suggest that dispensable genes are important for sorghum adaptation. Extensive genetic variation was uncovered within the pan-genome, and the distribution of these variations was influenced by variation of recombination rate and transposable element content across the genome. We identified presence/absence variants that were under selection during sorghum domestication and improvement, and demonstrated that such variation had important phenotypic outcomes that could contribute to crop improvement. The constructed sorghum pan-genome represents an important resource for sorghum improvement and gene discovery.

RevDate: 2021-06-11

Drott MT, Rush TA, Satterlee TR, et al (2021)

Microevolution in the pansecondary metabolome of Aspergillus flavus and its potential macroevolutionary implications for filamentous fungi.

Proceedings of the National Academy of Sciences of the United States of America, 118(21):.

Fungi produce a wealth of pharmacologically bioactive secondary metabolites (SMs) from biosynthetic gene clusters (BGCs). It is common practice for drug discovery efforts to treat species' secondary metabolomes as being well represented by a single or a small number of representative genomes. However, this approach misses the possibility that intraspecific population dynamics, such as adaptation to environmental conditions or local microbiomes, may harbor novel BGCs that contribute to the overall niche breadth of species. Using 94 isolates of Aspergillus flavus, a cosmopolitan model fungus, sampled from seven states in the United States, we dereplicate 7,821 BGCs into 92 unique BGCs. We find that more than 25% of pangenomic BGCs show population-specific patterns of presence/absence or protein divergence. Population-specific BGCs make up most of the accessory-genome BGCs, suggesting that different ecological forces that maintain accessory genomes may be partially mediated by population-specific differences in secondary metabolism. We use ultra-high-performance high-resolution mass spectrometry to confirm that these genetic differences in BGCs also result in chemotypic differences in SM production in different populations, which could mediate ecological interactions and be acted on by selection. Thus, our results suggest a paradigm shift that previously unrealized population-level reservoirs of SM diversity may be of significant evolutionary, ecological, and pharmacological importance. Last, we find that several population-specific BGCs from A. flavus are present in Aspergillus parasiticus and Aspergillus minisclerotigenes and discuss how the microevolutionary patterns we uncover inform macroevolutionary inferences and help to align fungal secondary metabolism with existing evolutionary theory.

RevDate: 2021-07-12

Feng Z, Liu X, Wang M, et al (2021)

A novel temperate phage, vB_PstS-pAN, induced from the naphthalene-degrading bacterium Pseudomonas stutzeri AN10.

Archives of virology, 166(8):2267-2272.

A novel temperate phage named vB_PstS-pAN was induced by mitomycin C treatment from the naphthalene-degrading bacterium Pseudomonas stutzeri AN10. The phage particles have icosahedral heads and long non-contractile tails, and vB_PstS-pAN can therefore be morphologically classified as a member of the family Siphoviridae. The whole genome of vB_PstS-pAN is 39,466 bp in length, with an 11-nt 3' overhang cohesive end. There are 53 genes in the vB_PstS-pAN genome, including genes responsible for phage integration, replication, morphogenesis, and bacterial lysis. The vB_PstS-pAN genome has low similarity to other phage genomes in the GenBank database, suggesting that vB_PstS-pAN is a novel member of the family Siphoviridae.

RevDate: 2021-05-31

Li Z, Song Q, Wang M, et al (2021)

Comparative genomics analysis of Pediococcus acidilactici species.

Journal of microbiology (Seoul, Korea), 59(6):573-583.

Pediococcus acidilactici is a reliable bacteriocin producer and a promising probiotic species with wide application in the food and health industry. However, the underlying genetic features of this species have not been analyzed. In this study, we performed a comprehensive comparative genomic analysis of 41 P. acidilactici strains from various ecological niches. The bacteriocin production of the 41 strains was predicted, and three kinds of bacteriocin-encoding genes were identified in 11 P. acidilactici strains, namely pediocin PA-1, enterolysin A, and colicin-B. Moreover, whole-genome analysis showed a high genetic diversity within the population, mainly related to a large proportion of variable genomes, mobile elements, and hypothetical genes obtained through horizontal gene transfer. In addition, comparative genomics helped to explain the genetic basis of adaptation to the host environment, including the protection mechanism against the invasion of foreign DNA (i.e. CRISPR/Cas locus) and carbohydrate fermentation. The 41 strains of P. acidilactici can metabolize a variety of carbon sources, which enhances the adaptability of this species and its survival in different environments. This study evaluated the antibacterial ability, genome evolution, and ecological flexibility of P. acidilactici from the perspective of genetics and provides strong supporting evidence for its industrial development and application.

RevDate: 2021-07-12

Dieckmann MA, Beyvers S, Nkouamedjo-Fankep RC, et al (2021)

EDGAR3.0: comparative genomics and phylogenomics on a scalable infrastructure.

Nucleic acids research, 49(W1):W185-W192.

The EDGAR platform, a web server providing databases of precomputed orthology data for thousands of microbial genomes, is one of the most established tools in the field of comparative genomics and phylogenomics. Based on precomputed gene alignments, EDGAR allows quick identification of the differential gene content, i.e. the pan genome, the core genome, or singleton genes. Furthermore, EDGAR features a wide range of analyses and visualizations like Venn diagrams, synteny plots, phylogenetic trees, as well as Amino Acid Identity (AAI) and Average Nucleotide Identity (ANI) matrices. During the last few years, the average number of genomes analyzed in an EDGAR project increased by two orders of magnitude. To handle this massive increase, a completely new technical backend infrastructure for the EDGAR platform was designed and launched as EDGAR3.0. For the calculation of new EDGAR3.0 projects, we are now using a scalable Kubernetes cluster running in a cloud environment. A new storage infrastructure was developed using a file-based high-performance storage backend which ensures timely data handling and efficient access. The new data backend guarantees a memory efficient calculation of orthologs, and parallelization has led to drastically reduced processing times. Based on the advanced technical infrastructure new analysis features could be implemented including POCP and FastANI genomes similarity indices, UpSet intersecting set visualization, and circular genome plots. Also the public database section of EDGAR was largely updated and now offers access to 24,317 genomes in 749 free-to-use projects. In summary, EDGAR 3.0 provides a new, scalable infrastructure for comprehensive microbial comparative gene content analysis. The web server is accessible at http://edgar3.computational.bio.

RevDate: 2021-05-14

Figueiredo G, Gomes M, Covas C, et al (2021)

The Unexplored Wealth of Microbial Secondary Metabolites: the Sphingobacteriaceae Case Study.

Microbial ecology [Epub ahead of print].

Research on secondary metabolites (SMs) has been mostly focused on Gram-positive bacteria, especially Actinobacteria. The association of genomics with robust bioinformatics tools revealed the neglected potential of Gram-negative bacteria as promising sources of new SMs. The family Sphingobacteriaceae belongs to the phylum Bacteroidetes having representatives in practically all environments including humans, rhizosphere, soils, wastewaters, among others. Some genera of this family have demonstrated great potential as plant growth promoters, bioremediators and producers of some value-added compounds such as carotenoids and antimicrobials. However, to date, Sphingobacteriaceae's SMs are still poorly characterized, and likewise, little is known about their chemistry. This study revealed that Sphingobacteriaceae pangenome encodes a total of 446 biosynthetic gene clusters (BGCs), which are distributed across 85 strains, highlighting the great potential of this bacterial family to produce SMs. Pedobacter, Mucilaginibacter and Sphingobacterium were the genera with the highest number of BGCs, especially those encoding the biosynthesis of ribosomally synthesized and post-translationally modified peptides (RiPPs), terpenes, polyketides and nonribosomal peptides (NRPs). In Mucilaginibacter and Sphingobacterium genera, M. lappiensis ATCC BAA-1855, Mucilaginibacter sp. OK098 (both with 11 BGCs) and Sphingobacterium sp. 21 (6 BGCs) are the strains with the highest number of BGCs. Most of the BGCs found in these two genera did not have significant hits with the MIBiG database. These results strongly suggest that the bioactivities and environmental functions of these compounds, especially RiPPs, PKs and NRPs, are still unknown. Among RiPPs, two genera encoded the production of class I and class III lanthipeptides. The last are associated with LanKC proteins bearing uncommon lyase domains, whose dehydration mechanism deserves further investigation. This study translated genomics into functional information that unveils the enormous potential of environmental Gram-negative bacteria to produce metabolites with unknown chemistries, bioactivities and, more importantly, unknown ecological roles.

RevDate: 2021-05-14

Maturana JL, JP Cárdenas (2021)

Insights on the Evolutionary Genomics of the Blautia Genus: Potential New Species and Genetic Content Among Lineages.

Frontiers in microbiology, 12:660920.

Blautia, a genus established in 2008, is a relevantly abundant taxonomic group present in the microbiome of human and other mammalian gastrointestinal (GI) tracts. Several described (or proposed) Blautia species are available at this date. However, despite the increasing level of knowledge about Blautia, its diversity is still poorly understood. The increasing availability of Blautia genomic sequences in the public databases opens the possibility to study this genus from a genomic perspective. Here we report the pangenome analysis and the phylogenomic study of 225 Blautia genomes available in RefSeq. We found 33 different potential species at the genomic level, 17 of them previously undescribed; we also confirmed by genomic standards the status of 4 previously proposed new Blautia species. Comparative genomic analyses suggest that the Blautia pangenome is open, with a relatively small core genome (∼ 700-800 gene families). Utilizing a set of representative genomes, we performed a gene family gain/loss model for the genus, showing that although terminal nodes underwent more massive gene gain events than internal nodes (i.e., predicted ancestors), some ancestors were predicted to have gained an important number of gene families, some of them associated with the possible acquisition of metabolic abilities. Gene loss events remained lower than gain events in most cases. General aspects regarding pangenome composition and gene gain/loss events are discussed, as well as the proposition of changes in the taxonomic assignment of B. coccoides TY and the proposition of a new species, "B. pseudococcoides."
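
The abstract above calls the Blautia pangenome "open" without stating how openness was assessed. One widely used convention, assumed here and not taken from this paper, models the number of new gene families found when the N-th genome is added as n(N) = kappa * N^(-alpha), and treats alpha <= 1 as an open pangenome. The sketch below fits that power law by least squares on log-transformed counts. The function name and the input values are hypothetical and purely illustrative.

```python
import math

def fit_new_gene_decay(counts):
    """Fit n(N) = kappa * N**(-alpha) by linear regression in log-log space.

    counts: list of (number_of_genomes, new_gene_families) pairs,
            with every value strictly positive.
    Returns (kappa, alpha).
    """
    xs = [math.log(n_genomes) for n_genomes, _ in counts]
    ys = [math.log(new_genes) for _, new_genes in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of log n versus log N
    intercept = mean_y - slope * mean_x    # equals log(kappa)
    return math.exp(intercept), -slope

# Synthetic (made-up) counts that follow the model exactly.
example_counts = [(n, 100 * n ** -0.5) for n in range(2, 20)]
kappa, alpha = fit_new_gene_decay(example_counts)
is_open = alpha <= 1   # assumed convention: alpha <= 1 means "open"
```

Real accumulation data are noisy and depend on the order in which genomes are added, so in practice the counts are usually averaged over many random orderings before fitting.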

RevDate: 2021-07-06
CmpDate: 2021-07-06

Almeida OGG, Furlan JPR, Stehling EG, et al (2021)

Comparative phylo-pangenomics reveals generalist lifestyles in representative Acinetobacter species and proposes candidate gene markers for species identification.

Gene, 791:145707.

Acinetobacter species have the potential to invade and colonize immunocompromised patients, therefore being well-known as opportunistic pathogens. Among these bacteria, the species of the Acinetobacter calcoaceticus-Acinetobacter baumannii "complex" (Acb members) are the bacteria most often isolated from clinical specimens. Unequivocal taxonomy is crucial for correctly identifying these species and, combined with comparative genomic analyses, also helps to understand their lifestyles. In this study, all publicly available Acinetobacter species at the date of this study preparation were analyzed. The results revealed that the Acb members are in fact a complex when phenotypic methods are confronted, while for comparative and phylogenomics analyses this term is misleading, since they composed a monophyletic group instead. Nine best gene markers (response regulator, recJ, recG, phosphomannomutase, pepSY, monovalent cation/H+ antiporter subunit D, mnmE, glnE, and bamA) were selected for identification of Acinetobacter species. Moreover, representative strains of each species were split according to their isolation sources into the categories: environmental, human, insect and non-human vertebrate. Neither niche-specific genome signature nor niche-associated functional and pathogenic potential were associated with their isolation source, meaning it is not the main force acting on Acinetobacter adaptation in a given niche and corroborating that their ubiquitous distribution is a reflection of their generalist lifestyles.

RevDate: 2021-05-29

Crysnanto D, Leonard AS, Fang ZH, et al (2021)

Novel functional sequences uncovered through a bovine multiassembly graph.

Proceedings of the National Academy of Sciences of the United States of America, 118(20):.

Many genomic analyses start by aligning sequencing reads to a linear reference genome. However, linear reference genomes are imperfect, lacking millions of bases of unknown relevance and are unable to reflect the genetic diversity of populations. This makes reference-guided methods susceptible to reference-allele bias. To overcome such limitations, we build a pangenome from six reference-quality assemblies from taurine and indicine cattle as well as yak. The pangenome contains an additional 70,329,827 bases compared to the Bos taurus reference genome. Our multiassembly approach reveals 30 and 10.1 million bases private to yak and indicine cattle, respectively, and between 3.3 and 4.4 million bases unique to each taurine assembly. Utilizing transcriptomes from 56 cattle, we show that these nonreference sequences encode transcripts that hitherto remained undetected from the B. taurus reference genome. We uncover genes, primarily encoding proteins contributing to immune response and pathogen-mediated immunomodulation, differentially expressed between Mycobacterium bovis-infected and noninfected cattle that are also undetectable in the B. taurus reference genome. Using whole-genome sequencing data of cattle from five breeds, we show that reads which were previously misaligned against the Bos taurus reference genome now align accurately to the pangenome sequences. This enables us to discover 83,250 polymorphic sites that segregate within and between breeds of cattle and capture genetic differentiation across breeds. Our work makes a so-far unused source of variation amenable to genetic investigations and provides methods and a framework for establishing and exploiting a more diverse reference genome.

RevDate: 2021-06-05

Barchi L, Rabanus-Wallace MT, Prohens J, et al (2021)

Improved genome assembly and pan-genome provide key insights into eggplant domestication and breeding.

The Plant journal : for cell and molecular biology [Epub ahead of print].

Eggplant (Solanum melongena L.) is an important horticultural crop and one of the most widely grown vegetables from the Solanaceae family. It was domesticated from a wild, prickly progenitor carrying small, round, non-anthocyanic fruits. We obtained a novel, highly contiguous genome assembly of the eggplant '67/3' reference line, by Hi-C retrofitting of a previously released short read- and optical mapping-based assembly. The sizes of the 12 chromosomes and the fraction of anchored genes in the improved assembly were comparable to those of a chromosome-level assembly. We resequenced 23 accessions of S. melongena representative of the worldwide phenotypic, geographic, and genetic diversity of the species, and one each from the closely related species Solanum insanum and Solanum incanum. The eggplant pan-genome contained approximately 51.5 additional megabases and 816 additional genes compared with the reference genome, while the pan-plastome showed little genetic variation. We identified 53 selective sweeps related to fruit color, prickliness, and fruit shape in the nuclear genome, highlighting selection leading to the emergence of present-day S. melongena cultivars from its wild ancestors. Candidate genes underlying the selective sweeps included a MYBL1 repressor and CHALCONE ISOMERASE (for fruit color), homologs of Arabidopsis GLABRA1 and GLABROUS INFLORESCENCE STEMS2 (for prickliness), and orthologs of tomato FW2.2, OVATE, LOCULE NUMBER/WUSCHEL, SUPPRESSOR OF OVATE, and CELL SIZE REGULATOR (for fruit size/shape), further suggesting that selection for the latter trait relied on a common set of orthologous genes in tomato and eggplant.

RevDate: 2021-05-08

Whelan FJ, Hall RJ, JO McInerney (2021)

Evidence for selection in the abundant accessory gene content of a prokaryote pangenome.

Molecular biology and evolution pii:6272232 [Epub ahead of print].

A pangenome is the complete set of genes (core and accessory) present in a phylogenetic clade. We hypothesize that a pangenome's accessory gene content is structured and maintained by selection. To test this hypothesis, we interrogated the genomes of 40 Pseudomonas species for statistically significant coincident (i.e. co-occurring/avoiding) gene patterns. We found that 86.7% of common accessory genes are involved in ≥1 coincident relationship. Further, genes that co-occur and/or avoid each other - but are not vertically inherited - are more likely to share functional categories, are more likely to be simultaneously transcribed, and are more likely to produce interacting proteins, than would be expected by chance. These results are not due to coincident genes being adjacent to one another on the chromosome. Together, these findings suggest that the accessory genome is structured into sets of genes that function together within a given strain. Given the similarity of the Pseudomonas pangenome with open pangenomes of other prokaryotic species, we speculate that these results are generalizable. Keyword: pangenome, microbial genomics, selection.
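
To illustrate what a "coincident" (co-occurring) gene pair means statistically, the sketch below computes a one-sided hypergeometric p-value for two genes being found together in more genomes than expected by chance. This is a deliberately naive illustration: it ignores shared ancestry between genomes, which the abstract stresses must be accounted for, and it is not the authors' method. The function name and the toy presence vectors are hypothetical.

```python
from math import comb

def cooccurrence_pvalue(presence_a, presence_b):
    """One-sided hypergeometric p-value for gene co-occurrence.

    presence_a, presence_b: equal-length sequences of booleans, one entry
    per genome, True where the gene is present.
    Returns P(overlap >= observed) if gene B were placed at random among
    the genomes, independently of gene A.
    """
    n = len(presence_a)
    a = sum(1 for x in presence_a if x)
    b = sum(1 for y in presence_b if y)
    observed = sum(1 for x, y in zip(presence_a, presence_b) if x and y)
    # Sum the upper tail; math.comb returns 0 when k > n, so impossible
    # overlaps contribute nothing.
    tail = sum(comb(a, i) * comb(n - a, b - i)
               for i in range(observed, min(a, b) + 1))
    return tail / comb(n, b)

# Toy data: both genes present in the same 5 of 10 genomes.
gene_a = [True] * 5 + [False] * 5
gene_b = [True] * 5 + [False] * 5
p_together = cooccurrence_pvalue(gene_a, gene_b)
```

A small p-value here only flags a pair worth a closer look; a phylogeny-aware test is needed before concluding that the association is not simply vertical inheritance.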

RevDate: 2021-07-07

Buzzanca D, Botta C, Ferrocino I, et al (2021)

Functional pangenome analysis reveals high virulence plasticity of Aliarcobacter butzleri and affinity to human mucus.

Genomics, 113(4):2065-2076.

Aliarcobacter butzleri is an emerging pathogen that may cause enteritis in humans; however, the incidence of disease caused by this member of the Campylobacteriaceae family is still underestimated. Furthermore, little is known about the precise virulence mechanism and behavior during infection. Therefore, in the present study, through complementary use of comparative genomics and physiological tests on human gut models, we sought to elucidate the genetic background of a set of 32 A. butzleri strains of diverse origin and to explore the correlation with the ability to colonize and invade human intestinal cells in vitro. The simulated infection of human intestinal models showed a higher colonization rate in presence of mucus-producing cells. For some strains, human mucus significantly improved the resistance to physical removal from the in vitro mucosa, while short time-frame growth was even observed. Pangenome analysis highlighted a hypervariable accessory genome, not strictly correlated to the isolation source. Likewise, the strain phylogeny was unrelated to their shared origin, although a certain degree of segregation was observed among strains isolated from different segments of the intestinal tract of pigs. The putative virulence genes detected in all strains were mostly encompassed in the accessory fraction of the pangenome. The LPS biosynthesis and in particular the chain glycosylation of the O-antigen is harbored in a region of high plasticity of the pangenome, which would indicate frequent horizontal gene transfer phenomena, as well as the involvement of this hypervariable structure in the adaptive behavior and sympatric evolution of A. butzleri. Results of the present study deepen the current knowledge on A. butzleri pangenome by extending the pool of genes regarded as virulence markers and provide bases to develop new diagnostic approaches for the detection of those strains with a higher virulence potential.

RevDate: 2021-06-23
CmpDate: 2021-06-23

Vanni C (2021)

Accurate Annotation of Microbial Metagenomic Genes and Identification of Core Sets.

Methods in molecular biology (Clifton, N.J.), 2242:115-138.

In the past decade, metagenomics studies of microbial communities have added billions of sequences to the databases. This extensive amount of data and information has the potential to widen our understanding of the functioning of microbial communities and their roles in the environment. A fundamental step in this process is the functional and taxonomic profiling of the metagenomes, through an accurate gene annotation. This gene-level information can then be placed in the genomic context of metagenome-assembled genomes. Then, on a broader level, we can place this combined data into the context of a pangenome and start characterizing core and accessory gene sets. In this chapter, we provide a workflow to create an annotated gene catalog and to identify core sets of genes in the context of a pangenome. The first section will focus on the methods to provide metagenomic genes with accurate annotations. The second part will describe how to combine the gene catalog information with metagenome-assembled genomes and how to use both to build and investigate a pangenome.
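
The last step described above, characterizing core and accessory gene sets, can be sketched as a simple partition of a gene presence/absence table. This is a minimal illustration and not the chapter's workflow. The function name, the `core_fraction` parameter, and the toy gene families are hypothetical. A fraction below 1.0 gives a relaxed "soft core", which can be useful for incomplete genomes such as metagenome-assembled genomes.

```python
def partition_pangenome(presence, core_fraction=1.0):
    """Split gene families into core, accessory, and singleton sets.

    presence: dict mapping gene family name -> set of genome ids carrying it.
    core_fraction: minimum fraction of genomes a family must occur in to be
                   called core (1.0 = strict core).
    Returns (core, accessory, singletons); singletons are the families
    found in exactly one genome.
    """
    genomes = set().union(*presence.values())
    n_genomes = len(genomes)
    core, accessory, singletons = set(), set(), set()
    for family, carriers in presence.items():
        if n_genomes and len(carriers) / n_genomes >= core_fraction:
            core.add(family)
        else:
            accessory.add(family)
        if len(carriers) == 1:
            singletons.add(family)
    return core, accessory, singletons

# Toy table: three genomes, three gene families (made-up names).
toy = {
    "famA": {"g1", "g2", "g3"},
    "famB": {"g1", "g2"},
    "famC": {"g3"},
}
core, accessory, singletons = partition_pangenome(toy)
```

The pangenome is the union of the core and accessory sets. The step that produces the gene families in the first place (orthology clustering) is not shown.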

RevDate: 2021-06-23
CmpDate: 2021-06-23

Zoledowska S, Motyka-Pomagruk A, Misztak A, et al (2021)

Comparative Genomics, from the Annotated Genome to Valuable Biological Information: A Case Study.

Methods in molecular biology (Clifton, N.J.), 2242:91-112.

High availability of fast, cheap, and high-throughput next generation sequencing techniques resulted in acquisition of numerous de novo sequenced and assembled bacterial genomes. It rapidly became clear that digging out useful biological information from such a huge amount of data presents a considerable challenge. In this chapter we share our experience with utilization of several handy open source comparative genomic tools. All of them were applied in the studies focused on revealing inter- and intraspecies variation in pectinolytic plant pathogenic bacteria classified to Dickeya solani and Pectobacterium parmentieri. As the described software performed well on the species within the Pectobacteriaceae family, it presumably may be readily utilized on some closely related taxa from the Enterobacteriaceae family. First of all, implementation of various annotation software is discussed and compared. Then, tools computing whole genome comparisons including generation of circular juxtapositions of multiple sequences, revealing the order of synteny blocks or calculation of ANI or Tetra values are presented. Besides, web servers intended either for functional annotation of the genes of interest or for detection of genomic islands, plasmids, prophages, CRISPR/Cas are described. Last but not least, utilization of the software designed for pangenome studies and the further downstream analyses is explained. The presented work not only summarizes broad possibilities assured by the comparative genomic approach but also provides a user-friendly guide that might be easily followed by nonbioinformaticians interested in undertaking similar studies.

RevDate: 2021-06-24
CmpDate: 2021-06-24

Nguyen TL, HH Pham Thi (2021)

Genome-wide comparison of coronaviruses derived from veterinary animals: A canine and feline perspective.

Comparative immunology, microbiology and infectious diseases, 76:101654.

Feline- and canine-derived coronaviruses (FCoVs and CCoVs) are widespread among dog and cat populations. This study aimed to understand the route of disease origin and viral transmission in veterinary animals and in humans through comparative pan-genomic analysis of coronavirus sequences, especially those retrieved from genomes of FCoV and CCoV. Average nucleotide identity based on complete genomes clustered CoV strains according to their infected host, with the exception of a type II CCoV (accession number KC175339) that clustered closely with virulent FCoVs. In contrast, hierarchical clustering based on gene repertoires retrieved from pan-genome analysis divided the examined coronaviruses into host-independent clusters, and clearly split the Alphacoronavirus cluster into feline-canine, feline-only, and feline-canine-human sub-clusters. Also, functional analysis of genomic subsets helped to divide FCoV and CCoV pan-genomes into (i) clusters of core genes encoding spike, membrane, nucleocapsid proteins, and ORF1ab polyprotein; (ii) clusters of core-like genes encoding nonstructural proteins; (iii) clusters of accessory genes encoding the ORF1A; and (iv) two singleton genes encoding nonstructural protein and polyprotein 1ab. Seven clusters of gene repertoires were categorized as common to the FCoV and/or CCoV genomes, including pantropic and highly virulent strains, illustrating that distinct core-like genes/accessory genes related to their pathogenicity should be exploited in further biotype analysis of new isolates. In conclusion, the phylogenomic analyses have allowed the identification of trends in the viral genomic data that are especially relevant to developing specific control measures against coronavirus disease, such as the selection of good markers for differentiating new species from common and/or pantropic isolates.

RevDate: 2021-07-05
CmpDate: 2021-07-05

Prondzinsky P, Berkemer SJ, Ward LM, et al (2021)

The Thermosynechococcus Genus: Wide Environmental Distribution, but a Highly Conserved Genomic Core.

Microbes and environments, 36(2):.

Cyanobacteria thrive in diverse environments. However, questions remain about possible growth limitations in ancient environmental conditions. As a single genus, the Thermosynechococcus are cosmopolitan and live in chemically diverse habitats. To understand the genetic basis for this, we compared the protein coding component of Thermosynechococcus genomes. Supplementing the known genetic diversity of Thermosynechococcus, we report draft metagenome-assembled genomes of two Thermosynechococcus recovered from ferrous carbonate hot springs in Japan. We find that as a genus, Thermosynechococcus is genomically conserved, having a small pan-genome with few accessory genes per individual strain as well as few genes that are unique to the genus. Furthermore, by comparing orthologous protein groups, including an analysis of genes encoding proteins with an iron-related function (uptake, storage or utilization), no clear differences in genetic content, or adaptive mechanisms could be detected between genus members, despite the range of environments they inhabit. Overall, our results highlight a seemingly innate ability for Thermosynechococcus to inhabit diverse habitats without having undergone substantial genomic adaptation to accommodate this. The finding of Thermosynechococcus in both hot and high-iron environments without adaptation recognizable from the perspective of the proteome has implications for understanding the basis of thermophily within this clade, and also for understanding the possible genetic basis for high iron tolerance in cyanobacteria on early Earth. The conserved core genome may be indicative of an allopatric lifestyle, or of reduced genetic complexity of hot spring habitats relative to other environments.

RevDate: 2021-06-16

Torkamaneh D, Lemay MA, F Belzile (2021)

The pan-genome of the cultivated soybean (PanSoy) reveals an extraordinarily conserved gene content.

Plant biotechnology journal [Epub ahead of print].

Studies on structural variation in plants have revealed the inadequacy of a single reference genome for an entire species and suggest that it is necessary to build a species-representative genome called a pan-genome to better capture the extent of both structural and nucleotide variation. Here, we present a pan-genome of cultivated soybean (Glycine max), termed PanSoy, constructed using the de novo genome assembly of 204 phylogenetically and geographically representative improved accessions selected from the larger GmHapMap collection. PanSoy uncovers 108 Mb (~11%) of novel nonreference sequences encompassing 3621 protein-coding genes (including 1659 novel genes) absent from the soybean 'Williams 82' reference genome. Nonetheless, the core genome represents an exceptionally large proportion of the genome, with >90.6% of genes being shared by >99% of the accessions. A majority of PAVs encompassing genes could be confirmed with long-read sequencing on a subset of accessions. The PanSoy is a major step towards capturing the extent of genetic variation in cultivated soybean and provides a resource for soybean genomics research and breeding.

RevDate: 2021-06-28
CmpDate: 2021-06-28

Lawal OU, Fraqueza MJ, Worning P, et al (2021)

Staphylococcus saprophyticus Causing Infections in Humans Is Associated with High Resistance to Heavy Metals.

Antimicrobial agents and chemotherapy, 65(7):e0268520.

Staphylococcus saprophyticus is a common pathogen of the urinary tract, a heavy metal-rich environment, but information regarding its heavy metal resistance is lacking. We investigated 422 S. saprophyticus isolates from human infection and colonization/contamination, animals, and environmental sources for resistance to copper, zinc, arsenic, and cadmium using the agar dilution method. To identify the genes associated with metal resistance and assess possible links to pathogenicity, we accessed the whole-genome sequence of all isolates and used in silico and pangenome-wide association approaches. The MIC values for copper and zinc were uniformly high (1,600 mg/liter). Genes encoding copper efflux pumps (copA, copB, copZ, mco, and csoR) and zinc transporters (zinT, czrAB, znuBC, and zur) were abundant in the population (20 to 100%). Arsenic and cadmium showed various susceptibility levels. Genes encoding the ars operon (arsRDABC), an ABC transporter and a two-component permease, were linked to resistance to arsenic (MICs ≥ 1,600 mg/liter; 14% [58/422]; P < 0.05). At least three cad genes (cadA or cadC and cadD-cadX or czrC) and genes encoding multidrug efflux pumps and hyperosmoregulation in acidified conditions were associated with resistance to cadmium (MICs ≥ 200 mg/liter; 20% [85/422]; P < 0.05). These resistance genes were frequently carried by mobile genetic elements. Resistance to arsenic and cadmium was linked to human infection and a clonal lineage originating in animals (P < 0.05). Altogether, S. saprophyticus was highly resistant to heavy metals and accumulated multiple metal resistance determinants. The highest arsenic and cadmium resistance levels were associated with infection, suggesting resistance to these metals is relevant for S. saprophyticus pathogenicity.

RevDate: 2021-05-04

Mavrodi OV, McWilliams JR, Peter JO, et al (2021)

Root Exudates Alter the Expression of Diverse Metabolic, Transport, Regulatory, and Stress Response Genes in Rhizosphere Pseudomonas.

Frontiers in microbiology, 12:651282.

Plants live in association with microorganisms that positively influence plant development, vigor, and fitness in response to pathogens and abiotic stressors. The bulk of the plant microbiome is concentrated belowground at the plant root-soil interface. Plant roots secrete carbon-rich rhizodeposits containing primary and secondary low molecular weight metabolites, lysates, and mucilages. These exudates provide nutrients for soil microorganisms and modulate their affinity to host plants, but molecular details of this process are largely unresolved. We addressed this gap by focusing on the molecular dialog between eight well-characterized beneficial strains of the Pseudomonas fluorescens group and Brachypodium distachyon, a model for economically important food, feed, forage, and biomass crops of the grass family. We collected and analyzed root exudates of B. distachyon and demonstrated the presence of multiple carbohydrates, amino acids, organic acids, and phenolic compounds. The subsequent screening of bacteria by Biolog Phenotype MicroArrays revealed that many of these metabolites provide carbon and energy for the Pseudomonas strains. RNA-seq profiling of bacterial cultures amended with root exudates revealed changes in the expression of genes encoding numerous catabolic and anabolic enzymes, transporters, transcriptional regulators, stress response, and conserved hypothetical proteins. Almost half of the differentially expressed genes mapped to the variable part of the strains' pangenome, reflecting the importance of the variable gene content in the adaptation of P. fluorescens to the rhizosphere lifestyle. Our results collectively reveal the diversity of cellular pathways and physiological responses underlying the establishment of mutualistic interactions between these beneficial rhizobacteria and their plant hosts.

RevDate: 2021-05-19

Miga KH, T Wang (2021)

The Need for a Human Pangenome Reference Sequence.

Annual review of genomics and human genetics [Epub ahead of print].

The reference human genome sequence is inarguably the most important and widely used resource in the fields of human genetics and genomics. It has transformed the conduct of biomedical sciences and brought invaluable benefits to the understanding and improvement of human health. However, the commonly used reference sequence has profound limitations, because across much of its span, it represents the sequence of just one human haplotype. This single, monoploid reference structure presents a critical barrier to representing the broad genomic diversity in the human population. In this review, we discuss the modernization of the reference human genome sequence to a more complete reference of human genomic diversity, known as a human pangenome. Expected final online publication date for the Annual Review of Genomics and Human Genetics, Volume 22 is August 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

RevDate: 2021-05-02

Puri A, Bajaj A, Lal S, et al (2021)

Phylogenomic Framework for Taxonomic Delineation of Paracoccus spp. and Exploration of Core-Pan Genome.

Indian journal of microbiology, 61(2):180-194.

The taxonomic classification of metabolically versatile Paracoccus spp. has so far been performed using a polyphasic approach. The topology of single-gene phylogenies, however, has highlighted ambiguous species assignments. In the present study, genome-based multi-gene phylogenies and overall genome-related indices were used for species threshold assessment. Comprehensive phylogenomic analysis of Paracoccus genomes (n = 103) showed concordant clustering of strains across multi-gene marker set phylogenies (nMC = 0.08-0.14), as compared to the 16S rDNA phylogeny (nMC = 0.37-0.42), suggesting robustness of multi-gene phylogenies in drawing phylogenetic inferences. Functional gene content distribution across the genus showed that only 1.7% of the gene content constitutes the core genome, highlighting the significance of extensive genomic variability in the evolution of Paracoccus spp. Further, genome metrics were used to validate characterized strains, identifying classification anomalies (n = 13), and based on this, genome-derived taxonomic amendments are proposed in the present study. In conclusion, validated metric tools can be employed on whole genome sequences, including draft assemblies, for the assessment and assignment of uncharacterized strains and species-level ascription of newly isolated Paracoccus strains in future.

RevDate: 2021-06-03
CmpDate: 2021-06-03

Grassi F, G De Lorenzis (2021)

Back to the Origins: Background and Perspectives of Grapevine Domestication.

International journal of molecular sciences, 22(9):.

Domestication is a process of selection driven by humans, transforming wild progenitors into domesticated crops. The grapevine (Vitis vinifera L.), besides being one of the most extensively cultivated fruit trees in the world, is also a fascinating subject for evolutionary studies. The domestication process started in the Near East and the varieties obtained were successively spread and cultivated in different areas. Whether the domestication occurred only once, or whether successive domestication events occurred independently, is a highly debated mystery. Moreover, introgression events, breeding and intense trade in the Mediterranean basin have followed over the last few thousand years, obfuscating the genetic relationships. Although a succession of studies has been carried out to explore grapevine origin and different evolution models have been proposed, an overview of the topic remains pending. We review here the findings obtained in the main phylogenetic and genomic studies published in the last two decades, to clarify the fundamental questions regarding where, when and how many times grapevine domestication took place. Finally, we argue that the realization of the pan-genome of grapes could be a useful resource to discover and track the changes which have occurred in the genomes and to improve our understanding of domestication.

RevDate: 2021-05-27

Parker CT, Huynh S, Alexander A, et al (2021)

Genomic Characterization of Salmonella typhimurium DT104 Strains Associated with Cattle and Beef Products.

Pathogens (Basel, Switzerland), 10(5):.

Salmonella enterica subsp. enterica serovar Typhimurium DT104, a multidrug-resistant phage type, has emerged globally as a major cause of foodborne outbreaks particularly associated with contaminated beef products. In this study, we sequenced three S. Typhimurium DT104 strains associated with a 2009 outbreak caused by ground beef, including the outbreak source strain and two clinical strains. The goal of the study was to gain a stronger understanding of the genomics and genomic epidemiology of highly clonal S. Typhimurium DT104 strains associated with bovine sources. Our study found no single nucleotide polymorphisms (SNPs) between the ground beef source strain and the clinical isolates from the 2009 outbreak. SNP analysis including twelve other S. Typhimurium strains from bovine and clinical sources, including both DT104 and non-DT104, determined that DT104 strains averaged 55.0 SNPs between strains compared to 474.5 SNPs among non-DT104 strains. Phylogenetic analysis separated the DT104 strains from the non-DT104 strains, but strains did not cluster together based on source of isolation even within the DT104 phage type. Pangenome analysis of the strains confirmed previous studies showing that DT104 strains are missing the genes for the allantoin utilization pathway, and further showed that the genes were part of a deletion event and not substituted or disrupted by the insertion of another genomic element. Additionally, cgMLST analysis revealed that DT104 strains with cattle as the source of isolation were quite diverse as a group and did not cluster together, even among strains from the same country. Expansion of the analysis to 775 S. Typhimurium ST19 strains associated with cattle from North America revealed diversity between strains, not limited to just among DT104 strains, which suggests that the cattle environment is favorable for a diverse group of S. Typhimurium strains and not just DT104 strains.


