Background & Summary

Seed microbiomes are essential to plant health, growth, and resilience, and play an important role in the physiological processes required for effective crop development1. The barley seed microbiome, in particular, is of critical importance, influencing not only crop yield but also the quality of barley-derived products2,3. Barley (Hordeum vulgare) has been integral to agriculture since the early phases of human civilization4. Its significance in the modern era is two-fold: as a fundamental component of the global food system, and as a crucial ingredient in the brewing industry3,5. While the physiological attributes of barley influence malt quality, the microbial communities associated with barley also play an essential role, from sowing to malting2.

Malting barley seeds are colonised by rich and diverse microbial communities, encompassing both endophytic and epiphytic organisms1,6,7. These microorganisms, which can be both beneficial and detrimental, have the potential to affect seed health, germination success, and the quality of fermentation products8,9,10. Several studies highlight the diversity of microbial populations associated with malting barley and their potential effects on brewing product quality8,11,12. Understanding these microbial communities and their genomic content can provide insights into seed storage longevity, contamination risks, and their potential impact on subsequent production stages. However, there is a notable gap in comprehensive metagenomic datasets focusing on these microbial communities, especially during the seed storage phase.

Metagenome sequencing can provide profound insights into microbial ecosystems without necessitating laboratory cultivation13,14,15. This approach not only provides a comprehensive understanding of the taxonomic and functional variations among phytomicrobial communities, but also sheds light on the complex interrelationships across these communities and their plant hosts16,17. In the context of barley seed storage, acquiring this understanding using omics paves the way for developing microbial management strategies, optimising storage conditions, mitigating losses, and ensuring consistent production of premium malt.

Whole metagenomes were sequenced from eight samples of barley seeds stored in silos at four time points (two samples per time point): at harvest and after three, six and nine months of storage (Table S1). The metagenomic data were assembled into nearly complete microbial genomes, yielding a total of 82 metagenome-assembled genomes (MAGs) (Table S2). The completeness of the MAGs was evaluated using CheckM v1.2.218. All MAGs demonstrated completeness >75%, with 50/82 being >90% complete. These completeness values are in alignment with the high-quality draft criterion of the Minimum Information about a Metagenome-Assembled Genome (MIMAG) standards for Bacteria and Archaea19 (Fig. 1, Table S2).
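Completeness and contamination cut-offs of this kind can be applied programmatically. The following is a minimal sketch, assuming CheckM-style estimates have already been parsed into (name, completeness, contamination) tuples. The function name, the tier labels, and the example values are illustrative assumptions and are not taken from Table S2. The full MIMAG high-quality definition also requires rRNA and tRNA genes, which this sketch does not check.

```python
def mimag_tier(completeness: float, contamination: float) -> str:
    """Assign a simplified MIMAG-style tier from percent estimates.

    Only completeness and contamination are considered; the rRNA/tRNA
    requirements of the full high-quality definition are NOT checked.
    """
    if completeness > 90 and contamination < 5:
        return "high-quality candidate"
    if completeness >= 50 and contamination < 10:
        return "medium-quality"
    if contamination < 10:
        return "low-quality"
    return "unclassified (contamination >= 10%)"


# Hypothetical example values, not taken from the study.
example_mags = [
    ("bin_A", 98.2, 1.1),
    ("bin_B", 78.5, 6.3),
    ("bin_C", 91.0, 7.4),
]
tiers = {name: mimag_tier(comp, cont) for name, comp, cont in example_mags}
for name, tier in tiers.items():
    print(f"{name}: {tier}")
```

Keeping the rule in a single function makes it straightforward to tabulate how many bins fall into each tier once the real CheckM table is loaded.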

Fig. 1 Comparative analysis of phylum distribution, MAG completeness, and contamination.

Furthermore, minimal levels of sequence heterogeneity were observed for all 82 MAGs. Approximately 91% (75/82) of the MAGs had contamination levels <5%, whereas the remaining seven MAGs exhibited contamination levels between 5 and 10%, supporting the reliability and integrity of the dataset (Fig. 1 and Table S2). We identified a notable negative correlation between genome completeness and contamination (r = −0.498, p < 0.00001; Fig. 2A). In parallel, our data demonstrated a positive relationship between genome size and the N50 metric (r = 0.251, p = 0.023; Fig. 2B), indicating that larger genomes are often associated with superior assembly contiguity.
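For readers who wish to recompute statistics of this kind from Table S2, the sketch below shows how a Pearson correlation coefficient and an N50 value are defined. It is an illustration only, assuming the per-MAG values and contig lengths are available as plain Python lists; the function names and the toy numbers are hypothetical, and the p-values reported above are not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the assembly."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")


# Toy values for illustration only (not from the study).
completeness = [99.0, 95.5, 88.0, 80.2, 76.1]
contamination = [0.5, 1.2, 3.9, 6.0, 8.4]
r_value = pearson_r(completeness, contamination)
toy_n50 = n50([80, 70, 50, 40, 30, 20])
```

In the toy data, contamination rises as completeness falls, so the coefficient is negative, which is the same direction as the relationship shown in Fig. 2A.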

Fig. 2 Correlations in Metagenome-Assembled Genomes (MAGs).

Taxonomic evaluation using the Genome Taxonomy Database Toolkit (GTDB-Tk)20 revealed that the barley-associated MAG dataset was dominated by members of the phylum Pseudomonadota (formerly the Proteobacteria), comprising 53.7% (44/82) of the total MAGs (Table S2). This is consistent with the findings from a previous amplicon sequencing-based study of barley seed endophytic microbial communities7. However, in contrast to the previous findings, we identified Bacteroidota (16/82) as the second most prevalent phylum. The abundances of Actinobacteria and Bacillota (Firmicutes) in our study also differed from those previously reported7, underscoring the inherent variability of barley seed microbiomes (Fig. 1 and Table S2).

Temporal shifts in genera abundance over nine months

The barley seed-derived MAGs were classified into 26 bacterial genera across eight phyla and six classes (Table S2). The microbiome was characterised by several dominant genera, with thirteen, nine, seven and six MAGs belonging to the genera Erwinia, Pseudomonas, Chryseobacterium and Paenibacillus, respectively (Fig. 3). Notably, 16 MAGs could not be classified at the species level, highlighting the underexplored microbial diversity associated with barley seeds (Fig. 4, Table S2).

Fig. 3 Genomic Metrics of the identified Bacterial Genera.

Fig. 4 Phylogenetic Relationships of Bacterial MAGs.

The barley seed microbiome shows discernible shifts during storage (Fig. 5). While the genera Erwinia and Duffyella remain prominent from harvest through prolonged storage, the presence of Chryseobacterium decreases notably and that of Pseudomonas_E increases during silo storage. These shifts may provide insights into the role of the barley seed microbiome in both seed health and disease. Chryseobacterium spp. have been observed to counteract the effects of Magnaporthe oryzae, a cause of barley blast disease, primarily by detaching fungal spores from leaf surfaces21, and may contribute to maintaining seed health in the field. Duffyella is also of interest owing to its observed ability to curb the growth of Fusarium tricinctum, another pathogen affecting barley22,23. All Erwinia MAGs identified in the study were classified in the species E. persicina, a known broad host range phytopathogen, which has been linked to pink seed disease in barley24. Pseudomonas-like taxa in this study were assigned to the genus Pseudomonas_E, as defined in the GTDB classification database20.

Fig. 5 Combined plots illustrating the top 10 genera.

Methods

Sample collection and processing

Malting barley (Hordeum vulgare) samples of a single cultivar (Kadie) were sourced from Anheuser-Busch InBev (AB-Inbev) storage facilities in the Western Cape province, South Africa. Samples were collected at four distinct time points: immediately post-harvest and then after three, six, and nine months of storage in silos. At each time point, three samples were collected. All samples were aseptically collected and stored at −20 °C to inhibit microbial growth.

DNA isolation and sequencing

Approximately 10 g of barley was crushed using a sterilised mortar and pestle. The resulting residue was suspended in 40 ml of phosphate buffered saline (PBS) solution (pH 7.4). The suspension was briefly vortexed to homogenise the mixture, followed by sonication at 18 W amplitude with a 30-s on-off pulsating schedule for 7 min. The mixture was centrifuged at 4000 × g for 1 min to separate the supernatant, which was transferred to an autoclaved filtration system comprising a polycarbonate filter holder and a filter membrane (0.45 µm pore size, Sartorius-Stedim Biotech).

Metagenomic DNA was extracted from the filter using the ZymoBIOMICS DNA/RNA Miniprep Kit (Zymo Research), following the protocol recommended by the manufacturer. A Nanodrop Lite Spectrophotometer (Thermo Fisher Scientific) was used to assess the integrity and purity of the DNA and to quantify it. The metagenomic DNA samples were sequenced using the Illumina NovaSeq 6000 platform (paired end reads, 2 × 250 bp) at Molecular Research (MRDNA, Texas, USA). The total number of reads obtained was approximately 365.27 million. On average, each sample yielded around 22.83 million reads, with the maximum number of reads for a single sample being approximately 38.26 million and the minimum around 10.36 million. These metrics provide an overview of the sequencing depth achieved in our study. A detailed breakdown of read counts for each sample is provided in Table S1.

Metagenomic data analysis

Raw sequence reads were evaluated for quality using FastQC v0.12.125 and MultiQC v1.1526. Trimmomatic v0.3627 was used to filter out reads shorter than 36 bp or with an average quality score lower than 15. Host DNA was removed using Bowtie2 v2.5.128 and SAMtools v1.1929. Initially, an index database of the barley reference genome (Hordeum vulgare, Accession number: GCF_904849725.1) was constructed using the bowtie2-build command. Subsequently, reads were mapped to the host sequence database with Bowtie2, preserving both aligned and unaligned paired end reads. Following this, SAMtools was used to convert the sam file into bam format. The required unmapped reads were isolated by applying SAMtools SAM-flag filters (-f 12 and -F 256), which selected pairs where both reads (R1 and R2) were unmapped. Finally, the SAMtools sort and SAMtools fastq commands were used to separate the paired end reads into distinct fastq files. Host DNA contamination varied across samples: the mean contamination ratio was approximately 0.5757%, with the minimum at 0.0059% (3,088 contaminated reads out of 52,678,404) and the maximum at 2.7368% (567,134 contaminated reads out of 20,155,530) (Table S1). The host-filtered reads were then assembled using metaSPAdes v3.15.330 with default parameters. The integrity and quality of the final assemblies were evaluated using QUAST v5.2.031.
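The effect of the two SAM-flag filters can be made explicit by decoding the flag bits. The sketch below is an illustration in Python, not the command used in the study: it mirrors the logic of requiring all bits in 12 (read unmapped, 0x4, plus mate unmapped, 0x8) and excluding any alignment with bit 256 (secondary alignment, 0x100). The function name and the example flag values are chosen for illustration.

```python
# Bit values defined by the SAM format specification.
READ_UNMAPPED = 0x4    # 4
MATE_UNMAPPED = 0x8    # 8
SECONDARY = 0x100      # 256


def passes_filter(flag: int,
                  require: int = READ_UNMAPPED | MATE_UNMAPPED,
                  exclude: int = SECONDARY) -> bool:
    """Mimic '-f <require> -F <exclude>' style filtering on a SAM flag.

    A record is kept only if ALL 'require' bits are set and
    NONE of the 'exclude' bits are set.
    """
    return (flag & require) == require and (flag & exclude) == 0


# Example flags (illustrative):
#   77  = paired + read unmapped + mate unmapped + first in pair
#   141 = paired + read unmapped + mate unmapped + second in pair
#   73  = paired + mate unmapped + first in pair (read itself is mapped)
#   99  = paired + proper pair + mate reverse + first in pair (both mapped)
for example_flag in (77, 141, 73, 99):
    status = "kept" if passes_filter(example_flag) else "discarded"
    print(example_flag, status)
```

Only records in which both mates are unmapped pass, so pairs with a single host-derived mate are discarded together with fully host-derived pairs.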

Metagenomic binning and refinement

Metagenomic binning was performed based on tetranucleotide frequencies, coverage, and GC content using the MetaWRAP v1.332 pipeline with default parameters and the tools MaxBin v2.033, metaBAT234, and CONCOCT v1.0.035. The bins were refined further using the MetaWRAP-Bin_refinement module with the parameters -c 70 and -x 10 (completeness >70% and contamination <10%) to improve bin quality. The completeness and contamination levels of these bins were evaluated using CheckM v1.2.218 as part of the MetaWRAP workflow. Subsequently, the bins were reassembled using the MetaWRAP-reassemble_bins module (parameters: -c 70 -x 10). The refined bins were dereplicated at a 95% average nucleotide identity (ANI) threshold using dRep v2.6.236, culminating in 82 nonredundant MAGs.
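The idea behind dereplication at a 95% ANI threshold can be illustrated with a simple greedy procedure: rank the bins by quality, then keep a bin only if it is less than 95% identical to every bin already kept. This is a deliberately simplified sketch and is not dRep's actual algorithm, which uses a multi-step clustering and a richer scoring scheme. The quality score used here (completeness minus five times contamination), the function name, and all example values are assumptions made for illustration.

```python
def dereplicate(quality, ani, threshold=95.0):
    """Greedy dereplication sketch (NOT the dRep algorithm).

    quality: {bin_name: (completeness_percent, contamination_percent)}
    ani:     {frozenset({bin_a, bin_b}): percent ANI}; missing pairs count as 0
    Returns the retained representative bins, best-scoring first.
    """
    def score(name):
        completeness, contamination = quality[name]
        # Assumed illustrative score: reward completeness, penalise contamination.
        return completeness - 5 * contamination

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(quality, key=lambda name: (-score(name), name))

    representatives = []
    for name in ranked:
        redundant = any(
            ani.get(frozenset((name, rep)), 0.0) >= threshold
            for rep in representatives
        )
        if not redundant:
            representatives.append(name)
    return representatives


# Hypothetical bins and pairwise ANI values.
example_quality = {"bin_1": (98.0, 1.0), "bin_2": (92.0, 2.0), "bin_3": (85.0, 0.5)}
example_ani = {
    frozenset(("bin_1", "bin_2")): 98.7,
    frozenset(("bin_1", "bin_3")): 81.2,
    frozenset(("bin_2", "bin_3")): 80.9,
}
kept = dereplicate(example_quality, example_ani)
print(kept)
```

In this toy input, bin_2 shares more than 95% ANI with the higher-scoring bin_1 and is therefore dropped, while bin_3 is retained as a distinct genome.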

Phylogenetic analysis and classification of MAGs

For taxonomic assignment of MAGs, the classify_wf workflow from GTDB-Tk v3.4.220 was used together with the reference data GTDB release207v220, with default settings. A phylogenetic tree encompassing 82 species-level bacterial MAGs was derived from 120 bacterial marker genes using the gtdbtk_infer module in GTDB-Tk. To improve interpretation and visualisation, the tree was annotated using iTOL v537.
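GTDB-style classifications are reported as semicolon-separated lineage strings with rank prefixes (d__, p__, c__, o__, f__, g__, s__), where an empty species field indicates that no species-level assignment was made. The sketch below shows one way such strings could be parsed to tally genera or flag MAGs lacking a species name. The function names and the two lineage strings are illustrative assumptions and are not copied from Table S2.

```python
RANKS = {
    "d": "domain", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_gtdb_lineage(lineage: str) -> dict:
    """Split a GTDB-style lineage string into a {rank: name or None} mapping."""
    parsed = {}
    for field in lineage.split(";"):
        prefix, _, name = field.strip().partition("__")
        if prefix not in RANKS:
            raise ValueError(f"unexpected rank prefix in field: {field!r}")
        parsed[RANKS[prefix]] = name or None  # empty name -> unassigned rank
    return parsed


def lacks_species(lineage: str) -> bool:
    """True when the lineage has no species-level assignment."""
    return parse_gtdb_lineage(lineage).get("species") is None


# Illustrative lineage strings in GTDB-style format (not from the study's tables).
classified = ("d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
              "o__Enterobacterales;f__Enterobacteriaceae;g__Erwinia;"
              "s__Erwinia persicina")
unclassified = ("d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
                "o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas_E;s__")

genus_of_unclassified = parse_gtdb_lineage(unclassified)["genus"]
print(genus_of_unclassified, lacks_species(unclassified))
```

Applying lacks_species across all classification strings would give the count of MAGs without a species-level name, and grouping on the parsed phylum or genus fields would give the corresponding tallies.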

Data Records

The data records are available on Figshare38.

The 82 MAGs have been deposited at DDBJ/ENA/GenBank under the accession numbers listed in Table 1 (refs. 39–119).

Table 1 Genomic characteristics and accession numbers of 82 microbial genomes from barley seed communities described in this study.

Additional metadata and details about each MAG are available in Supplementary Table S2.

The raw reads used to reconstruct the MAGs have been deposited in the NCBI Sequence Read Archive120.

Technical Validation

Sequence data were quality-checked and refined using established software applications, namely FastQC, MultiQC, and Trimmomatic. Combining the comprehensive MetaWRAP pipeline with dependable tools such as CheckM and GTDB-Tk strengthened the binning, genome assembly, and taxonomic assignment processes. Together, these validation stages yield a dataset that is technically sound and supports dependability and reproducibility in metagenomic research.