Time-series sewage metagenomics distinguishes seasonal, human-derived and environmental microbial communities potentially allowing source-attributed surveillance

Abundance and diversity of recovered genomesMetagenomic assemblies of all the individual samples resulted in 42,731,574 contigs with an average length of 2179 bp (N50: 2303) and are in the following referred to as the single-sample assembly. Metagenomic co-assemblies of time points from the same site resulted in 25,120,037 contigs with an average length of 2333 bp (N50: 2569).A total of 23,082 bins were recovered from kmer/depth-binning the individual metagenomic assemblies and 11,643 kmer/depth-co-abundance binning of the metagenomic assemblies. In total, 2523 bins were found with completeness >90% and contamination below <5% and 12,687 bins with completeness >=50% and contamination <=10%, according to CheckM220.We aligned ~69% of the sequencing reads to the collection of species-level genome collection. When exploring the genomes’ abundances, we observed that in some samples, the genus Pseudomonas_E (GTDB-tk splits the wide Pseudomonas into new genus-sized genera) was highly abundant. This is primarily attributed to 12 different known Pseudomonas_E species (relative abundance >0.05), including Pseudomonas_E bubulae, Pseudomonas_E lundensis and Pseudomonas_E paraversuta. None of our recovered Pseudomonas genomes belonged to the well-known opportunistic pathogen, Pseudomonas aeruginosa.Alpha diversity (Shannon index) remained relatively consistent across Copenhagen and Rotterdam. However, notable alpha diversity drops were seen in sites with occasional Pseudomonas_E dominance (Fig. 1a). See Supplementary Fig. S1 for a more detailed timeline of Pseudomonas_E blooms.Fig. 1: Abundance and diversity of recovered genomes through time and space.a Shannon alpha diversity index of each sample. Circled dots indicate high levels of Pseudomonas_E. b Stacked bar plots of the relative abundance of genera, stratified by sampling site and left-to-right sorted according to sampling date. The ‘merged’ category includes all remaining genera, for which no genus exceeded 5% of any sample. c Multidimensional scaling of sample beta-diversity, based on Bray–Curtis (BC) dissimilarity. Point size is proportional to the inverse Shannon index. d Also, multidimensional scaling, but using the Aitchison distance instead of BC. Centroids are marked with a diamond and surrounded by an ellipse. These ellipses represent the 95% confidence intervals of the t-multivariate distribution for each sample group, serving as a graphical summary of the group’s variability.The Bologna sewage microbiome exhibited notable changes over time, with certain genera appearing and disappearing through time. In Budapest the sewage microbiome was characterized by a consistent set of bacterial genera, however their abundances exhibited sample-to-sample fluctuations. In the Copenhagen sewage treatment plants, variations over time were more pronounced. For instance, the genus Aliorcobacter emerged in the second year of sampling in the RD and RL plants but appeared to be absent in the RA plant. In the last few samples from Rotterdam, the genera Psychrobacter and Giesbergeria became more abundant (Fig. 1b).In the beta-diversity analysis using Bray–Curtis dissimilarity, the first principal coordinate predominantly captured the extensive diversity across the samples. Notably, it highlighted the samples primarily composed from the genus Pseudomonas_E associated with low diversity (Fig. 1c).The application of the Aitchison distance instead (see methods) highlighted the unique microbial signatures associated with each European city (Fig. 1d). The three Copenhagen sites appear completely superimposed on each other, demonstrating low intra-city variability. To explore the variability of beta-diversity, we found that the site explained ~42% of the variance, while the city explained ~38% (P ≤ 0.001). This suggests that the specific treatment plant location has a minor impact on the observed beta-diversity, with the broader geographic location accounting for a more substantial portion of the variation, consistent with Fig. 1d.Furthermore, there is a significant overlap observed between the cities of Bologna and Budapest, highlighting a higher degree of similarity between these cities, than Bologna and its fellow italian city Rome.Co-assemblies especially boost recovery of rare and undiscovered sewage taxaUtilizing a combination of co-assembly and single-sample assembly strategies, we successfully retrieved 2332 species of at least medium quality according to the MIMAG criteria suggested by Bowers et al.21. Many genomes lacked conserved RNAs but 56 were considered high quality. MAGs recovered through single-sample assembly were generally more abundant (26% of the genomes exceeding 1% abundance), whereas only about 2% of MAGs recovered through co-assembly had at least 1% abundance in any sample.Among the diverse orders represented in the genomic collection, Burkholderiales stands out with 410 different species. It is followed by the Pseudomonadales (148 species) and Bacteroidales (121 species) (Fig. 2a). However, we were only able to assign species annotation to 988 of the species suggesting a substantial number of novel taxa.Fig. 2: Thousands of species’ genomes recovered de novo from sewage.a Phylogeny of all highest-scoring members of species-level MAG clusters. From the inner to the outer ring, they show genome-assigned taxonomic order, the assembly method employed and whether a genome was classifiable at the species level. b A bar chart showing the number of classified and unclassified species from one or both assembly methods. The SHARED category refers to MAGs that were recovered using both the single-sample assembly (SSA) and the co-assembly approach (COASS). Classify refers to whether GTDB-tk assigned a species-level classification.Using the single-sample assembly approach, we managed to recover genomes from 925 species (Fig. 2b). The addition of co-assemblies significantly expanded the number of recovered species’ genomes with 1407, to 2332. Within this expansion, we identified 496 known species (Fig. 2b). Notably, some well-known species, like Lactobacillus acidophilus appeared only through co-assembly. Nineteen species could not be assigned to known orders, and intriguingly, 15 of those were exclusively obtained through the co-assembly process, indicating co-assembly can help recover substantially more novel taxonomy from sewage.Community detection in network graphs can assist source attributing genomes and AMRWe probed the co-occurrence of microbial taxa using network theory to identify their collective interactions. We analyzed the networks derived from each treatment plant separately, where each node (vertex) corresponds to a recovered genome and each edge (link) is a statistically significant co-occurrence of two species (Fig. 3a).Fig. 3: Diverse patterns emerge in the co-abundance networks of bacterial species across cities.a Each vertex corresponds to a species’ genome, and the links represent significant positive (blue) and negative (red) correlations. The size of the vertices is proportional to the mean CLR-transformed depth of coverage, and the shapes encode variance level. Vertex colors are used to highlight the community membership. For the vertex placement, we used the Fruchterman-Reingold layout algorithm on the subgraph composed from only positive edges. Following this representation, many of the blue/positive links are covered by the circles, even if there are more positive links overall, see Supplementary Table S1). b Heatmap of the Jaccard similarity (J) with the hierarchical clustering of the Jaccard dissimilarity (1-J). The index J is defined as the ratio of the intersection of the links shared between two networks with respect to the union. See Supplementary Fig. S4 for an analogous version with nodes colored by taxonomy.Due to the absence of a ground truth and the numerous species under investigation, including a significant proportion of unclassified taxa, we opted to use straightforward correlations to construct the network. While this simplification may overlook finer details22, we specifically utilize correlation values deemed significant to assign weights for the links in our network analysis. This includes both positive and negative correlations, as detailed in the methods section, providing a quantitative description of the collective properties.Our investigation unveiled striking disparities in network compositions and interactions, each manifesting a unique city-specific signature. The networks not only varied in the nodes/species constituents but also exhibited differing edge densities, as shown in Supplementary Table S1. The number of distinct species remaining after filtering low-abundance measurements exhibited substantial variation, ranging from 546 species in Rome to 854 species in both Budapest and Rotterdam (Supplementary Fig. S2). Even more rare are the significant relationships common to all sites, with just 36 shared links. Notably, these shared connections consistently belong to a bacterial community, primarily composed of bacteria from the Pseudomonas_E genus (Supplementary Fig. S3). This Pseudomonas_E group, consistently present in all the sewage samples, is what we refer to as the PEH (Pseudomonas_E Heavy) community.When we compare the sites in pairs, despite such variation, it becomes easier to capture common trends. Specifically, the three Copenhagen sites (RA, RD, and RL) displayed the highest degree of similarity among themselves, as substantiated by the Jaccard index computed on network edges (Fig. 4b). Equally intriguing was the notable overlap observed between the network structures of Bologna and Budapest, while Rotterdam exhibited a distinct profile distinct from the other cities. These observations are consistent with the beta-diversity analysis (Fig. 1c, d).Fig. 4: Association between microbiome-assigned antimicrobial resistance gene hits and microbial community abundances. a This scatter plot shows the inverse correlation between the fraction of sequence fragments attributed to PEH communities and the number of fragments assigned to ResFinder AMR in samples. Points are colored by site and sized according to the cumulative normalized genome depth per sample. b Heatmap depicting Spearman’s ρ between the number of AMR-gene-attributed fragments and PEH community depth. Highlighted within red square rectangles are communities with heavy PEH species contribution.We observed that networks consistently manifest a clear graph subdivision into distinct communities, as proved by their modularity values that always exceed 0.5 (Supplementary Table S1). This network partition can be seen in Fig. 4a, where bacteria cluster into discernible communities with strong internal connections rooted in only significant positive correlations.Certain communities functioned independently with minimal interconnections. Others displayed numerous external connections, primarily negative, indicating competitive interactions. This enabled us to cluster bacteria into distinct communities and investigate their composition and temporal trends. Clear illustrations are the PEH communities observed in Bologna_4, Budapest_5, Rome_2, Copenhagen_RL_3, Copenhagen_RA_4, Copenhagen_RD_3, and Rotterdam_5, which are also the rare, yet significant relationships shared across all sites (Supplementary Fig. S5).Furthermore, examining the abundance trends of these communities reveals intriguing behaviors. Some communities exhibit clear oscillations synchronized with the solar year, as we will delve into in the next paragraph. Others display remarkable shifts in their relative abundance and dominance within the sewage composition, transitioning from low abundance to becoming predominant, as observed in communities like Rotterdam_4.In the most extreme instances, PEH communities account for ~95% of the assigned reads. We find a powerful, inverse relationship between the abundance of those communities and the observations of antimicrobial resistance genes (Fig. 5a). The ratio between those two variables can vary by more than 1000, suggesting environmental microbes can completely overpower surveillance targets in some cities.Fig. 5: Human gut microbiome species form communities and correlate with crAssphage abundance. a Certain sampling sites show communities dominated by gut microbes. Light blue indicates the number of MAGs classified as human gut microbes, dark blue represents the number of MAGs not identified at the species level, and light green represents the number of other species in each community. Communities with fewer than 10 members were excluded from the analysis. The percentage values indicate the proportion of human gut microbes within each community. b Relative abundance of crAssphage (BK010471), an indicator of human fecal contamination shows low median level of fecal contamination in Rome but similarly high level of median abundance at other sites. The boxplot hinges represent the 25th and 75th percentiles, with the median indicated by a line inside the box. The whiskers extend to 1.5 times the IQR. c Spearman’s ρ between sample crAssphage depth and CLR-transformed depths of bacterial communities.The three Copenhagen PEH communities share most features, like having noticeable Janthinobacterium tructae participation absent from other communities (yellow). But they also show within-city differences, like the Copenhagen_RA community uniquely being associated with an unclassified Giesbergeria species (purple).We correlated the AMR measures in samples with the abundance of bacterial communities and found it strongly anti-correlated specifically to the PEH communities in the three heavily affected cities (Fig. 5b).In an attempt to discover the drivers of Pseudomonas_E blooms, we associated the proportion of Pseudomonas_E to pH and temperature in the affected sites (Supplementary Fig. S6). There were no clear associations between those variables. The species-level beta-diversity was partly explained by temperature (8%, P = 0.01) but not by pH (1.4%, P = 0.175).Metabolic collaboration does not explain the observed sewage microbial communitiesIn an attempt to see if metabolic collaboration drove community formation, we investigated the degree to which the communities (or shuffled groups of the same size) encoded the proteome required for metabolic functionality.Metabolism of most amino acids and central carbon metabolism, was common among all communities (including original and shuffled) and sites, while others, like plant pathogenicity or sterol biosynthesis, were completely absent. Sulfur biosynthesis, and nitrogen metabolism were more prevalent in some communities while absent in others, including shuffled ones, suggesting that these traits might be specific to a few bacteria rather than a collaborative effort among bacteria in the community.As the community size grows, different metabolic categories tend to be more complete. However, there are some exceptions. For instance, Copenhagen_RL_15, the smallest community in Copenhagen_RL with only two bacteria, still has complete histidine metabolism and sulfur metabolism. One of these bacteria, the Thiotrix unzii is well-documented for its involvement in sulfur metabolism23. In case of nitrogen metabolism, Rotterdam_7 is the only one which seems to have all nitrogen metabolism modules. This community consists of 22 bacteria most of them are not classified at the species level. Among them, one MAG is classified as the genus Nitrospira_F, which is known for its nitrification capabilities24. For a breakdown of community metabolic differentiation, see Supplementary Fig. S7.Human gut microbiota cluster into fecal communitiesYet another explanation could come from the fact that sewage is a mixed environment with multiple input sources. When these vary in relative contribution, all organisms from the same source will co-occur also leading to groups of genomes co-occurring through time. In essence, our communities would then be multi-genome bins, exploiting differential abundance in the same way as genome binning tools can, but with weaker associations.If e.g., the amount of environmental bacterial contribution varies significantly through time, we would expect those genomes to group together and differently from human microbes.By comparing our classified species to known human gut microbiota species. Out of 998 MAGs classified at the species-level 241 potentially originated from the human gut. We observed that these MAGs tend to form communities, Fig. 5a. We noted that certain sampling sites exhibited communities dominated by gut microbes, such as communities 5 and 6 in Bologna, community 4 in Budapest, community 5 at Copenhagen RA, and community 2 in Rotterdam. Interestingly, in Rome, gut microbes are rare in the communities. Most sites have a single community with more than 2/3 MAGs being recognized as human fecal microbiota.To further confirm this finding, we used crAssphage as an indicator for human fecal contamination (see Methods). With the exception of Rome, which exhibited low levels (corresponding with a lower count of human gut microbiota species within the communities), all cities showed similarly high median abundances of crAssphage, Fig. 5b. By quantifying its abundance, a pronounced correlation emerged between crAssphage levels and the dominance of fecal bacteria within communities, Fig. 5c. This is particularly evident in Bologna communities 5 and 6 (with Spearman’s correlation of 0.81 and 0.45), community 5 in Budapest (0.66), community 5 in Copenhagen RA (0.49), and community 2 in Rotterdam (0.76).In addition, we quantified human mitochondria and used those numbers to confirm that crAssphage abundance was highly correlated with human-associated DNA (Supplementary Fig. S8). Since crAssphage is an established fecal indicator with more aligned reads than mitochondria, it was expected to be associated with less stochastic sampling noise, and we chose to use it further in our analyses.Community detection for assigning AMR to sourcesLooking at Fig. 4b, we saw several communities that are strongly positively correlated with AMR. However, these were not necessarily the fecal communities. We therefore sought to include the ARGs in the per-site network analysis and recomputed the networks with both ARG and MAG features included and correlated. While correlation alone might be too uncertain to ascertain the source genome, an overall origin based on community might be feasible.Using such an approach, Bologna, Budapest, and Rotterdam formed a fecal community each with ARGs and at least 70% of classified species being known human gut species.The Rotterdam fecal community was associated with 10 ARGs, mostly tet(W)/(O)/(32) alleles which are known to recombine, aph(3’)-III, blaACI and cfr(C). The Bologna fecal community contained 18 genes with many recognized Proteobacteria ARGs like sul(1), sul(2), qnrS, tet(X), tet(W), and aadA alleles. Lastly, the Budapest fecal community included 22 ARGs, overlapping with some of the others, but also with unique inclusions like mph(N), tet(44), and erm(B).In the Copenhagen sites, there was a tendency of large ARG-only communities forming, not associated with any species in particular (Supplementary Fig. S9). Whether this is due to plasmid carriage or other sources remains to be explored.Some bacterial communities closely follow the solar yearTo better visualize and understand the dynamics between the communities, we explored the temporal trends of microbial communities across multiple sites (Fig. 6). As anticipated, the network structures unveiled complex interrelationships among these communities, reflecting significant changes in sewage composition over time. Notably, we identified communities characterized by strong periodic trends that exhibited clear negative correlations with other communities. In contrast, certain communities experienced a notable longer-term abundance change, going from the least to the most abundant community in a >1 year sampling period, or reversely, changing from dominant to insignificant.Fig. 6: Time abundance of principal bacterial communities at each site.The figure shows the smoothed (LOESS) trends of the CLR-transformed depths of bacterial communities, colored as in Fig. 3. The horizontal black dashed lines indicate where CLR-transformed depths are zero, highlighting the geometric means of the samples serving as the reference in CLR space. Only the communities composed of 30 or more bacterial species/MAG are included in the figures. Communities displaying periodic behavior aligned with the solar year are marked with dashed lines.We quantified the time periods of the “periodic” bacterial communities. Given that our sampling campaign lengths frequently covered only a single oscillation per community, we opted to simply estimate the frequency by fitting sine functions. Notably, we identified a subset of communities, especially in Copenhagen, Bologna, and Rotterdam, which displayed periodic behaviors closely aligned with the solar year (Supplementary Fig. S10). The community most closely following the solar year was Copenhagen_RA_2 (364,77 + /− 9.48 days). This community contains Acidovorax defluvii, Trichococcus flocculiformis and 145 other species, most of which have not been described before.Intriguingly, in Rotterdam and across the three Copenhagen sites, we discovered two annual communities that were markedly negatively correlated with each other. These are visually represented by dashed lines in Fig. 6. In addition, Supplementary Fig. S11 provides another perspective by showing the trends in beta-diversity per site over time. We observed a notable divergence in the bacterial composition of sewage at sampling sites over time, with samples becoming increasingly dissimilar, following the time between collection dates (Δ days) (Supplementary Fig. S11b). Budapest, though, emerged as an exception as its sewage composition exhibited relative stability over the course of a year. In half of the sampling sites, there is a detectable decrease in beta-diversity after one year. This particular phenomenon was absent in Bologna and in Rome (Supplementary Fig. S11a). Excluding the dominating Pseudomonas_E taxon made little difference, demonstrating that the seasonal findings are separate (Supplementary Fig. S11c). Although some sampling sites exhibited periodic patterns aligned with the solar year, others did not; the diagrams at the community level reveal more nuanced patterns within each site, distinguishing between communities with seasonal behaviors from those without.

Hot Topics

Related Articles