A multi-modal deep language model for contaminant removal from metagenome-assembled genomes

Bernard, G., Pathmanathan, J. S., Lannes, R., Lopez, P. & Bapteste, E. Microbial dark matter investigations: how microbial studies transform biological knowledge and empirically sketch a logic of scientific discovery. Genome Biol. Evol. 10, 707–715 (2018).
Dam, H. T., Vollmers, J., Sobol, M. S., Cabezas, A. & Kaster, A.-K. Targeted cell sorting combined with single cell genomics captures low abundant microbial dark matter with higher sensitivity than metagenomics. Front. Microbiol. 11, 1377 (2020).
Kaster, A.-K. & Sobol, M. S. Microbial single-cell omics: the crux of the matter. Appl. Microbiol. Biotechnol. 104, 8209–8220 (2020).
Pratscher, J., Vollmers, J., Wiegand, S., Dumont, M. G. & Kaster, A.-K. Unravelling the identity, metabolic potential and global biogeography of the atmospheric methane-oxidizing upland soil cluster α. Environ. Microbiol. 20, 1016–1029 (2018).
Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834 (2017).
Liang, K.-C. & Sakakibara, Y. MetaVelvet-DL: a MetaVelvet deep learning extension for de novo metagenome assembly. BMC Bioinforma. 22, 427 (2021).
Kolmogorov, M. et al. metaFlye: scalable long-read metagenome assembly using repeat graphs. Nat. Methods 17, 1103–1110 (2020).
Nissen, J. N. et al. Improved metagenome binning and assembly using deep variational autoencoders. Nat. Biotechnol. 39, 555–560 (2021).
Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).
Wu, Y.-W., Tang, Y.-H., Tringe, S. G., Simmons, B. A. & Singer, S. W. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome 2, 26 (2014).
Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019).
Vollmers, J., Wiegand, S. & Kaster, A.-K. Comparing and evaluating metagenome assembly tools from a microbiologist’s perspective - not only size matters! PLoS ONE 12, e0169662 (2017).
Nayfach, S. et al. A genomic catalog of Earth’s microbiomes. Nat. Biotechnol. 39, 499–509 (2021).
Almeida, A. et al. A new genomic blueprint of the human gut microbiota. Nature 568, 499–504 (2019).
Mattock, J. & Watson, M. A comparison of single-coverage and multi-coverage metagenomic binning reveals extensive hidden contamination. Nat. Methods 20, 1170–1173 (2023).
Bowers, R. M. et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Biotechnol. 35, 725–731 (2017).
Rinke, C. et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431–437 (2013).
Parks, D. H. et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat. Microbiol. 2, 1533–1542 (2017).
Nayfach, S., Shi, Z. J., Seshadri, R., Pollard, K. S. & Kyrpides, N. C. New insights from uncultivated genomes of the global human gut microbiome. Nature 568, 505–510 (2019).
Vollmers, J., Wiegand, S., Lenk, F. & Kaster, A.-K. How clear is our current view on microbial dark matter? (Re-) Assessing public MAG & SAG datasets with MDMcleaner. Nucleic Acids Res. 50, e76–e76 (2022).
Drillon, G., Champeimont, R., Oteri, F., Fischer, G. & Carbone, A. Phylogenetic reconstruction based on synteny block and gene adjacencies. Mol. Biol. Evol. 37, 2747–2762 (2020).
Periwal, V. & Scaria, V. Insights into structural variations and genome rearrangements in prokaryotic genomes. Bioinformatics 31, 1–9 (2015).
Sczyrba, A. et al. Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software. Nat. Methods 14, 1063–1071 (2017).
Orakov, A. et al. GUNC: detection of chimerism and contamination in prokaryotic genomes. Genome Biol. 22, 178 (2021).
Pan, S., Zhao, X.-M. & Coelho, L. P. SemiBin2: self-supervised contrastive learning leads to better MAGs for short- and long-read sequencing. Bioinformatics 39, i21–i29 (2023).
Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning 139, 8748–8763 (PMLR, 2021).
Wagstaff, K. et al. Constrained k-means clustering with background knowledge. In Proc. 18th International Conference on Machine Learning 1, 577–584 (Morgan Kaufmann, 2001).
Chklovski, A., Parks, D. H., Woodcroft, B. J. & Tyson, G. W. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nat. Methods 20, 1203–1212 (2023).
Ma, B. et al. A genomic catalogue of soil microbiomes boosts mining of biodiversity and genetic resources. Nat. Commun. 14, 7318 (2023).
Duncan, A. et al. Metagenome-assembled genomes of phytoplankton microbiomes from the Arctic and Atlantic Oceans. Microbiome 10, 67 (2022).
Faist, H. et al. Potato root-associated microbiomes adapt to combined water and nutrient limitation and have a plant genotype-specific role for plant stress mitigation. Environ. Microbiome 18, 18 (2023).
Tláskal, V. et al. Metagenomes, metatranscriptomes and microbiomes of naturally decomposing deadwood. Sci. Data 8, 198 (2021).
Buck, M. et al. Comprehensive dataset of shotgun metagenomes from oxygen stratified freshwater lakes and ponds. Sci. Data 8, 131 (2021).
Kavagutti, V. S. et al. High-resolution metagenomic reconstruction of the freshwater spring bloom. Microbiome 11, 15 (2023).
Maestre-Carballa, L., Navarro-López, V. & Martinez-Garcia, M. City-scale monitoring of antibiotic resistance genes by digital PCR and metagenomics. Environ. Microbiome 19, 16 (2024).
Zhao, L. et al. A Clostridia-rich microbiota enhances bile acid excretion in diarrhea-predominant irritable bowel syndrome. J. Clin. Invest. 130, 438–450 (2020).
Rodriguez-R, L. M. & Konstantinidis, K. T. Nonpareil: a redundancy-based approach to assess the level of coverage in metagenomic datasets. Bioinformatics 30, 629–635 (2014).
Lai, S. et al. metaMIC: reference-free misassembly identification and correction of de novo metagenomic assemblies. Genome Biol. 23, 242 (2022).
Derakhshani, H., Bernier, S. P., Marko, V. A. & Surette, M. G. Completion of draft bacterial genomes by long-read sequencing of synthetic genomic pools. BMC Genomics 21, 519 (2020).
Mende, D. R. et al. proGenomes2: an improved database for accurate and consistent habitat, taxonomic and functional annotations of prokaryotic genomes. Nucleic Acids Res. 48, D621–D625 (2020).
Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk v2: memory friendly classification with the Genome Taxonomy Database. Bioinformatics 38, 5315–5316 (2022).
Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
Li, K. et al. UniFormer: unified transformer for efficient spatiotemporal representation learning. Preprint at https://doi.org/10.48550/arXiv.2201.04676 (2022).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Tan, M. & Le, Q. EfficientNet: rethinking model scaling for convolutional neural networks. In Proc. 36th International Conference on Machine Learning 97, 6105–6114 (PMLR, 2019).
Li, C., Zhou, A. & Yao, A. Omni-dimensional dynamic convolution. Preprint at https://doi.org/10.48550/arXiv.2209.07947 (2022).
Guo, M.-H., Lu, C.-Z., Liu, Z.-N., Cheng, M.-M. & Hu, S.-M. Visual attention network. Comput. Vis. Media 9, 733–752 (2023).
Wang, H. et al. DeepNet: scaling transformers to 1,000 layers. IEEE Trans. Pattern Anal. Mach. Intell. 46, 6761–6774 (2024).
Lin, T.-Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. Preprint at https://doi.org/10.48550/arXiv.1708.02002 (2018).
Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017).
Zou, B. Deepurify: a multi-modal deep language model to remove contamination from metagenome-assembled genomes. Simulation 1, v.1. Zenodo https://doi.org/10.5281/zenodo.8343497 (2023).
Zou, B. Deepurify: a multi-modal deep language model to remove contamination from metagenome-assembled genomes. Simulation 2, v.2. Zenodo https://doi.org/10.5281/zenodo.8343505 (2024).
Zou, B. A deep multi-modal deep language model for contaminant removal from metagenome-assembled genomes (code). Zenodo https://doi.org/10.5281/zenodo.11919065 (2024).
