A unified framework to analyze transposable element insertion polymorphisms using graph genomes

