Improved high quality sand fly assemblies enabled by ultra low input long read sequencing

Phlebotomine sand flies (family Psychodidae, order Diptera) include several genera of hematophagous arthropods that vector important emerging and re-emerging infectious diseases. They transmit bacterial and viral pathogens and, most notably, the protozoan parasite Leishmania to humans and animals. Leishmaniasis is a group of diseases that range in clinical manifestation from self-healing cutaneous lesions to disfiguring mucocutaneous ulcers to fatal visceral disease. Clinical tropisms can be highly dependent on the infective species and the vectoring sand fly. Over 90 species of sand flies found across Latin America, Africa, the eastern Mediterranean, Southeast Asia, and Europe have been implicated as vectors for approximately 20 species of Leishmania parasites that cause leishmaniasis [1,2].

Phlebotomus papatasi vectors Leishmania major, an etiological agent of cutaneous leishmaniasis, across North Africa, the Middle East, and the Indian subcontinent [3]. It is a restrictive vector in that it can transmit only a single Leishmania species, Le. major. However, P. papatasi also transmits viral febrile illnesses across its distribution [4,5]. Lutzomyia longipalpis is the major vector responsible for transmission of the visceral leishmaniasis-causing parasite, Leishmania infantum, in the Americas [6]. Lu. longipalpis is a permissive vector in the laboratory, transmitting several Leishmania species; in nature, however, it transmits only Le. infantum [7]. Lu. longipalpis has a wide geographic distribution spanning diverse ecological habitats and has garnered interest as a species complex. Others have observed differences in spot numbers, pheromones, and mating songs, and have noted reproductive isolation between different populations collected throughout Brazil [8]. Leishmaniasis pathogenesis is thought to depend on complex host, vector, and parasite interactions and, although the epidemiological implications of a Lu. longipalpis species complex remain unclear, understanding the molecular underpinnings that lead to vector competence, reproductive isolation, and adaptation is critical from an epidemiological and disease control perspective.

In mosquito research, high-quality reference genomes have enabled inquiries into population genetics and metagenomics, identification of gene markers of senescence, vector competence, insecticide resistance, and experimental gene drive approaches to vector control. These have ultimately improved understanding and management of the vector in the disease transmission cycle [9]. Unfortunately, the fragmented nature of current sand fly references has slowed similar inquiries for Leishmania transmission.

Previous reference genomes for P. papatasi and Lu. longipalpis [10] suffered from very low contiguity. With the best sequencing technology available at the time, read lengths were limited to ~400 bp, too short to span many repeats. More damaging to assembly contiguity, the DNA input minimums of earlier library protocols required DNA to be pooled from many individuals, which fed many different haplotypes into the assembly algorithm. Genome heterozygosity could not be controlled for by inbreeding in sand flies, and haplotype sequence variation (for example, a short insertion polymorphism) caused assembly tools designed for a single haplotype to create sequence gaps in areas of uncertainty. Together, these constraints led the genome assemblies for P. papatasi and Lu. longipalpis to be the 2nd and 3rd worst available in VectorBase [11], with contig N50 lengths of 5,795 bp and 7,481 bp, respectively. For reference, across all genomes in VectorBase at the time, the median assembly contig N50 was 51,691 bp. Additionally, no Hi-C or chromosome-scale data were available, and these fragmented genome assemblies were inadequate for many genome analyses.

Here, we update these two important sand fly vector genome references, leveraging a decade’s worth of technological advances. First, very high-quality long reads of Q20 or even Q30 are now available at lengths longer than the previous assemblies’ contigs. Second, Hi-C technologies have become de rigueur and achieve higher chromosomal completion rates when paired with the significantly longer contigs generated by high-quality long-read assembly. Finally, an ultra-low input library protocol developed by Pacific Biosciences [12] enabled the sequencing of a single individual sand fly. This greatly simplified assembly, because the sequence information derives from only the 2 haplotypes of a single individual rather than the many haplotypes of a pool of individuals. A small compromise, necessary because only 30 ng of genomic DNA can be isolated from a single male sand fly, is the use of whole genome amplification. Together, these three techniques have generated the greatly improved reference assemblies we describe here.
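
For readers unfamiliar with the contig N50 statistic used above to compare assemblies: it is the length of the contig at which, counting from the longest contig down, the cumulative length first reaches half of the total assembly length. The following is a minimal Python sketch of that definition. The function name contig_n50 and the toy contig lengths are hypothetical and chosen for illustration only; they are not data or code from this study.

```python
def contig_n50(lengths):
    """Return the N50 (in bp) of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running sum to half of the
        # total assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assemblies with the same total length (30,000 bp).
fragmented = [8000, 6000, 5000, 4000, 3000, 2000, 2000]
contiguous = [20000, 10000]

print(contig_n50(fragmented))  # 5000
print(contig_n50(contiguous))  # 20000
```

With the total length held constant, the assembly split into fewer, longer contigs has the higher N50, which is why the statistic serves as a simple measure of contiguity.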
