Problems in the assembly and analysis of the kangaroopox virus-NSW isolate

As the first step in our analysis, we aligned the EKPV-NSW, EKPV-SC and WKPV-WA genome sequences with MAFFT, using its default parameters, in the Base-By-Base (BBB)6 software package. The tool calculated the percent nucleotide identity of EKPV-NSW to EKPV-SC and to WKPV-WA as 98.8 and 96.0, respectively. Our corresponding values for percent difference (1.2 and 4.0) are much lower than those calculated by Sarker et al. (8.5 and 12.1% difference). One possible source of this discrepancy in the comparison of EKPV-NSW and EKPV-SC (98.8% vs 91.5% identity) is that the tool used by Sarker et al. counted alignment columns containing a gap in the calculation of identity. Further evidence of problems with their calculations of sequence identity can be found in Sarker et al. Fig. 3B: in this table, many of the pairwise comparisons show < 15% nucleotide identity, whereas random DNA sequences (4 nucleotides) should be approximately 25% identical.

As also identified by Sarker et al., we found very significant differences at the termini of the EKPV-NSW and EKPV-SC genomes and within 8 blocks (Fig. 1) scattered along the genome. Most of these blocks result from large indels in one or other of the EKPV genomes, but 2 blocks are regions of almost no similarity (Fig. 1D, H). After we removed the terminal regions and these 2 unusual blocks, which account for only 5.1% of the genome lengths, from the multiple sequence alignment (MSA), we found that the EKPV-NSW and EKPV-SC genomes were actually 99.8% identical, and that they were 96.2 and 96.4% identical to the WKPV-WA genome sequence, respectively. Thus, our final results indicate that, except for two relatively small misaligning regions, the EKPV-NSW and EKPV-SC genomes have diverged by only 0.2%, and not by 8.5% as reported by Sarker et al.

Figure 1. The Visual Summary output of BBB after aligning EKPV-NSW and EKPV-SC. Indels and SNPs are shown by vertical lines.
The large blocks of indels and misalignments are labeled (A–H).

A review of the variable regions revealed simple explanations for their lack of similarity. In examining the relationships between the large indels and misalignments (Fig. 1A–H), it is important to note that for each of the differences between EKPV-SC and EKPV-NSW, EKPV-SC agrees with WKPV-WA, which is a different species of kangaroopox virus. Blocks A and B (Fig. 2, panel a): these blocks are essentially identical, except that the block is translocated in EKPV-NSW; Block C (Fig. 2, panel b): this block represents a repeat in EKPV-NSW that is made up of 2 other regions that are also present in the same position as EKPV-SC; Block D represents the sequence within the DNA polymerase gene that has been replaced by MOCV sequence (discussed below); Blocks E and F (Fig. 2, panel c): block F represents the translocation of block E from EKPV-SC and replacement of the block marked with an asterisk; Block G represents a region absent from EKPV-NSW; Block H (Fig. 2, panel d): this block represents a repeat in EKPV-NSW that is made up of 2 other regions that are also present in the same position as EKPV-SC, and the translocated sequence replaces the block marked with an asterisk. Many of the genes associated with these regions are either truncated or missing in the EKPV-NSW annotation; these include the orthologs of F13L, E9L, G5R, G7L, L3L, H2R, H3L, A14.5L, A16L, and A18R (vaccinia virus Copenhagen nomenclature), which are conserved in all other chordopoxviruses. It is therefore most improbable that any chordopoxvirus could survive such gene losses.

Figure 2. Origin of rearrangements in EKPV-NSW. Vertical grey lines spanning genomes indicate areas that align with > 99% identity. Specific genome regions are indicated by shaded blocks positioned on genomes (a,b); those with dashed borders indicate regions that have been moved (according to arrows). Block names and genome numbers are consistent with those shown in Fig. 1.
In (c,d), asterisks indicate sequence blocks in EKPV-SC that the alignment tools force to align with blocks F and H, respectively. However, the regions of EKPV-SC indicated by the asterisks do not have counterparts in the EKPV-NSW assembly, and we have therefore indicated gaps opposite these regions as well as blocks F and G.
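
The effect of gap handling on the identity values discussed in the first part of this analysis can be illustrated with a minimal Python sketch. It is not the calculation performed by BBB or by the tool used by Sarker et al.; the function names, the `count_gap_columns` switch and the `skip` parameter are hypothetical and introduced only for illustration, and the sequences are toy data rather than kangaroopox virus sequence. The sketch shows how counting gapped columns lowers the reported identity, how excluding chosen alignment regions raises it, and why two unrelated random DNA sequences are expected to be about 25% identical.

```python
import random

GAP = "-"


def percent_identity(a, b, count_gap_columns=False, skip=()):
    """Percent identity between two aligned sequences of equal length.

    count_gap_columns: if True, a column with a gap in exactly one
        sequence is counted as a mismatch; if False, it is ignored.
    skip: half-open (start, end) column ranges excluded from the
        calculation (hypothetical stand-in for removing misaligned blocks).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    excluded = set()
    for start, end in skip:
        excluded.update(range(start, end))

    matches = 0
    compared = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in excluded:
            continue
        if x == GAP and y == GAP:
            continue  # column empty in both sequences carries no information
        if (x == GAP or y == GAP) and not count_gap_columns:
            continue  # indel column ignored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def random_pair_identity(length, seed=0):
    """Identity of two independent random sequences over A, C, G, T."""
    rng = random.Random(seed)
    a = "".join(rng.choice("ACGT") for _ in range(length))
    b = "".join(rng.choice("ACGT") for _ in range(length))
    return percent_identity(a, b)


if __name__ == "__main__":
    # Toy alignment: one 4-column indel and one substitution in 16 columns.
    a = "ACGTACGT----ACGT"
    b = "ACGTACGTTTTTACGA"
    print(f"{percent_identity(a, b):.2f}")                          # 91.67
    print(f"{percent_identity(a, b, count_gap_columns=True):.2f}")  # 68.75
    # Excluding the final column removes the only substitution.
    print(f"{percent_identity(a, b, skip=[(15, 16)]):.2f}")         # 100.00
    # Unrelated random sequences: expected identity is close to 25%.
    print(f"{random_pair_identity(20000):.1f}")
```

In the toy alignment, a single indel moves the identity from roughly 92% to roughly 69% depending only on whether gapped columns are counted, which is the kind of difference suggested above as one possible source of the discrepancy.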
