Longitudinal genomic analysis of Neisseria gonorrhoeae transmission dynamics in Australia

Setting, patients and data sourcesIn Australia, all gonorrhoea cases are notified to public health authorities in each State or Territory. All N. gonorrhoeae isolated in Victoria from diagnostic laboratories were forwarded to the state Microbiological Diagnostic Unit Public Health Laboratory (MDU PHL) for AMR surveillance purposes. Between 1 January 2017 to 31 December 2017 and 1 July 2019 to 30 June 2021 (inclusive), all forwarded isolates underwent antimicrobial susceptibility testing by agar dilution and WGS. Between 1st January 2018 and 30th June 2019 (inclusive) routine WGS was not conducted, and no isolates were sequenced. Information on gonorrhoea notifications in Victoria was obtained from the National Notifiable Diseases Surveillance System (NNDSS)1.Samples included those collected from asymptomatic and symptomatic adult patients ( ≥ 16 years), and from genital and extra-genital (oropharyngeal and rectal) sites. Where multiple N. gonorrheoae genomes were isolated from the same individual on the same visit, one of these isolates was randomly selected for use in downstream transmission analyses to avoid conflating transmission cluster size. Where available, demographic information was obtained from computer-assisted self-interview records from Melbourne Sexual Health Centre (MSHC). Data collected included gender, age, site of infection, and sexual risk group (e.g. gay, bisexual, or other men who have sex with men (GBMSM), heterosexual males or females).Phenotypic testing and DNA extractionSingle colonies of presumptive N. gonorrhoeae were selected from the primary culture plate for identification and antimicrobial susceptibility testing. Isolates were confirmed as N. gonorrhoeae on the MALDI Biotyper (Bruker Daltonik, Bremen, Germany). Antimicrobial susceptibility testing was performed in accordance with the Australian Gonococcal Surveillance Programme (AGSP) using agar breakpoint dilution on GC agar for azithromycin, ceftriaxone, ciprofloxacin, penicillin, spectinomycin, and tetracycline. Resistance to antimicrobial agents was defined as in Supplementary Table 1. Genomic DNA was extracted from a single colony using a QIAsymphony™ DSP DNA Mini Kit (Qiagen) according to manufacturer’s instructions, and sequence libraries were prepared using the NexteraXT DNA library preparation kit with accompanying indexes and 300 cycle chemistry. WGS was performed on an Illumina NextSeq 500 or 550 instrument with 150 bp paired-end reads using Illumina libraries and protocols (Illumina, San Diego, CA, USA). Reads were trimmed to remove adaptor sequences and low-quality bases (Q  <  10) with Trimmomatic (v0.38)41.Quality control and genome assemblyAll sequences underwent quality control before further analysis. Sequences were aligned to the NCCP11945 reference genome (GenBank accession NC_011035.1) using Minimap2 (v2.17-r941) and Samtools (v1.10, using HTSlib v1.10.2) to calculate average depth of reads to the reference genome, using an inclusion criteria of ≥ 30x average depth42,43,44. Kraken2 (v2.1.1) was used to investigate for species contamination45. De novo genome assemblies were generated using Shovill (v1.1.0, https://github.com/tseemann/shovill) using default settings.Genotyping and genotypic antimicrobial resistance determinantsMLSTs were identified using mlst (v2.23.0; https://github.com/tseemann/mlst; PubMLST database accessed May 5, 2023) based on the alleles at seven housekeeping genes: abcZ, adk, aroE, fumC, gdh, pdhC, and pgm. Additionally, the NG-STAR typing scheme was used to determine the phenotypic profile of seven resistance genes (penA, mtrR, porB, ponA, gyrA, parC and 23S rRNA), using pyngSTar ((https://github.com/leosanbu/pyngSTar, database version 2.0). NG-STAR alleles and profiles obtained from the database hosted by the Public Health Agency of Canada (https://ngstar.canada.ca/alleles/loci_selection?lang=en)46. To characterise genotypic markers associated with AMR, we defined Neisseria gonorrhoeae Sequence Typing for Antimicrobial Resistance (NG-STAR) profiles for the dataset.cgMLST transmission clusteringThe PubMLST N. gonorrhoeae cgMLST scheme consisting of 1649 genes, was prepared using chewBBACA (v2.8.5) using the PrepExternalSchema module, after which 1648 genes were accepted26,47. Allele calling was performed on N. gonorrhoeae assemblies using chewBBACA, with a training file generated by Prodigal (v2.6.3) using the reference genome NCCP11945. A relevant subset of core genes for this dataset was then determined from the schema (see Results).A core genome multiple sequence alignment (cg-MSA) derived from cgMLST alleles for each isolate was constructed The gene sequences were retrieved based on allele number in the sequence definition database and individual gene sequences were aligned with MAFFT (v7.475), where missing alleles were converted into gaps, and the multiple sequence alignments were concatenated48. A 95% SNP core alignment was made from the cg-MSA using trimAl (v1.4.rev15)49. The resulting alignment consisted of 69,502 SNPs and was used to infer a maximum likelihood (ML) phylogenetic tree using IQ-tree (v2.0.3), with the best-fitting nucleotide substitution model chosen based on the lowest Bayesian Information Criterion (BIC)50. Molecular dating of ancestral events was performed using the least-squares dating (LSD) software v0.351.Similar to our previous approach of using case contacts to calibrate genomic thresholds for assessing transmission the maximum pairwise allelic difference7 for paired couples and within-site isolates was used as the threshold for subsequent single linkage hierarchical clustering when defining genomic clusters25. Before applying the threshold, we correct clustering for continuing N. gonorrhoeae evolution across our dataset, and the gap in our sampling timeframe between January 1st 2018 and June 30th 2019 by applying an inverse weighting to the pairwise allelic difference between any two isolates to account for time between sample collection according to the following formula52:$${Corrected}\; {difference}= {pairwise}\; {allelic}\; {difference} \\ \!-({rate}\; {of}\; {molecular}\; {dating}\; {regression} \\ \times {pairwise}\; {temporal}\; {distance})$$Bayesian hierarchical modelThe 3,714 N. gonorrhoeae genomes collected after 1st July 2019 were aligned to the NCCP11945 reference genome (GenBank accession NC_011035.1) using Snippy (v4.3.5, https://github.com/tseemann/snippy), requiring a minimum of ten supporting reads and a variant frequency of 0.9 or greater. Using the pseudo-sequence generated by Snippy for each isolate, whole genome alignments were generated for each of the previously defined cgMLST clusters with at least 5 isolates. For each of these, the number of constant sites was calculated using snp-sites (v1)53. Following recombination filtering using Gubbins (v2.4.1) with default settings for all clusters, and a SNP alignment was generated using snp-sites, and ML phylogenetic trees were inferred using IQ-tree (v2.0.3), with the best-fitting nucleotide substitution model chosen based on the lowest BIC and the number of constant sites specified54. Molecular dating of ancestral events was performed on the resulting ML trees, using the least-squares dating (LSD) software (v0.3) with a rate of 4.5 × 10−6 substitutions per site as previously defined and in line with other studies (Supplementary Fig. 15, Supplementary Table 3)55,56,57. The subsequent timed trees were used as input for a Bayesian hierarchical model.In our Bayesian analysis, we applied a birth-death skyline process to all trees. Under this model, the epidemiological process follows a birth-death process, where branching events in a phylogenetic tree are informative about transmission, and the sampling process is directly modelled. The epidemiological parameters can change in a piecewise fashion over two intervals, with the interval time estimated as part of the model58. Each transmission cluster had a fixed average infection duration of three months and shared a sampling proportion parameter, meaning that sampling intensity is a free parameter that is estimated here. However, each cluster was allowed to have independent effective reproductive numbers (Re) and epidemic origin times. The Re values were permitted to vary at a specific point in time, referred to as the “time slice,” which represented a significant point of change in Re values. Although Re values were independent for each cluster, they were governed by a single Gamma prior distribution with two hyperparameters: shape and mean. The shape of this distribution influenced the skew, with smaller values indicating more variation in Re among clusters, and larger values suggesting similarity in Re values across clusters.In our model, the Re values for clusters followed a Gamma distribution, where the shape parameter reflected heterogeneity in the spread among clusters. We maintained a fixed infection duration of three months in our hierarchical model, reflecting what we considered was the most plausible scenario. The birth-death model design resulted in different magnitudes of Re values, although the trends remained consistent. We assumed a uniform prior distribution for all origin times, ranging from zero to six months. Before the first genome of each cluster was sampled, we set the sampling proportion to 0.0, and thereafter, it was modelled with a Beta1,30 prior distribution to capture our assumption that the sampling proportion was at most 10%. The hyperparameters of the Gamma distribution for Re values were assigned Gamma1,10 priors. Additionally, the time slice allowing Re value changes was assigned a uniform prior distribution between March 20 and March 30, 2020.We sampled the posterior distribution using Markov chain Monte Carlo, implemented in BEAST2.6, with a chain length of 109 steps and sampling every 105 steps59. As the phylogenetic trees were fixed, there were no calculations of phylogenetic likelihood, making this analysis computationally more efficient compared to those involving both phylogenetic and phylodynamic likelihoods60.Statistical analyses and data visualisationLarge genomic clusters ( ≥ 30) were characterised as “persistent” if the time between the earliest and latest sample collection dates was > 2 years (104 weeks), bridging the gap in sampling from 1st January 2018 to 30st June 2019. To assess variables associated with cluster persistence, we applied a multivariable logistic regression model via generalised estimating equation (GEE) to calculate adjusted odds ratios using an independence model with geepack (v1.3.9). The following specified variables were included in the models: age group, sex, size of transmission cluster, phenotypic resistance to penicillin, phenotypic resistance to tetracycline, phenotypic resistance to ciprofloxacin, phenotypic resistance or decreased susceptibility to ceftriaxone and phenotypic resistance to azithromycin. For sex, an ‘unknown’ category was included to accommodate missing data, with all other categories having complete data. For antimicrobial resistance phenotype, isolates were grouped binarily as either resistant or susceptible/less susceptible/decreased susceptibility except for ceftriaxone where there were no resistant persistent isolates and isolates were grouped as either decreased susceptibility or susceptible. The Mann–Whitney Rank sum test was used to compare non-normal distributions. All statistical analyses were performed in Rstudio (v2022.12.0 + 353) using base R (v4.2.2). All figures were made using ggplot2 (v3.4.0)61.Reporting summaryFurther information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Hot Topics

Related Articles