Accurate long-read transcript discovery and quantification at single-cell, pseudo-bulk and bulk resolution with Isosceles

Isosceles Splice-graphs

Splice-graph compatibility is defined for reads using various stringency levels that reflect their concordance with existing knowledge. Reads are classified based on compatibility as Annotated Paths (AP), Path Compatible (PC), Edge Compatible (EC), Node Compatible (NC), De-novo Node (DN), Artifact Fusion (AF), Artifact Splice (AS), and Artifact Other (AX). AP refers to full-length transcript paths that perfectly match a reference transcript from the input gene annotation and are quantified by default. PC reads follow transcript paths that are a traversal of an AP, and may be truncated or full-length, or may have differing transcript start or end positions. EC reads traverse annotated splice-graph edges (introns) and may be truncated or full-length. NC reads are paths that traverse only annotated splice-graph nodes (splice sites) but contain at least one novel edge. DN reads have paths that traverse a de novo node (splice site). AF reads traverse paths connecting at least two splice-graphs for annotated genes that do not share introns with each other. AS reads are assigned to genes, but traverse an unknown and irreproducible node (splice site), while AX reads lack compatibility due to ambiguous strand or lack of gene assignment.

Reads are also classified based on their truncation status, which includes Full-Length (FL), 5′ Truncation (5T), 3′ Truncation (3T), Full-Truncation (FT), and Not Applicable (NA). AP transcripts are automatically annotated as FL, and truncation status is checked only for PC, EC, NC, and DN transcripts. AF, AS, and AX transcripts are automatically labeled NA. It is recommended that reference transcripts used for truncation status classification be filtered to only the GENCODE ‘basic’ dataset (tag=‘basic’), although all transcripts in the provided annotations may be used instead, as decided by the user.
Full-length reads are those whose paths splice from a first exon (sharing a reference transcript’s first 5′ splice site) and whose paths splice to a last exon (sharing a reference transcript’s final 3′ splice site).

To add nodes with one or more de novo splice sites to the splice-graph, each splice site must meet two conditions: it is observed in at least the minimum number of reads (default: 2), and it is connected to a known splice site in the splice-graph with at least a minimum fraction (default: 0.1) of that known splice site’s connectivity. Additionally, annotations for known transcripts and genes are merged and extended based on specific criteria. For example, any annotated genes sharing introns with each other are merged into one gene and given a new gene_id and gene_symbol (comma-separated lists of the original Ensembl IDs and gene symbols). Annotated spliced (and unspliced) transcripts sharing the same intron structure, as well as the same transcript start and end bins (default bin size: 50 bp), are merged together and given a unique transcript identifier.

The method offers three modes of extending annotations to include de novo transcripts: strict, de_novo_strict, and de_novo_loose. In the strict mode, only AP transcripts are detected and quantified. In the de_novo_strict mode, AP transcripts and filtered FL transcripts of the EC and NC classes are included in quantification. In the de_novo_loose mode, AP transcripts and filtered FL transcripts of the EC, NC, and DN classes can be included.

For downstream analysis of individual transcript features, alternative splicing (AS) events (not to be confused with the Artifact Splice read class) are defined as the set of non-overlapping exonic intervals that differ between transcripts of the same gene. These are quantified as percent-spliced-in or counts-spliced-in, according to the sum of the relative expression or the raw counts, respectively, of the transcripts that include the exonic interval.
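The percent-spliced-in and counts-spliced-in definitions above can be illustrated with a minimal Python sketch. This is not the Isosceles implementation: the function name, the dictionary inputs, and the choice to derive relative expression from within-gene count fractions are all assumptions made for illustration.

```python
def psi_and_csi(transcript_counts, interval_to_transcripts):
    """Sketch of PSI/CSI for the exonic intervals of a single gene.

    transcript_counts: dict of transcript ID -> raw count (hypothetical input).
    interval_to_transcripts: dict of interval name -> transcripts that include it.
    Relative expression is assumed here to be the within-gene count fraction.
    """
    gene_total = sum(transcript_counts.values())
    psi, csi = {}, {}
    for interval, transcripts in interval_to_transcripts.items():
        # CSI: sum of raw counts of the transcripts that include the interval
        csi[interval] = sum(transcript_counts[t] for t in transcripts)
        # PSI: sum of relative expression of those same transcripts
        psi[interval] = csi[interval] / gene_total if gene_total > 0 else float("nan")
    return psi, csi


# Toy gene with three transcripts; "exon2" is included in t1 and t3 only.
counts = {"t1": 6, "t2": 3, "t3": 1}
psi, csi = psi_and_csi(counts, {"exon2": ["t1", "t3"]})
```

In this toy gene, seven of the ten counts come from transcripts that include the interval, so its CSI is 7 and its PSI is 0.7.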
AS events are classified into different types similar to previous methods analyzing splicing from short-read data [2], including core exon intervals (CE), alternative donor splice sites (A5), alternative acceptor splice sites (A3), and retained introns (RI). Isosceles can also quantify tandem untranslated regions in the first or last exons, including transcription start sites (TSS) and alternative polyadenylation sites (TES).

Isosceles quantification

We use the Expectation-Maximization (EM) algorithm to obtain the maximum likelihood estimate (MLE) of transcript abundances, as used previously in transcript quantification methods for short-read data such as our prior software Whippet [2], or the approach’s conceptual precursors RSEM [17] and/or Kallisto [18]. Specifically, we quantify transcript compatibility counts (TCCs) based on fully contained overlap of reads with the spliced transcript genomic intervals (including an extension [default: 100 bp] for transcript starts/ends), with strand ignored by default for unspliced reads. For computational efficiency, TCCs matching more than one gene are disallowed in the current version. The likelihood function for transcript estimation is defined as it is for short-read data with Whippet [2]: L(α) is proportional to the product, over all reads, of the sum over each compatible transcript t of the probability α(t) of selecting a read from t, divided by the effective length of t. However, due to the long length of nanopore reads, we define the effective transcript length here as the maximum of the mean read length and the transcript’s actual length, divided by the mean read length. This directly accommodates shorter transcripts, which would be fully spanned by the average read and are thus assigned an effective length of 1.0, whereas longer transcripts are represented proportionally to that value.
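As a rough illustration of the effective-length definition and the EM update it feeds into, the following Python sketch estimates transcript fractions from TCCs. It is a simplified sketch under stated assumptions, not the Isosceles code: the function names and the dictionary-based TCC representation are hypothetical, convergence is checked on the maximum absolute change in any transcript fraction (one reading of the stopping rule), and the default tolerance and iteration cap are the values reported in this section.

```python
def effective_length(transcript_length, mean_read_length):
    """max(mean read length, transcript length) / mean read length."""
    return max(mean_read_length, transcript_length) / mean_read_length


def em_abundances(tcc_counts, eff_len, tol=0.01, max_iter=250):
    """EM sketch: tcc_counts maps a tuple of compatible transcripts to a read count.

    eff_len maps each transcript to its effective length (use 1.0 for every
    transcript to mimic the single-cell mode without length normalization).
    """
    transcripts = sorted(eff_len)
    alpha = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    total = sum(tcc_counts.values())
    if total == 0:
        return alpha
    for _ in range(max_iter):
        assigned = dict.fromkeys(transcripts, 0.0)
        # E-step: split each TCC's reads in proportion to alpha(t) / eff_len(t)
        for tcc, n in tcc_counts.items():
            weights = {t: alpha[t] / eff_len[t] for t in tcc}
            norm = sum(weights.values())
            if norm == 0:
                continue
            for t, w in weights.items():
                assigned[t] += n * w / norm
        # M-step: new fractions are the expected share of reads per transcript
        new_alpha = {t: assigned[t] / total for t in transcripts}
        delta = max(abs(new_alpha[t] - alpha[t]) for t in transcripts)
        alpha = new_alpha
        if delta < tol:
            break
    return alpha


# Toy example: transcript A is 2000 nt, B is 500 nt, mean read length 1000 nt.
eff = {"A": effective_length(2000, 1000), "B": effective_length(500, 1000)}
fractions = em_abundances({("A",): 30, ("B",): 10, ("A", "B"): 10}, eff)
```

Here the short transcript B is fully spanned by an average read and receives an effective length of 1.0, while A receives 2.0, so ambiguous reads are weighted toward B relative to what the raw fractions alone would suggest.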
In contrast, when the user-defined parameter specifying single-cell data is set, length normalization is not applied, because reads are anchored to the 5′ or 3′ ends of transcripts and read coverage is therefore assumed to be independent of transcript length. The EM algorithm iteratively refines the transcript abundance estimates derived from TCCs, continuing until the absolute difference between transcript fractions is less than a given threshold (default: 0.01) between iterations, or until the maximum number of iterations is reached (default: 250).

Simulating ONT data

In this study, the Ensembl 90 genome annotation (only transcripts with the GENCODE ‘basic’ tag) was used for all simulations, focusing specifically on spliced transcripts of protein-coding genes to exclude single-isoform non-coding genes. In order to simulate data with realistic transcriptional profiles, we quantified the expression of reference annotations in IGROV-1 cells using publicly available short-read data ([sample, project] accession IDs: [SRR8615844, PRJNA523380]; https://www.ebi.ac.uk/ena/browser/view/SRR8615844) and Whippet v1.7.3 with default settings. Only transcripts with non-zero expression in IGROV-1 were retained for simulations. For detection benchmarks, the Ensembl 90 annotation file (in Gene Transfer Format [GTF]) was randomly downsampled by 10%, 20%, and 30%, always retaining the longest transcript of each gene to ensure at least one full-length major isoform per gene (99.8-100.0% of downsampled transcripts had unique exon-intron architectures). To assess performance in de novo transcript detection, each program was run individually (if de novo detection was supported) and in tandem with StringTie2 for a pre-detection step (also including IsoQuant for pre-detection with Isosceles).
IsoQuant was executed in a similar manner to other programs, but in a consecutive two-step process (where the first IsoQuant run identifies de novo transcripts, which are concatenated to the original annotations for a second run) instead of a single run, due to the significant improvement in performance observed (Fig. 2b-c and Supplementary Fig. 2 for IsoQuant two-step results; IsoQuant single-run results in Isosceles_Paper: reports_static/simulated_bulk_benchmarks_isoquant.ipynb).

In order to simulate Oxford Nanopore Technologies (ONT) reads using NanoSim, we trained error models on bulk nanopore RNA-Seq FASTQ files concatenated from sequencing three cell lines: SK-OV-3 (SRR26865806), COV504 (SRR26865804), and IGROV-1 (SRR26865803). Nanopore single-cell RNA-Seq (nanopore scRNA-Seq) read models were also generated from the pooled set of the aforementioned cell lines (SRR26865982). A total of 100 million reads were simulated from each error model, and then the first 12 million reads deemed alignable by NanoSim were extracted.

Read model error rates:

| | Bulk RNA-Seq | scRNA-Seq |
|---|---|---|
| Mismatch rate | 0.02982687241070872 | 0.02866079657952386 |
| Insertion rate | 0.024056603631908934 | 0.024409736896819117 |
| Deletion rate | 0.04654204334915349 | 0.030249793440489687 |
| Total error rate | 0.10042551939177115 | 0.08332032691683267 |

To provide aligned simulated reads in BAM format to all benchmark programs, Minimap2 was used with Ensembl 90 introns supplied in a BED file and a junction bonus parameter of 15 (with the exception of NanoCount, which required read alignment directly to the transcriptome). For the scRNA-Seq ONT dataset used to create the read model, various tools detected a similar number of cells (~2460), but the median number of unique molecular identifiers (UMIs) per cell differed.
The Sicelore preprocessing of ONT scRNA-Seq data identified between 3,000 and 6,000 UMIs per cell; these reads were provided in BAM format, with cell barcode and UMI tags annotated, to Sicelore, IsoQuant, and Isosceles for the benchmarks on biologically derived data (Fig. 3a-b). In contrast, FLAMES, with its own UMI detection and deduplication processes, detected around 13,500 UMIs per cell. To strike a balance between the varying results from different tools, a compromise of 10,000 reads per cell was chosen for this study.

To simulate scRNA-Seq ONT data, a BAM file containing aligned simulated reads from the scRNA-Seq read model was randomly downsampled 100 times using samtools, with a subsampling proportion of 0.000833. This resulted in approximately 10,000 reads out of the original 12 million for each BAM file. A custom Python script (see supplemental Benchmark commands) was used to assign unique cell barcode sequences and UMI sequences to each read within the 100 BAM files. These subsampled BAM files were then merged and sorted using samtools.

Synthetic molecules and platform-comparison data processing

The data used for comparative analysis of results from different sequencing platforms included FASTQ files for PacBio (ENCODE: ENCFF450VAU) and ONT (cDNA Pass basecalls from the Nanopore WGS Consortium GitHub repository: https://github.com/nanopore-wgs-consortium/NA12878/blob/master/RNA.md [23]), as well as Illumina short-read transcript quantifications (ENCODE: ENCFF485OUK) for the GM12878 cell line. Long reads were aligned to the reference genome using Minimap2 as described above for simulated data (although for the PacBio dataset, the ‘-ax splice:hq’ parameter was used instead of ‘-ax splice’).
Transcripts with >1 TPM in Illumina quantifications (intersected with the Ensembl 90 transcript IDs utilized in this study to account for annotation discrepancies with the Ensembl 95 annotation from ENCFF485OUK) were selected, and for those with one-to-many matches of Ensembl IDs, ground-truth values were aggregated.

The alignment file in BAM format for ‘Nanopore cDNA Pass’ reads aligned to the SIRV sequences (SIRV set 3, Lot No. 001485) was downloaded from the Nanopore WGS Consortium GitHub repository (see above). The three top-performing tools (Isosceles, Bambu, and IsoQuant) were benchmarked on both insufficient annotations (44 annotated SIRV isoforms [24 withheld], compared to 68 isoforms in the correct annotations) and over-annotations (68 annotated SIRV isoforms with an additional 32 decoy isoforms) obtained from Lexogen’s website (https://www.lexogen.com/sirvs/download). For the over-annotations, the fraction of reads assigned to correct transcripts (read assignment precision) was calculated for each tool (utilizing both SIRV transcripts and 92 unspliced ERCC sequences). In the case of insufficient annotations, transcript detection (comprising both annotated and withheld transcripts) was measured with the precision, recall, and F1 score metrics on spliced (SIRV) data only, with the metrics calculated at the level of unique transcript splicing structures.

Nanopore raw read files in FASTQ format were obtained from SRA for Sequin mix A data (SRR14286054) and mix B data (SRR14286063), then aligned using Minimap2 and processed using individual tools. Sequin reference sequences and annotations used for the analysis were downloaded from https://github.com/XueyiDong/LongReadRNA/tree/master/sequins/annotations, as described previously [22,38].
Quantifications from each tool were compared to ground-truth Sequin expression values for mix A and mix B in order to calculate Spearman correlations and mean relative differences for each mix, as well as for concatenated expression values from both mixes.

Biological data processing

The bulk RNA-Seq data (GSE248114) included PromethION data, featuring eight sequencing libraries for seven ovarian cancer cell lines (OVMANA, OVKATE, OVTOKO, SK-OV-3, COV362, COV504, and IGROV-1), as well as two technical replicates for IGROV-1. For MinION platform data, two technical replicates for IGROV-1 were sequenced. Factors such as RAM usage and program speed determined the number of reads simulated in bulk simulations and downsampled in bulk data. For example, for cross-platform correlations, the PromethION data was downsampled to 5 million reads to make it more comparable to MinION (~6-7 million raw reads) and pseudo-bulk scRNA-Seq (3.5-4.5 million UMIs per cluster, as detected by Isosceles) in terms of total read depth. This decision was also influenced by an issue with IsoQuant (https://github.com/ablab/IsoQuant/issues/69), which limited its ability to process large read files in our hands. Notably, this issue persisted on a cluster node with 20 CPUs at 2.4 GHz and 230 GB of allocated RAM.

The scRNA-Seq data (GSE248115) consisted of a mix of three cell lines (SK-OV-3, COV504, and IGROV-1). The Illumina sequencing data (SRR26865983) was preprocessed using CellRanger (version 6.0.1). For ONT sequencing data (SRR26865982), we considered two barcode preprocessing methods (Sicelore and wf-single-cell) for cell barcode (CBC) and unique molecular identifier (UMI) detection. We observed similar average Spearman correlation (0.85 vs 0.88) and mean relative difference (0.57 vs 0.60) between the same cell lines in pseudo-bulk and bulk for the two methods. However, better performance was achieved with Sicelore preprocessing for matched vs.
decoy (0.26 vs 0.16 for Spearman correlation, 0.22 vs 0.14 for mean relative difference). Therefore, we used Sicelore preprocessing to annotate the CBC and UMI tags in the ONT sequencing BAM files for Isosceles, Sicelore, and IsoQuant (Supplementary Fig. 5d; see below).

Mitochondrial transcripts common to the output of all methods were removed, as they were strong outliers across methods. Additionally, three specific transcripts that were outliers across methods were removed: ENST00000445125 (18S ribosomal pseudogene), ENST00000536684 (MT-RNR2 like 8), and ENST00000600213 (MT-RNR2 like 12).

Benchmarks using biological data

The correlation and relative difference analyses (Supplementary Fig. 4c) compared annotated transcripts between bulk RNA-Seq data from two PromethION and two MinION sequencing replicates of IGROV-1, both within each platform (using replicates) and between platforms (using averaged data for each platform). For each comparison, only transcripts with a mean expression of at least 1 TPM were used. In Supplementary Fig. 4d, scRNA-Seq and bulk RNA-Seq data were also compared, again considering only annotated transcripts. For each program, the IGROV-1 scRNA-Seq pseudo-bulk cluster (according to genetic identity from Souporcell) was compared with the averaged bulk RNA-Seq IGROV-1 expression values from two replicates for each platform. Analyses were also restricted to transcripts with an expression of at least 1 TPM in the single-cell RNA-Seq results. Comparisons were made for each platform using the top k cells (highest UMI count) with the top 5000 transcripts (highest mean expression), to ensure a comparable number of transcripts across software packages, and using the top N transcripts (highest mean expression) for the 64 top cells (highest UMI count) (Supplementary Fig. 4d).

For Fig. 3a-c, scRNA-Seq and bulk RNA-Seq data analysis was conducted using Bioconductor packages (e.g., scran, scater, etc.)
on the transcript level for cells with at least 500 genes, for a range of top highly variable transcript numbers (500, 1,000, 2,000, 4,000, 6,000 and 10,000), as determined by the scran::getTopHVGs function [39]. Heatmaps were generated to show correlations and mean relative differences between scRNA-Seq pseudo-bulk results for three cell line clusters and PromethION bulk RNA-Seq results for seven ovarian cancer cell lines, similarly including only annotated transcripts. IGROV-1 expression was averaged from two replicates. To compare the difference between matched and decoy metrics (Spearman correlation and mean relative difference), we calculated the absolute difference and computed the upper and lower bounds of the standard error using error propagation (as sqrt(se(x)^2 + se(y)^2)). To assess the overall significance of Isosceles results compared to each program in matched versus decoy metrics, we computed the differences between each matched cell line and the mean of decoys across a range of 500-10,000 highly variable transcripts (HVTs). The set of differences was then compared against the matched results from Isosceles using a Wilcoxon matched-pairs signed-rank test.

For the simulated data version of Fig. 3c presented in Supplementary Fig. 5c, nanopore reads were simulated for SK-OV-3, IGROV-1, OVMANA, OVKATE, OVTOKO and COV362. These were based on short-read TPM values obtained from Whippet v1.7.3 and Ensembl 90 transcripts with the GENCODE ‘basic’ tag (excluding mitochondrial transcripts) and a mean expression of at least 0.1 TPM across all analyzed cell lines. Five million reads were produced by NanoSim for each of the bulk RNA-Seq and scRNA-Seq read models and aligned to the genome using Minimap2. For the latter, cell barcodes randomly selected from 100 sequences and unique UMI sequences were added to the BAM files. Simulated bulk RNA-Seq and scRNA-Seq samples were analyzed as described for biological data presented in Fig.
3c.

We also performed this benchmark for Isosceles on the BAM files obtained from Sicelore and wf-single-cell (for the latter, a Minimap2 alignment junction bonus of 15 was specified using the ‘resources_mm2_flags’ flag and the expected number of cells was set to 2,000). As wf-single-cell does not produce a deduplicated BAM file, UMI deduplication using UMI-tools was applied. Isosceles results for both BAM files were compared for the top 4,000 highly variable transcripts, which determined the choice of Sicelore for the single-cell barcode preprocessing used in the manuscript (see Supplementary Fig. 5d).

Case-study analysis of biological data

For the case study in Fig. 4, the raw reads were pre-processed to identify cell barcodes (CBC) and unique molecular identifiers (UMI) according to the Sicelore workflow. The reads were subsequently aligned to the reference genome mm10/GRCm38 (with annotations derived from GENCODE M25), using Minimap2 with a junction bonus of 15, which targeted both annotated introns from GENCODE M25 and those extracted from the VastDB mm10 GTF file [30]. The aligned reads with CBC and UMI annotations were subsequently quantified with Isosceles. The 951-cell dataset was filtered to exclude cells that expressed fewer than 100 genes. For dimensionality reduction, we combined Isosceles gene and transcript counts, yielding a total of 3760 variable features (with a target of 4000), comprising 1735 genes and 2025 transcripts. We applied Principal Component Analysis (PCA), calculating 30 components using the scaled expression of the variable features. Cells were clustered using Louvain clustering (with a resolution parameter of 2) on the Shared Nearest Neighbor (SNN) graph (with a k-value of 10). The clusters’ identities were determined through gene set scores, specifically the mean TPM values of markers delineated in the original study (see Supplementary Fig. 6).
Additional marker genes were identified via the scran::findMarkers function, requiring the t-test FDR to be significant (q value < 0.05) in at least half of the comparisons to other clusters (selecting the top 5 markers of each cluster).

Pseudotime analysis was performed using Slingshot for differentiating glutamatergic neurons (identifying two trajectories, T1 and T2), differentiating GABAergic neurons, radial glia, cycling radial glia and Cajal-Retzius cells (with one trajectory each). To implement the original ‘isoform switching’ analysis, pairs of clusters were compared, detecting marker transcripts through the scran::findMarkers function (Wilcoxon test). We filtered for transcripts of the same gene showing statistically significant differences in opposite directions (i.e., one upregulated in one cluster, the other in another cluster). To analyze splicing changes within each trajectory, we used Isosceles to calculate aggregated TCC values for windows along pseudotime, defining the window size as 30 cells and the step size as 15 cells. AS events from variable transcripts meeting further criteria were selected for downstream analysis. First, mean PSI values across all cells from the trajectory had to lie between 0.025 and 0.975 to exclude constitutively included/excluded events. Second, at least 30 cells must have values not equal to 0, 1, or 0.5, and 30 cells must have a value above 0.1, to select against events with only low counts. Redundant PSI events, identical in read count profiles within a trajectory, were excluded, and those with >0.99 Spearman correlation were excluded from visualization in Fig. 4b and Supplementary Fig. 7b. For comparative analysis, percent-spliced-in (PSI) count values are denoted as counts-spliced-in (CSI) and defined as PSI * gene counts. These were paired with exclusion counts, calculated as (1 - PSI) * gene counts, and each inclusion/exclusion pair was input into DEXSeq [29].
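The pseudotime windowing and the inclusion/exclusion counts described above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, cells are assumed to be already ordered by pseudotime, and dropping a trailing partial window is an assumption, since the handling of trajectory ends is not specified here.

```python
def pseudotime_windows(n_cells, window_size=30, step_size=15):
    """Return (start, end) index pairs over cells ordered by pseudotime.

    Only complete windows are returned (assumption); end is exclusive.
    """
    last_start = n_cells - window_size
    return [(s, s + window_size) for s in range(0, last_start + 1, step_size)]


def inclusion_exclusion_counts(psi, gene_counts):
    """CSI = PSI * gene counts; exclusion counts = (1 - PSI) * gene counts."""
    csi = psi * gene_counts
    exclusion = (1.0 - psi) * gene_counts
    return csi, exclusion


# A trajectory of 75 cells gives four overlapping windows of 30 cells each.
windows = pseudotime_windows(75)
# An event with PSI 0.25 in a window with 40 gene counts.
csi, exclusion = inclusion_exclusion_counts(0.25, 40)
```

Each window would then contribute one inclusion/exclusion count pair per AS event, which is the form of input passed to the differential analysis described in the text.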
For each intra-trajectory comparison, our experimental design was ‘~sample + exon + pseudotime:exon’. The inter-trajectory analysis included all trajectories with a design of ‘~sample + exon + pseudotime:exon + trajectory:exon’, compared against a null model of ‘~sample + exon’ using the likelihood ratio test (LRT).

To determine ratios of observed vs. expected CSI, we shuffled TCCs across cells with non-zero counts and applied the EM algorithm, calculating PSI for each window. To obtain expected CSI, we multiplied the shuffled PSI values by the observed gene counts. The permutations were conducted for each AS event across 100 bootstraps. For empirical statistical validation of changes between the first and last windows of a trajectory (e.g., for Celf2), we fit a negative binomial distribution to each window using maximum likelihood estimation (‘fitdistrplus’ package) on the permuted CSI, and calculated high and low one-tailed p values for the observed CSI. We combined the high p value of the first window with the low p value of the last window, and the low p value of the first window with the high p value of the last window, using Fisher’s method, and defined the overall p value as two times the minimum combined p value. Specifically for heatmap visualization, a broad window size of 100 cells for glutamatergic and GABAergic neurons, and 50 cells for glia and CR cells, was used, with a consistent step size of 3 cells for smoothing.
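The combination of one-tailed p values between the first and last windows can be sketched in Python as below. This is a minimal sketch, not the authors' code: the function names are hypothetical, the one-tailed p values are assumed to have been obtained already from the fitted negative binomial distributions, and capping the overall p value at 1 is an added assumption. For two p values, Fisher's statistic follows a chi-square distribution with 4 degrees of freedom, whose survival function has the closed form used here.

```python
import math


def fisher_combine_two(p1, p2):
    """Fisher's method for two independent p values.

    X = -2 * ln(p1 * p2) is chi-square with 4 degrees of freedom, so the
    combined p value is exp(-X/2) * (1 + X/2) = p1*p2 * (1 - ln(p1*p2)).
    """
    if p1 <= 0.0 or p2 <= 0.0:
        return 0.0
    product = p1 * p2
    return product * (1.0 - math.log(product))


def overall_p_value(first_high, first_low, last_high, last_low):
    """Two times the minimum of the two combined p values (capped at 1, an assumption)."""
    high_then_low = fisher_combine_two(first_high, last_low)
    low_then_high = fisher_combine_two(first_low, last_high)
    return min(1.0, 2.0 * min(high_then_low, low_then_high))


# Example: observed CSI is unusually high in the first window and unusually
# low in the last window (hypothetical one-tailed p values).
p = overall_p_value(first_high=0.05, first_low=0.95, last_high=0.95, last_low=0.05)
```

Doubling the smaller combined p value accounts for testing both directions of change (high to low and low to high) along the trajectory.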
The heatmap values were given as the log2 ratio of observed to expected, with a pseudocount of 0.1, defining the ratio between PSI counts and the average of the corresponding permuted PSI counts.

Benchmark command summary: https://github.com/Genentech/Isosceles_Paper/blob/devel/Benchmark_commands.md

Software versions:

| Software | Version |
|---|---|
| Isosceles | v0.2.0 |
| Flair | v1.7.0 |
| StringTie2 | v2.2.1 |
| IsoQuant | v3.0.3 |
| NanoCount | v1.0.0.post6 |
| Sicelore | v2.0 |
| Bambu | v3.2.5 (R 4.3.0, Bioconductor 3.17) |
| FLAMES | v0.1 |
| ESPRESSO | beta1.3.0 |
| NanoSim | v3.1.0 |
| Minimap2 | v2.24-r1122 |
| wf-single-cell | v1.1.0 |
| UMI-tools | v1.1.5 |

Cell culture

All cell lines used in this study were validated by STR analysis and verified mycoplasma negative by PCR. No commonly misidentified cell lines were used in this study. IGROV-1, SK-OV-3, OVTOKO, OVKATE and OVMANA cell lines were cultured in RPMI-1640 supplemented with 10% heat-inactivated fetal bovine serum (FBS) and 2 mM L-Glutamine. COV362 and COV504 cells were cultured in DMEM supplemented with 10% FBS and 2 mM L-Glutamine. Cells were cultured at 37 °C and 5% CO2 in a humidified incubator. Cell line sources and catalog numbers are provided in the table below. Cells were cultured in 10 cm2 plates until they reached ~60-80% confluency. For bulk analysis, RNA was purified using Qiagen’s RNeasy Plus Mini kit (Cat. #74134) according to the manufacturer’s instructions.
For single-cell analysis, IGROV-1, SK-OV-3 and COV504 cells were trypsinized, pooled together at a 1:1:1 ratio at a concentration of 1000 cells/μL, and submitted for single-cell long-read sequencing.

| Cell line | Provider | Catalog number |
|---|---|---|
| IGROV-1 | NCI DCTD | |
| SK-OV-3 | ATCC | HTB-77 |
| OVTOKO | JCRB Cell Bank | JCRB1048 |
| OVKATE | JCRB Cell Bank | JCRB1044 |
| OVMANA | JCRB Cell Bank | JCRB1045 |
| COV362 | ECACC | 07071910 Lot# 07G029 |
| COV504 | ECACC | 07071902 Lot# 07I007 |

Single-cell, long-read library preparation and nanopore sequencing

Approximately 10 ng of cDNA generated from the Next GEM Single Cell 3′ Gene expression kit (10X Genomics, Cat # PN-100268) was amplified using 10 μM of the biotinylated version of the forward primer and a reverse primer from the single cell 3′ transcriptomics protocol (ONT, SQK-LSK114): [Btn]_Fwd_3580_partial_read1_defined_for_3′_cDNA, 5′-/Biosg/CAGCACTTGCCTGTCGCTCTATCTTCCTACACGACGCTCTTCCGATCT-3′, and Rev_PR2_partial_TSO_defined_for_3′_cDNA, 5′-CAGCTTTCTGTTGGTGCTGATATTGCAAGCAGTGGTATCAACGCAGAG-3′. To ensure enough cDNA was generated for the pull-down reaction (200 ng), two PCR reactions were carried out using 2X LongAmp Taq (NEB, Cat # M0287S) with the following parameters: 94 °C for 3 min; 5 cycles of 94 °C for 30 s, 60 °C for 15 s, and 65 °C for 3 min; and a final extension at 65 °C for 5 min. The cDNA was pooled, cleaned up with a 0.8X SPRI bead ratio, and eluted in 40 μL RNase-free H2O. Concentration was evaluated using the Qubit HS dsDNA assay (Thermofisher, Cat No. Q32851). The amplified cDNA was then captured using 15 μL M270 streptavidin beads (Thermofisher, Cat # 65305). Beads were washed three times with 1 mL of 1X SSPE buffer (150 mM NaCl, 10 mM NaH2PO4, and 1 mM EDTA). Beads were then resuspended in 10 μL of 5X SSPE buffer (750 mM NaCl, 50 mM NaH2PO4, and 5 mM EDTA). Approximately 200 ng of the cDNA in 40 μL was combined with the 10 μL of M270 beads and incubated at room temperature for 15 minutes.
After incubation, the sample and beads were washed twice with 1 mL of 1X SSPE. A final wash was performed with 200 μL of 10 mM Tris-HCl (pH 8.0), and the beads bound to the sample were resuspended in 10 μL of RNase-free H2O. PCR was performed on-bead for 5 cycles using the unbiotinylated version of the primers from the ONT single-cell 3′ transcriptomics protocol discussed earlier, according to the same PCR program shown above. A 0.8X SPRI cleanup was performed. The cDNA was eluted in 50 μL of RNase-free H2O, and the concentration was evaluated with the Qubit HS dsDNA assay and the Tapestation D5000 DNA kit (Agilent Technologies, Cat # 5067-5589).

Library preparation for nanopore sequencing was performed according to the SQK-LSK110 protocol, with the exception of the end-repair step time, which was increased to 30 min. 125 fmol of final library was loaded on the PromethION using FLO-PRO002 flow cells (R9.4.1 chemistry) and sequenced for 72 h. Reads were basecalled using Guppy v5.0.11.

Statistics and reproducibility

No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
