Comprehensive evaluation and prediction of editing outcomes for near-PAMless adenine and cytosine base editors

Generation of near-PAMless base editors
To leverage the expanded PAM compatibility offered by SpRY, we used BE4max-SpRY and ABEmax-SpRY, which incorporate SpRY in place of SpCas9, as previously described by Walton et al.29 (Fig. 1a, b). Consistent with their results29, both BE4max-SpRY and ABEmax-SpRY edited sites with non-NGG PAMs (Fig. 1c–h). As a control, AncBE4max-NGG, a commonly used base editor with improved precision at the time of our experiments43, seldom edited non-NGG sites (Fig. 1c–h).
Fig. 1: Generation of near-PAMless base editors. Schematics of the near-PAMless CBEs (a) and ABEs (b) generated in this study. (c–h) Base editing frequencies were evaluated at six endogenous genomic sites (sites 1–3 for CBEs, sites 4–6 for ABEs) in HEK293T cells. The target DNA sequence of each site is shown above the histograms, with the protospacer (positions 1–20), the edited base (red), and the PAM sequence (blue). The target base is indicated in brackets. Independent experiments were performed in triplicate. (c–e) CBE editing frequencies. (f–h) ABE editing frequencies. The x-axis indicates the position of cytosine or adenine, and the y-axis shows mean editing frequencies ± standard error of the mean (SEM). Statistical significance was assessed using ANOVA followed by Dunnett’s test relative to BE4max-SpRY (c–e) or ABEmax-SpRY (f–h). *P < 0.05, **P < 0.01, ***P < 0.001.
To optimize the editing efficacy and precision of these PAM-flexible base editors, we introduced several modifications. We substituted the deaminase domain in BE4max-SpRY or ABEmax-SpRY with YE1 or TadA-8e, respectively. Informed by Tan et al.19, for CBEs we used a shortened P(AP)3 linker (hereafter referred to as SL), expecting it to maintain high editing efficiency while narrowing the editing window. Because CBE variants with no linker (hereafter referred to as NL) were reported to have negligible efficiency19, we did not use this configuration for the SpRY versions of CBEs.
For ABE8e, known for its high efficiency and broad editing window, we applied both SL and NL to determine which configuration would reduce bystander editing while maintaining high editing efficiency. A FLAG-tagged nuclear localization signal (FNLS) previously used with BE3, which is designed to increase the nuclear expression of base editors16,44, was also used to replace the original nuclear localization signal (NLS). These modifications yielded four additional near-PAMless CBEs and four additional near-PAMless ABEs. We then evaluated these CBEs and ABEs in HEK293T cells at three genomic sites containing multiple Cs (sites 1–3) and three sites containing multiple As (sites 4–6), respectively (Fig. 1c–h; Supplementary Table 1).
Among the CBE variants, YE1 substitution in BE4max-SpRY reduced bystander editing at the eighth C of site 2, where the sixth C was targeted (Fig. 1d). Similarly, at site 3, where the seventh base was the target, YE1-SpRY showed decreased editing frequencies at the fifth, ninth, and tenth Cs compared with BE4max-SpRY (Fig. 1e). When YE1-SpRY was paired with the short linker (SL; P(AP)3), it exhibited the lowest editing frequencies among all near-PAMless base editors across sites 1–3 (Fig. 1c–e). Notably, integrating SL into BE4max-SpRY did not reduce editing frequencies at the non-target cytosines (the fourth, fifth, and seventh) when the sixth cytosine at site 1 was targeted (Fig. 1c). These results suggest that integrating YE1 into near-PAMless CBEs maintains editing efficiency at the target base while enhancing specificity by minimizing unintended edits.
For the adenine base editors, TadA-8e deaminase variants consistently displayed higher editing frequencies across all adenines in the tested sites, suggesting an expanded editing window compared with ABEmax-SpRY (Fig. 1f–h).
The ABE8e-SL-SpRY and ABE8e-NL-SpRY variants, which contain a shortened or no XTEN linker, respectively, exhibited reduced bystander editing at A10 in site 4, A8 in site 5, and A10 in site 6 compared with ABE8e-SpRY (Fig. 1f–h). Replacing the original NLS with an FNLS sequence did not notably alter the editing efficiencies of either CBE or ABE variants (Fig. 1c–h). Based on these results, we selected two near-PAMless CBEs with improved editing precision (FNLS-YE1-SpRY and YE1-SpRY) and two near-PAMless ABEs with increased editing efficiency (ABE8e-SL-SpRY and ABE8e-NL-SpRY) for a more comprehensive evaluation.
Systematic evaluation of near-PAMless base editors using a large-scale sgRNA-target library
To systematically evaluate the performance of these near-PAMless base editors, we constructed a paired sgRNA-target library containing 45,747 sequences (Supplementary Table 2). Each sgRNA-target pair consists of a 20-nt sgRNA spacer, its corresponding 20-nt target DNA sequence, and a 4-nt PAM sequence, so that editing efficiency and outcomes can be analyzed by sequencing the integrated target. The library comprised four components (Fig. 2a): 24,050 randomly generated sgRNA-target pairs with NANN or NGNN PAMs29, with balanced representation of sequence contexts for mapping the sequence determinants of editing efficiency; 1,023 pairs covering 256 NNNN PAMs for evaluating PAM preferences; 20,541 sequences associated with mutations reported in the ClinVar database, each with its corresponding endogenous PAM; and 133 endogenous loci with non-NGG PAMs from previous reports29. Because SpRY variants exhibit higher activity at sequences with NRN PAMs than at those with NYN PAMs29, we enriched the random sequences for ten NRN PAMs known to support higher SpRY activity. For library construction, synthesized sgRNA-target pairs were PCR-amplified and assembled into a lentiviral plasmid by Gibson assembly45,46.
Fig. 2: High-throughput evaluation of PAM compatibility and editing activities of near-PAMless base editors. a Composition of the paired sgRNA-target library, containing the random library (n = 24,050), ClinVar library (n = 20,541), PAM library (n = 1023), and endogenous loci (n = 133). Each sgRNA-target sequence comprises a 20-nt sgRNA spacer, its matching 20-nt target sequence, and a 4-nt PAM sequence. An editable C or A is positioned within positions 4–8 for the random library and at position 6 for the ClinVar library. b Workflow for high-throughput measurement of editing efficiency. HEK293T cells were transduced with the lentivirus-packaged sgRNA-target library and transfected with base editors. Genomic DNA was extracted from GFP+ cells and sequenced. Editing outcomes were determined by analyzing sequence changes in the target sequence for each sgRNA. Editing efficiencies of CBEs (c) and ABEs (d), grouped by PAM sequence. The boxes represent the 25th, 50th, and 75th percentiles; whiskers indicate the 10th and 90th percentiles. e, f Comparison of editing efficiencies for near-PAMless base editors and NG-specific base editors. Statistical significance was determined by t-test for independent samples. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.
We packaged the sgRNA-target library into lentivirus and transduced the constructs into HEK293T cells. The cells were split into 10 pools, each transfected with a different base editor, and two independent replicate experiments were performed (Fig. 2b). At 72 h post-transfection, we sequenced the sgRNA-target cassettes to evaluate editing efficiency and outcomes. Between 35,769 and 37,005 sgRNA-target pairs were recovered with more than 100 sequencing reads in each experiment (Supplementary Table 3). Editing rates were highly correlated between replicates (Supplementary Fig. 1a, b), with Pearson’s correlation coefficients ranging from 0.80 to 0.84 for CBEs and from 0.74 to 0.85 for ABEs.
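The read-count filter and replicate comparison described above can be sketched in a few lines of Python. This is a minimal sketch, assuming that edited and total read counts have already been tallied per sgRNA-target pair for each replicate; the dictionary layout, the toy numbers, and the function names are hypothetical and do not reproduce the authors' analysis pipeline.

```python
import math

MIN_READS = 100  # a pair is kept only if it has more than 100 reads

def filter_pairs(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Return editing rates for pairs with more than MIN_READS reads.

    counts maps a (hypothetical) pair ID to (edited_reads, total_reads).
    """
    return {
        pair_id: edited / total
        for pair_id, (edited, total) in counts.items()
        if total > MIN_READS
    }

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def replicate_correlation(rep1, rep2) -> float:
    """Correlate editing rates over pairs that pass the filter in both replicates."""
    r1, r2 = filter_pairs(rep1), filter_pairs(rep2)
    shared = sorted(r1.keys() & r2.keys())
    return pearson([r1[k] for k in shared], [r2[k] for k in shared])

# Toy data: pair "p3" has only 50 reads in replicate 1 and is therefore dropped.
rep1 = {"p1": (10, 200), "p2": (50, 250), "p3": (5, 50), "p4": (90, 300)}
rep2 = {"p1": (12, 220), "p2": (45, 230), "p3": (30, 400), "p4": (100, 310)}
print(round(replicate_correlation(rep1, rep2), 3))
```

Restricting the correlation to pairs that pass the threshold in both replicates keeps poorly covered pairs, whose editing rates are noisy, from deflating the replicate agreement.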
Further validation at 38 endogenous sites in HEK293T cells supported the reliability of our library data: editing efficiencies at integrated target sequences correlated strongly with those at the corresponding endogenous sites (Supplementary Fig. 1c, d; Supplementary Tables 4, 5; Pearson’s correlation 0.82 and 0.93, respectively).
Dependence of editing efficiency on PAM sequences in near-PAMless base editors
We first compared the editing efficiencies of NGG, NG, and near-PAMless base editors on sequences with different PAMs. For sequences with NGGN PAMs at positions 21–24, the median C-to-T editing efficiency at positions 4–8 was 11.5% for AncBE4max-NGG, 7.63% for AncBE4max-NG, 11.44% for BE4max-SpRY, 8.42% for FNLS-YE1-SpRY, and 10.1% for YE1-SpRY (Fig. 2c). AncBE4max-NGG and BE4max-SpRY thus showed comparable efficiencies on sequences with NGGN PAMs. On sequences containing NGNN PAMs, AncBE4max-NGG had greatly reduced editing efficiencies, whereas the other four base editors maintained efficiencies similar to those at NGGN PAM sites. The three SpRY versions of CBEs also efficiently edited sequences containing NANN PAMs. Similarly, the SpRY versions of ABEs showed expanded PAM compatibility (Fig. 2d). In addition, the ABE8e variant increased editing efficiency by approximately 3.7-fold compared with ABEmax-NGG at NGGN PAM sites (Fig. 2d).
We further evaluated the PAM preferences of the CBEs and ABEs across 256 distinct PAM sequences. Base editors containing SpRY had higher editing efficiencies than NG or NGG PAM base editors across a diverse array of PAM sequences (Supplementary Fig. 2a, b). For a given sgRNA, the identity of the first and fourth PAM bases also altered the editing efficiency of a base editor (Supplementary Fig. 2a, b).
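The 256-PAM analysis, and the 16 NXXN motifs used in the comparison that follows, can be sketched as a grouping step. This is a minimal sketch, assuming a list of (PAM, editing efficiency) records in which the PAM occupies positions 21–24; each NNNN PAM is collapsed to its NXXN motif (second and third bases kept), so the variation contributed by the first and fourth bases is pooled into a per-motif median. Function names and the toy records are hypothetical.

```python
from collections import defaultdict
from itertools import product
from statistics import median

BASES = "ACGT"

def all_pams() -> list[str]:
    """Enumerate all 4**4 = 256 NNNN PAM sequences."""
    return ["".join(p) for p in product(BASES, repeat=4)]

def nxxn_motif(pam: str) -> str:
    """Collapse a 4-nt PAM to its NXXN motif (second and third bases kept)."""
    if len(pam) != 4 or any(b not in BASES for b in pam):
        raise ValueError(f"not a 4-nt DNA PAM: {pam!r}")
    return f"N{pam[1]}{pam[2]}N"

def median_by_motif(records) -> dict[str, float]:
    """records: iterable of (pam, editing_efficiency); returns motif -> median."""
    groups = defaultdict(list)
    for pam, eff in records:
        groups[nxxn_motif(pam)].append(eff)
    return {motif: median(vals) for motif, vals in sorted(groups.items())}

# Toy records (hypothetical efficiencies, not measured values).
records = [("AGGT", 0.12), ("CGGA", 0.10), ("TGGC", 0.11),
           ("AGAT", 0.09), ("TAAG", 0.02)]
print(median_by_motif(records))
```

Each of the 16 motifs pools 16 PAMs, which is why motif-level comparisons are less sensitive to the first- and fourth-base effects than per-PAM comparisons.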
To assess the adaptability of near-PAMless base editors to different PAM sequences, we grouped the 256 PAM sequences into 16 NXXN motifs according to their second and third bases. Among the CBEs, FNLS-YE1-SpRY showed higher editing efficiencies than AncBE4max-NGG for 13 of the 16 NXXN motifs and higher efficiencies than AncBE4max-NG for 6 motifs (Fig. 2e and Supplementary Fig. 2c). Similarly, the SpRY-integrated ABEs exhibited expanded PAM compatibility compared with their non-SpRY counterparts: ABEmax-SpRY showed higher editing efficiency than ABEmax-NGG and ABEmax-NG across all PAM motifs except NGGN and NG(A/C)N, respectively (Fig. 2f and Supplementary Fig. 2d). Taken together, these results suggest that SpRY versions of base editors display broad PAM compatibility and that variation in the PAM sequence has a significant impact on their editing activity.
Editing outcomes are differentially affected by target sequence contexts for near-PAMless base editors
To evaluate the precision of base editing, we next compared the distribution of editing activity across the protospacer. The mean editing rate peaked at position 5 or 6 for all CBEs and ABEs (Fig. 3a, b). We define the editing window as the positions edited at a rate exceeding 50% of the rate at the most efficiently edited position. Compared with BE4max-SpRY, FNLS-YE1-SpRY and YE1-SpRY had a narrower editing window, spanning positions 5–7 rather than 4–8 (Fig. 3a). Among the SpRY-integrated ABEs, ABE8e-SL-SpRY (TadA-8e with a rigid linker) displayed an editing window spanning positions 3–8, whereas ABE8e-NL-SpRY confined editing mainly to positions 3–7 (Fig. 3b).
Fig. 3: Impact of target sequence context on editing outcomes of near-PAMless base editors. Editing frequencies of near-PAMless CBEs (a) and ABEs (b) across positions within the protospacer. Bars and error bars show mean ± SEM of editing frequencies. Positions with average editing frequencies above 50% of the maximum are in red. From left to right: BE4max-SpRY, FNLS-YE1-SpRY, YE1-SpRY in (a); ABEmax-SpRY, ABE8e-SL-SpRY, ABE8e-NL-SpRY in (b). Mean editing frequencies across positions within the protospacer for CBEs (c) and ABEs (d), grouped by the base preceding the target cytosine or adenine. Bars and error bars show mean ± SEM of editing frequencies.
Furthermore, we compared bystander editing by SpRY-integrated base editors with that by NGG PAM-specific base editors, using only sequences containing NGGN PAMs. We calculated the editing efficiency at positions 4–8 relative to the most efficiently edited position (Supplementary Fig. 3a, b). YE1-SpRY showed lower bystander editing activity than AncBE4max-NGG across most two-C and three-C patterns, except when three Cs occupied positions 4, 6, and 7 or two Cs occupied positions 5 and 6 (Supplementary Fig. 3a). When sequences contained consecutive adenines within the editing window, the highest editing efficiency was typically observed at the first adenine. The exception was adenines at positions 4, 5, and 6, where the first adenine (position 4) lies at the edge of the editing window (Supplementary Fig. 3b).
Subsequently, we compared the sequence determinants in the target sequence that affect editing outcomes. Previous studies have shown that the base preceding the target significantly affects editing efficiency, with different deaminases preferring different preceding bases35,36. In line with this, a T preceding the target C resulted in a significantly higher mean editing rate than other bases, whereas a preceding G was associated with the lowest rate (Fig. 3c and Supplementary Fig. 3c). Using a cutoff of 5% editing frequency, we found that a preceding T enabled editing at positions 2–11 for BE4max-SpRY, whereas only positions 5–7 were editable with a preceding G (Supplementary Fig. 3c).
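Both window definitions used in this section, the relative one (above 50% of the peak position) and the absolute one (above a 5% editing frequency), can be expressed compactly, together with the grouping by preceding base. This is a minimal sketch, assuming per-position mean editing frequencies on protospacer positions 1–20 and a list of (protospacer, position, frequency) observations; the preceding base is taken as the protospacer base immediately 5′ of the target. Names and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def relative_window(freqs: dict[int, float], fraction: float = 0.5) -> list[int]:
    """Positions whose mean editing exceeds `fraction` of the peak position."""
    peak = max(freqs.values())
    return sorted(p for p, f in freqs.items() if f > fraction * peak)

def absolute_window(freqs: dict[int, float], cutoff: float = 0.05) -> list[int]:
    """Positions whose mean editing exceeds an absolute cutoff (5% by default)."""
    return sorted(p for p, f in freqs.items() if f > cutoff)

def by_preceding_base(observations, target: str = "C"):
    """Average per-position editing, split by the base 5' of the target.

    observations: iterable of (protospacer, position, frequency), where
    position is 1-based and protospacer[position - 1] is the target base.
    Returns {preceding_base: {position: mean_frequency}}.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for seq, pos, freq in observations:
        if pos < 2 or seq[pos - 1] != target:
            continue  # position 1 has no preceding base inside the protospacer
        buckets[seq[pos - 2]][pos].append(freq)
    return {base: {p: mean(v) for p, v in sorted(per_pos.items())}
            for base, per_pos in buckets.items()}

# Toy per-position means (hypothetical values).
freqs = {3: 0.04, 4: 0.20, 5: 0.45, 6: 0.50, 7: 0.30, 8: 0.10, 9: 0.03}
obs = [("AAAATCAAAAAAAAAAAAAA", 6, 0.4),   # T precedes the C at position 6
       ("AAAAGCAAAAAAAAAAAAAA", 6, 0.1),   # G precedes the C at position 6
       ("AAAATCAAAAAAAAAAAAAA", 6, 0.6)]
print(relative_window(freqs), absolute_window(freqs), by_preceding_base(obs))
```

The two definitions answer different questions: the relative window describes where an editor concentrates its activity, whereas the absolute cutoff flags any position that may carry measurable bystander edits.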
In contrast, FNLS-YE1-SpRY showed editing activity only within positions 4–8, even with a preceding T (Supplementary Fig. 3c). For all ABEs, a preceding A was associated with the lowest editing frequencies, whereas a preceding T consistently resulted in the highest (Fig. 3d and Supplementary Fig. 3d). Further analysis of the local 3-bp context around the target base revealed a preference for a “TCN” context in CBEs, a “TAY” (Y = C or T) context in ABEmax, and a “TAS” (S = C or G) context in ABE8e variants (Supplementary Fig. 4a–f).
These observations underscore that the relationship between sequence context and editing outcomes is complex and varies considerably among base editors. Indeed, the proportion of the target outcome varied markedly between base editors, especially between TadA-8e-containing and TadA-7.10-containing near-PAMless base editors, for which Pearson’s correlation coefficients ranged from only 0.04 to 0.23 (Supplementary Fig. 5a). Predictive models trained on one type of base editor may therefore not be applicable to others. When predicting base editing outcomes with BE-Hive35, a model trained on data from BE4max or ABEmax, we observed Pearson’s correlation coefficients for editing proportions of 0.62 to 0.71, whereas the other deaminase-altered and linker-altered base editors showed correlations of only 0.34 to 0.51 (Supplementary Fig. 5b, c). A new model is therefore needed to capture the sequence determinants and PAM compatibility of near-PAMless base editors and to predict their outcomes accurately.
Developing BEguider for predicting base editing outcomes with near-PAMless editors
To comprehensively capture the sequence-activity relationships of the near-PAMless base editors, we developed a deep learning model named BEguider (Fig. 4a; Methods, Tables 1–2).
Each BEguider model comprises two modules, one predicting editing efficiency and the other predicting editing outcome proportions. The two modules share the same architecture except for their last layers. Each module consists of two parallel paths. In one path, the one-hot encoded sgRNA sequence is fed into a convolutional neural network (CNN), which captures local sequence features. In the other path, the sequence passes through an embedding layer followed by a bidirectional long short-term memory (Bi-LSTM) layer, which is designed to capture global, long-range dependencies in both directions along the sequence. The outputs of the CNN and Bi-LSTM paths are then concatenated, merging local and global features into a unified representation. This stacking design lets the CNN and Bi-LSTM complement each other, so that both local and global determinants in sgRNA sequences contribute to the prediction.
Fig. 4: Prediction of base editing efficiencies and outcomes using BEguider. a The architecture of BEguider consists of a CNN module for extracting local sequence features and a Bi-LSTM module for capturing global sequence patterns. The two modules are stacked to enable integrated learning of deaminase and PAM compatibility determinants. b Correlation between predicted and measured editing efficiencies for near-PAMless CBEs. c Correlation between predicted and measured editing proportions for near-PAMless CBEs. d Correlation between predicted and measured editing efficiencies for near-PAMless ABEs. e Correlation between predicted and measured editing proportions for near-PAMless ABEs. The color of each dot in (b–e) represents the predicted editing efficiency or proportion, respectively. R: Spearman’s correlation. r: Pearson’s correlation. f–j Correlation between predicted and measured editing frequencies by different models for ABEs (f, g) and CBEs (h–j). Numbers in (f) and (h) represent Spearman’s correlations, and those in (g) and (j) are Pearson’s correlations. k Scatterplots illustrating the correlation between predicted and observed per-base editing rates in HepG2 cells. Each point represents the Z-scored per-base editing rate of adenine bases edited by ABE8e-SpRY across three experimental replicates for 49 sgRNA species. The number of tested sites (N = 221) and the Spearman correlation coefficients (R) for each comparison are provided in the respective panels.
We used the 20-nt target sequence and 4-nt PAM sequence as the model input. We trained a separate BEguider model for each base editor using the high-throughput base editing data we generated (Supplementary Tables 6–11). The data for each base editor were split into training and test datasets at a ratio of 9:1, leaving more than 2,000 target sequences unseen during training in every test dataset. BEguider models accurately predicted editing efficiencies (Fig. 4b; Spearman’s correlation 0.76–0.80, Pearson’s correlation 0.75–0.77) and editing outcome proportions (Fig. 4c; Pearson’s correlation 0.82–0.87) for near-PAMless CBEs. BEguider models also performed well for ABEs (Fig. 4d, e), with Pearson’s correlations of 0.71 to 0.72 for editing efficiencies and 0.77 to 0.89 for editing proportions.
We then tested BEguider on other experimental datasets. At the time of our study, the only large published datasets for SpRY versions of base editors were those of Kim et al.40, who provided high-throughput sgRNA-target editing results for 5623 sgRNAs with SpRY-ABE8e(V106W) and 750 sgRNAs with SpRY-YE1-BE4max.
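The model input described above (a 20-nt target plus a 4-nt PAM) must be encoded differently for BEguider's two paths. The sketch below is illustrative only and rests on assumptions: a 24 × 4 one-hot matrix with A/C/G/T column order for the CNN path, integer tokens (A = 1 to T = 4, 0 reserved for padding) for the embedding/Bi-LSTM path, and a seeded random 9:1 split. The exact encodings and splitting procedure used for BEguider are given in the Methods and may differ; all names here are hypothetical.

```python
import random

ALPHABET = "ACGT"          # assumed channel/token order
TARGET_LEN, PAM_LEN = 20, 4

def validate(target: str, pam: str) -> str:
    """Check lengths and alphabet; return the 24-nt model input."""
    if len(target) != TARGET_LEN or len(pam) != PAM_LEN:
        raise ValueError("expected a 20-nt target and a 4-nt PAM")
    seq = (target + pam).upper()
    if set(seq) - set(ALPHABET):
        raise ValueError(f"non-ACGT base in {seq!r}")
    return seq

def one_hot(target: str, pam: str) -> list[list[int]]:
    """24 x 4 one-hot matrix (rows = positions 1-24) for the CNN path."""
    seq = validate(target, pam)
    return [[int(base == letter) for letter in ALPHABET] for base in seq]

def tokens(target: str, pam: str) -> list[int]:
    """Integer tokens (A=1, C=2, G=3, T=4; 0 kept for padding) for an embedding."""
    seq = validate(target, pam)
    return [ALPHABET.index(base) + 1 for base in seq]

def split_9_to_1(items: list, seed: int = 0) -> tuple[list, list]:
    """Shuffle a copy of `items` and split it 90% train / 10% test."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.9)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical example input (not a library sequence).
target, pam = "GACTCCGATACCGTTAGCAT", "AGAT"
matrix = one_hot(target, pam)
train, test = split_9_to_1(list(range(100)))
print(len(matrix), tokens(target, pam)[:5], len(train), len(test))
```

Keeping the PAM inside the same encoded sequence, rather than as a separate feature, lets both paths learn interactions between protospacer and PAM positions directly.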
We first predicted editing efficiencies at positions 4–8 using BEguider models trained on our ABE8e-SL-SpRY and ABE8e-NL-SpRY data. The overall Spearman’s correlations between the measured SpRY-ABE8e(V106W) editing frequencies and those predicted by BEguider-ABE8e-SL and BEguider-ABE8e-NL were 0.74 and 0.71, respectively (Fig. 4f, g and Supplementary Fig. 6a). Kim et al. also developed DeepBE40, a deep learning model that considers the deaminase and the PAM sequence separately. The overall Spearman’s correlations between DeepBE predictions and our measured editing frequencies for ABE8e-SL-SpRY and ABE8e-NL-SpRY were 0.54 and 0.57, respectively (Fig. 4f, g and Supplementary Fig. 6a). These results indicate that our models generalize well. For ABEs, prediction accuracy was lowest for target sites preceded by an A (Supplementary Fig. 6c, d), potentially because sites following an A are edited at lower rates.
For CBEs, both models exhibited moderate generalizability to other datasets (Fig. 4h, j and Supplementary Fig. 6b). When per-position editing rates were stratified by preceding base, prediction accuracy was lowest for target sites preceded by a G, and worst at position 8 with a preceding G (Supplementary Fig. 6e, f). More high-quality training data should therefore facilitate more accurate prediction models.
To further validate the performance and generalizability of BEguider, we compared its predictions with experimental data from a different cell line. Using editing efficiencies measured in HepG2 cells with ABE8e-SpRY for 221 adenine bases47, we found that predictions from our ABE8e-NL-SpRY model correlated strongly with both endogenous editing rates (Spearman’s ρ = 0.62) and integrated-target-site editing rates (Spearman’s ρ = 0.64).
Notably, these correlations approach the experimental correlation (ρ = 0.60) between integrated-target-site and endogenous editing rates reported by Ryu et al., underscoring the robust performance of our model across cellular contexts (Fig. 4k and Supplementary Table 12). In summary, our models show strong predictive performance, as evidenced by the good agreement between predicted and experimental datasets.
Assessing the potential of near-PAMless base editors for targeting pathogenic variants using BEguider
An important application of near-PAMless base editors is the modeling and correction of pathogenic SNVs. By analyzing the ClinVar database, we identified 47,485 pathogenic or likely pathogenic SNVs that correspond to C-to-T or A-to-G conversions. Taking into account whether a suitable sgRNA could be designed, we found that 40,485 of these SNVs are correctable and 39,997 can be generated by near-PAMless base editors. In comparison, only 7.6% and 8.4% of these variants can be corrected or generated, respectively, with NGG base editors (Fig. 5a). Notably, 69.8% of the identified C-to-T SNVs and 57.0% of the A-to-G SNVs have more than one editable base within the editing window (Fig. 5b), underlining the need for precise prediction of editing outcomes for near-PAMless base editors.
Fig. 5: Prediction of editing outcomes for ClinVar pathogenic variants using BEguider. a PAM distribution at ClinVar sites. Left: designed SNV correction sites. Right: designed disease modeling sites. b The number of target bases (1–4) within the editing window at ClinVar sites, for C-to-T conversions (left) and A-to-G conversions (right). c Venn diagram showing the overlap of sgRNAs with predicted precisely editable SNV correction sites and disease modeling sites among different near-PAMless CBEs and ABEs. d The number of sgRNAs with predicted precisely editable SNV correction sites and disease modeling sites, detailed across different genes and diseases. The proportions in brackets represent the percentage of predicted editable sgRNAs relative to the total number of editable genes or diseases. Distribution of predicted editing proportions and efficiencies for near-PAMless CBEs (e) and ABEs (f) when the editing window contains different numbers of target bases. Dashed lines indicate thresholds of 90% for the proportion and 5% for the efficiency.
To identify variants that could be precisely corrected or generated, we used BEguider to predict editing outcomes. We defined SNVs as “precisely editable” if the predicted proportion of the desired editing outcome exceeded 90% and the predicted editing efficiency exceeded 5%. Under this criterion, near-PAMless BEs could precisely correct 14,540 pathogenic or likely pathogenic SNVs and precisely generate 17,983 SNVs for disease modeling (Fig. 5c; Supplementary Tables 13–16). For SNV correction, the precisely editable sites are associated with 2385 genes and 4386 diseases for C-to-T near-PAMless base editors, and with 1782 genes and 2538 diseases for A-to-G editors. For disease modeling, precisely editable sites span variants across 1202 genes and 1548 diseases for C-to-T editors, and 2844 genes and 6063 diseases for A-to-G editors (Fig. 5d).
Figure 5e, f shows the predicted proportions and efficiencies when the editing window contains different numbers of target bases. Notably, even when two or three target bases were present within positions 4–8 of the editing window, some SNVs remained editable with high precision. For instance, with two Cs in the window, 902, 1648, and 1318 SNVs can be precisely edited by BE4max-SpRY, FNLS-YE1-SpRY, and YE1-SpRY, respectively (Supplementary Fig. 7a, b).
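The “precisely editable” criterion amounts to two thresholds on the predicted outputs. A minimal sketch, assuming per-editor predictions are available as (desired-outcome proportion, editing efficiency) pairs, is shown below; the function names, input format, and toy values are hypothetical and not part of the BEguider interface. Choosing the passing editor with the highest predicted efficiency is one plausible selection rule and is an assumption, not the authors' stated procedure.

```python
PROPORTION_MIN = 0.90  # desired-outcome proportion must exceed 90%
EFFICIENCY_MIN = 0.05  # editing efficiency must exceed 5%

def precisely_editable(proportion: float, efficiency: float) -> bool:
    """Apply the two-threshold criterion to one (proportion, efficiency) pair."""
    return proportion > PROPORTION_MIN and efficiency > EFFICIENCY_MIN

def best_editor(predictions: dict[str, tuple[float, float]]):
    """Return the precise editor with the highest predicted efficiency, or None.

    predictions maps editor name -> (desired-outcome proportion, efficiency).
    """
    passing = {name: eff for name, (prop, eff) in predictions.items()
               if precisely_editable(prop, eff)}
    if not passing:
        return None
    return max(passing, key=passing.get)

# Hypothetical predictions for one SNV with two Cs in the window.
preds = {"BE4max-SpRY": (0.72, 0.30),     # efficient but too many bystander edits
         "FNLS-YE1-SpRY": (0.93, 0.12),
         "YE1-SpRY": (0.95, 0.18)}
print(best_editor(preds))
```

Filtering on proportion first and only then ranking by efficiency reflects the priority implied by the criterion: an efficient editor that frequently introduces bystander edits is excluded outright.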
These results indicate that, to achieve a desired editing outcome, the optimal base editor can be selected on the basis of BEguider-predicted outcome proportions and editing efficiencies.
Generation of ClinVar SNVs for disease modeling and SNV correction using near-PAMless base editors
To provide guidance for future studies of pathogenic variants using near-PAMless base editors, we examined editing outcomes in our high-throughput dataset at 10,175 sites for pathogenic SNV correction and 10,366 pathogenic SNV sites for disease modeling (Supplementary Fig. 8a, b; Supplementary Tables 17–20). When positioned as the sixth base of the protospacer, these SNVs lack an NGG PAM, rendering them inaccessible to conventional Cas9-based base editors. YE1-SpRY precisely corrected 443 pathogenic or likely pathogenic SNVs and precisely generated 1596 SNVs for disease modeling. Similarly, ABE8e-NL-SpRY precisely corrected 2632 pathogenic or likely pathogenic SNVs and precisely generated 872 SNVs for disease modeling (Fig. 6a). In comparison, the NG PAM-specific CBE and ABE precisely edited only 227 and 730 SNVs for correction and precisely generated 764 and 221 sites for disease modeling, respectively. For SNV correction, the precisely editable sites are associated with 432 genes and 449 diseases for near-PAMless CBEs, and 1469 genes and 2055 diseases for near-PAMless ABEs. For disease modeling, precisely editable sites span variants across 1124 genes and 1380 diseases for C-to-T editing, and 708 genes and 751 diseases for A-to-G editing, representing a 2- to 4.5-fold increase over the outcomes obtained with NG PAM-specific base editors (Fig. 6b). For instance, the previously uneditable c.3283 C > T (p.Arg1095Ter) variant in LAMA2, associated with congenital muscular dystrophy, was edited at 36.2% frequency in the genome of HEK293T cells by YE1-SpRY (Fig. 6c).
Similarly, the TP53 c.695 T > C (p.Ile232Thr) variant, which is inaccessible to NGG-PAM base editors, was generated by ABE8e-NL-SpRY at 82.8% editing frequency (Fig. 6d).
We next compared bystander editing outcomes among base editors at these pathogenic sites. For each combination of editable positions in the window, we analyzed the mean editing frequency of each base in the editing window and the proportion of edits at the sixth base for near-PAMless CBEs and ABEs. BE4max-SpRY exhibited a slight leftward shift in the proportion of edits at the sixth base (Fig. 6e), indicating relatively higher bystander editing than the YE1-integrated near-PAMless CBEs. Consequently, YE1-SpRY demonstrated the highest precision at sites containing multiple Cs: it achieved an editing proportion above 90% at the sixth position for 321 sites with Cs at positions 4 and 6, 144 sites with Cs at positions 6 and 8, and 38 sites with Cs at positions 4, 6, and 8 (Fig. 6f). For ABEs, 68.7% of sgRNAs were editable at the sixth position by ABE8e-NL-SpRY across the 2832 sites where adenines were targeted, and ABE8e-NL-SpRY edited 424 sites where As were present at both positions 6 and 8. ABE8e-NL-SpRY outperformed ABE8e-SL-SpRY in editing efficiency at the sixth (target) base while minimizing bystander effects (Fig. 6g).
Fig. 6: Measured outcomes at 20,541 ClinVar sites by near-PAMless CBEs and ABEs. a The number of precisely editable sites for near-PAMless CBEs and ABEs compared with their NG-PAM counterparts in our experimental data. b The number of sgRNAs that can precisely edit SNVs for correction or disease modeling, and the associated genes and diseases. Editing frequencies of YE1-SpRY (c) and ABE8e-NL-SpRY (d) at endogenous genomic loci in HEK293T cells. e Heatmaps illustrating editing frequencies and line charts showing the proportion of edits at the sixth position for different base combinations. The number and percentage of editable sites for near-PAMless CBEs (f) and ABEs (g), grouped by the combination of cytosines or adenines within the editing window.
In summary, we have generated an extensive dataset of experimentally measured editing outcomes for 20,541 ClinVar variants using near-PAMless base editors. This resource is accessible at http://beguider.bmicc.org/, a website that also offers our prediction model in an interactive online format (Supplementary Fig. 9) to facilitate the use of near-PAMless base editors. Users can input a gene name with the target sequence or chromosome position and select a BE; BEguider then generates optimized sgRNA sequences for near-PAMless base editors, along with detailed predictions of editing efficiency, editing outcomes, and their proportions. Users can also input a pathogenic variant from ClinVar, choose a base editor type, and specify whether they aim to correct or generate the variant. BEguider then provides the designed sgRNA along with the predicted editing efficiency and detailed outcome predictions, indicating the potential for precise correction or generation of the disease-related variant.
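The sgRNA design step offered by the website can be illustrated with a simplified sketch; it is not the BEguider server's code. Given a genomic sequence and the 0-based index of the base to edit, it enumerates every protospacer on either strand that places the target at positions 4–8, records the 4-nt PAM at protospacer positions 21–24, and flags whether that PAM is NGG, that is, whether a conventional SpCas9 base editor could also reach the site. It assumes the edited base must read as C (CBEs) or A (ABEs) on the protospacer strand; scoring each candidate would then be delegated to a trained model, which is not reproduced here. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def candidate_guides(genome: str, index: int, base: str = "C",
                     window: range = range(4, 9)):
    """Yield (strand, protospacer, pam, position, ngg) for each usable guide.

    genome: uppercase + strand sequence; index: 0-based position of the base
    to edit; base: 'C' for CBEs or 'A' for ABEs, read on the protospacer strand.
    position is the 1-based protospacer position of the target.
    """
    n = len(genome)
    for strand, seq, idx in (("+", genome, index),
                             ("-", revcomp(genome), n - 1 - index)):
        if seq[idx] != base:
            continue  # target must be a C (or A) on the protospacer strand
        for pos in window:
            start = idx - (pos - 1)  # protospacer start placing target at pos
            end = start + 20         # 20-nt protospacer, then a 4-nt PAM
            if start < 0 or end + 4 > n:
                continue
            pam = seq[end:end + 4]
            yield strand, seq[start:end], pam, pos, pam[1:3] == "GG"

# Toy locus: a C at index 6 sits at protospacer position 4 upstream of a TGGA PAM.
genome = "TTT" + "AAACAAAAAAAAAAAAAAAA" + "TGGA" + "TTT"
for guide in candidate_guides(genome, 6):
    print(guide)
```

In this toy locus only one of the four candidate placements has an NGG PAM, which illustrates why near-PAMless editors can offer several guide choices, and therefore several bystander profiles, for the same target base.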
