A systematic search for RNA structural switches across the human transcriptome

SwitchFinder: detailed description of the algorithm

Conflicting base pairs identification

Conflicting base pairs were detected using a modification of the MIBP algorithm developed by L. Lin and W. McKerrow (ref. 59). First, a large number of folds (default N = 1,000) is sampled from the Boltzmann distribution. If structure probing data (such as DMS-seq or SHAPE-seq) are provided, the Boltzmann distribution modeling software (part of the RNAstructure package; ref. 56) incorporates the data as a pseudo-free energy change term. Next, the base pairs are filtered: base pairs that are present in almost all of the folds, or absent from almost all of them, are removed from further analysis. Mutual information is then estimated for each pair of base pairs. To do so, each base pair is represented as a binary vector of length N, where N is the number of folds considered; each element is 1 if the base pair is present in the corresponding fold and 0 if it is not. Mutual information between every two base pairs is calculated as in ref. 60. This yields an M × M table of mutual information values, where M is the number of base pairs considered. The sum of each row of this square table is then calculated, giving a vector K of length M in which each base pair is represented by the sum of its mutual information values with all of the other base pairs. Only the base pairs whose sum passes the threshold of U × MAX(K) are retained, where U is a parameter (default value 0.5). We call the base pairs that pass this threshold the ‘conflicting base pairs’.

Conflicting stems identification

Once the conflicting base pairs are identified, they are assembled into conflicting stems: series of conflicting base pairs that directly follow each other and could therefore form a stem-like RNA structure.
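The base-pair scoring described under Conflicting base pairs identification can be sketched in Python as follows. This is a minimal illustration, not the SwitchFinder implementation. It assumes the sampled folds are already available as sets of (i, j) base pairs and uses the standard plug-in estimate of mutual information in bits (ref. 60 may differ in detail). It also reads "present in or absent from almost all folds" as the hypothetical frequency cut-offs min_freq = 0.05 and max_freq = 0.95.

```python
import numpy as np


def mutual_information(x, y):
    """Plug-in mutual information (bits) between two 0/1 presence vectors."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_xy = np.mean((x == a) & (y == b))
            if p_xy > 0:
                p_x, p_y = np.mean(x == a), np.mean(y == b)
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi


def conflicting_base_pairs(folds, u=0.5, min_freq=0.05, max_freq=0.95):
    """Return {base_pair: K} for base pairs whose summed MI passes U * max(K).

    folds: list of sets of (i, j) base pairs, one set per Boltzmann sample.
    min_freq / max_freq: hypothetical cut-offs for 'absent from / present in
    almost all folds' (the exact values are an assumption of this sketch).
    """
    all_pairs = sorted(set().union(*folds))
    if not all_pairs:
        return {}
    # M x N binary matrix: one row per base pair, one column per sampled fold
    presence = np.array([[bp in fold for fold in folds] for bp in all_pairs], dtype=int)
    freq = presence.mean(axis=1)
    keep = (freq > min_freq) & (freq < max_freq)
    pairs, presence = [bp for bp, k in zip(all_pairs, keep) if k], presence[keep]
    if not pairs:
        return {}
    m = len(pairs)
    mi = np.zeros((m, m))  # diagonal stays 0, so each row sums over the other pairs only
    for i in range(m):
        for j in range(i + 1, m):
            mi[i, j] = mi[j, i] = mutual_information(presence[i], presence[j])
    k = mi.sum(axis=1)  # the vector K
    return {bp: score for bp, score in zip(pairs, k) if score >= u * k.max()}
```

For example, if half of the folds contain {(1, 10), (2, 9), (30, 40)} and the other half contain {(1, 20), (2, 19), (30, 40)}, the constitutive (30, 40) pair is filtered out. Each of the four alternating pairs then receives K = 3 bits, because it is perfectly correlated or anticorrelated with each of the other three.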
More specifically, the base pairs (a, b) and (c, d) form a stem if either (a == c − 1) and (b == d + 1), or (a == c + 1) and (b == d − 1). A stem is defined as a pair of intervals ((u, v), (x, y)), where v − u == y − x. The conflicting stems are then filtered by length: only stems longer than a threshold value (default value: 3) are considered. Among these stems, those that directly conflict with each other are identified. Two stems ((u1, v1), (x1, y1)) and ((u2, v2), (x2, y2)) conflict if the overlap between (u1, v1) and (u2, v2), between (u1, v1) and (x2, y2), between (x1, y1) and (u2, v2), or between (x1, y1) and (x2, y2) is longer than a threshold value (default value: 3). The pairs of conflicting stems are sorted by the average of their K values (sums of mutual information). The highest-scoring pair of conflicting stems is taken as the winning prediction, representing the major switch between two of the local minima in the folding energy landscape of the given sequence. If no pair of conflicting stems passes the thresholds, SwitchFinder reports that no potential switch was identified for the given sequence.

Identifying the two conflicting structures

Given the two predicted conflicting stems, the folds that represent the two local minima of the folding energy landscape are predicted. Importantly, SwitchFinder focuses on optimizing prediction accuracy, as opposed to the commonly used approach of energy minimization (ref. 61). The MaxExpect program from the RNAstructure package (ref. 56) is used, with the base pairs of each of the conflicting stems provided as folding constraints (in Connectivity Table format).
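The stem assembly, conflict test and ranking described under Conflicting stems identification can be sketched as follows. This is an illustration under stated assumptions, not SwitchFinder's code. It takes a dictionary mapping each conflicting base pair (i, j) to its K value and uses 0-based inclusive intervals. It scores a pair of stems by the mean K over all base pairs of both stems, which is our reading of "the average value of their K values".

```python
from itertools import combinations


def assemble_stems(pair_scores):
    """Chain stacked conflicting pairs (i, j), (i+1, j-1), ... into stems.

    Each stem is returned as ((u, v), (x, y), members): u..v is the 5' strand
    and x..y the 3' strand, so v - u == y - x.
    """
    pairs = set(pair_scores)
    stems = []
    for i, j in sorted(pairs):
        if (i - 1, j + 1) in pairs:
            continue  # (i, j) extends a stem that starts further 5'
        members = [(i, j)]
        while (members[-1][0] + 1, members[-1][1] - 1) in pairs:
            members.append((members[-1][0] + 1, members[-1][1] - 1))
        (u, y), (v, x) = members[0], members[-1]
        stems.append(((u, v), (x, y), members))
    return stems


def overlap(a, b):
    """Number of positions shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def stems_conflict(s1, s2, min_overlap=3):
    """True if any strand of s1 overlaps any strand of s2 by more than min_overlap."""
    return any(overlap(p, q) > min_overlap for p in s1[:2] for q in s2[:2])


def best_switch(pair_scores, min_len=3, min_overlap=3):
    """Return (score, stem1, stem2) for the top-scoring conflicting pair, or None."""
    stems = [s for s in assemble_stems(pair_scores) if len(s[2]) > min_len]
    candidates = []
    for s1, s2 in combinations(stems, 2):
        if stems_conflict(s1, s2, min_overlap):
            members = s1[2] + s2[2]
            score = sum(pair_scores[bp] for bp in members) / len(members)
            candidates.append((score, s1[:2], s2[:2]))
    return max(candidates, key=lambda c: c[0]) if candidates else None
```

Returning None mirrors the case in which SwitchFinder reports that no potential switch was identified.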
Hereafter, the two predicted structures are referred to as conformations 1 and 2.

Activation barrier estimation

The RNApathfinder software (ref. 62) is used to estimate the activation energy needed for a transition between conformations 1 and 2.

Classifier for prediction of RNA switches

The curated representative alignments for each of the 50 known riboswitch families were downloaded from the Rfam database (ref. 9). Each sequence is paired with a shuffled counterpart generated while preserving dinucleotide frequencies (ref. 63). For all of the sequences, the two conflicting conformations, their folding energies and the activation energy of the transition between them are predicted as above. To estimate the performance of SwitchFinder for a given riboswitch family, all of the sequences from this family are placed into the test set, while all of the sequences from the other families are placed into the training set. A linear regression model is then trained on the training set. In this model, the response variable is binary and indicates whether the sequence is a real riboswitch or a shuffled counterpart, and the predictor variables are the average folding energy of the two conformations and the activation energy of the transition between them. The trained model is then applied to the test set, and its performance is evaluated using the receiver operating characteristic (ROC) curve.

Prediction of RNA switches in the human transcriptome

The coordinates of the 3ʹUTRs of the human transcriptome were downloaded from the UCSC Table Browser (ref. 64), table tb_wgEncodeGencodeBasicV28lift37. The 3ʹUTR sequences were cut into overlapping fragments of 186 nucleotides (with overlaps of 93 nucleotides). For all of the sequences, the two conflicting conformations, their folding energies and their activation energies were predicted as above. A linear regression model was trained as described above on all 50 known riboswitch families.
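The leave-one-family-out evaluation described under Classifier for prediction of RNA switches can be sketched as follows. The sketch assumes the per-sequence features (average folding energy of the two conformations and activation energy) have already been computed. Ordinary least squares via numpy.linalg.lstsq stands in for the linear regression, and ROC performance is summarized as the area under the curve, computed with the Mann–Whitney formulation. These choices and the function names are ours, not the original software's.

```python
import numpy as np


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic (ties count as 1/2)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def leave_one_family_out(features, labels, families):
    """Hold out each riboswitch family in turn; fit OLS on the rest, score the held-out family.

    features: (n, 2) array of [average folding energy, activation energy]
    labels:   1 for real riboswitch sequences, 0 for shuffled counterparts
    families: family identifier for each sequence
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=float)
    families = np.asarray(families)
    design = np.column_stack([np.ones(len(labels)), features])  # intercept + predictors
    aucs = {}
    for fam in np.unique(families):
        test = families == fam
        coef, *_ = np.linalg.lstsq(design[~test], labels[~test], rcond=None)
        aucs[str(fam)] = float(roc_auc(labels[test], design[test] @ coef))
    return aucs
```

Each family must contain both real and shuffled sequences for its AUC to be defined. This holds by construction here, because every riboswitch sequence is paired with a shuffled counterpart.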
The model was applied to the 3ʹUTR fragments from the human genome, and the fragments were sorted by their model prediction scores. The top 3,750 predictions were selected for further investigation.

Incorporation of in vivo probing data

In vivo probing data, such as DMS-MaPseq, are used to apply pseudo-energy restraints when sampling folds from the Boltzmann distribution (that is, using the --SHAPE parameter in RNAstructure package commands; ref. 56). To test whether the in vivo probing data support the presence of two conflicting conformations in a given sequence, the following workflow was used. First, the two conflicting folds were predicted with SwitchFinder using in silico folding only. Then, SwitchFinder was run on the same sequence with the inclusion of the in vivo probing data. If the same two conflicting folds were among the top predicted conflicting folds, the probing data were considered to support the presence of the two predicted conformations.

Mutation generation

To shift the RNA conformational ensemble towards one state or the other, two types of mutations were introduced.

(1)

‘Strengthen a stem’ mutations: given two conflicting stems ((u1, v1), (x1, y1)) and ((u2, v2), (x2, y2)), one of the stems (for example, the first) was changed in a way that preserves its base pairing but prevents formation of the second stem. To do so, the nucleotides in the interval (u1, v1) were replaced with all possible sequences of equal length, and the nucleotides in the interval (x1, y1) were replaced with the reverse complement of each such sequence. The newly generated sequences were then filtered by two predetermined criteria: (i) the second stem cannot form more than a fraction of its original base pairs (default value 0.6), and (ii) the modified first stem cannot form long paired stems with any region of the existing sequence (default threshold length 4). The sequences that passed both criteria were ranked by the change they introduced in nucleotide composition, and the mutations that changed the nucleotide composition the least were chosen for further analysis. Each mutated sequence was additionally analyzed with SwitchFinder to ensure that the Boltzmann distribution is heavily shifted towards the desired conformation.

(2)

‘Weaken a stem’ mutations: given two conflicting stems ((u1, v1), (x1, y1)) and ((u2, v2), (x2, y2)), one of the stems (for example, the second) was changed so that it could no longer form, while the base pairing of the other stem (in this example, the first) was preserved. To do so, the nucleotides in either the interval (u2, v2) or the interval (x2, y2) were replaced with all possible sequences of equal length. The newly generated sequences were filtered by three predetermined criteria: (i) the first stem stays unchanged, (ii) the second stem cannot form more than a fraction of its original base pairs (default value 0.6), and (iii) the modified part of the sequence cannot form long paired stems with any region of the existing sequence (default threshold length 4). The sequences that passed all of the criteria were ranked by the change they introduced in nucleotide composition, and the mutations that changed the nucleotide composition the least were chosen for further analysis. Each mutated sequence was additionally analyzed with SwitchFinder to ensure that the Boltzmann distribution is heavily shifted towards the desired conformation.
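The enumeration and filtering behind ‘strengthen a stem’ mutations can be sketched in Python as follows. This is a simplified illustration under our own assumptions, not SwitchFinder's code:

- intervals are 0-based and inclusive;
- G·U wobble pairs count as paired;
- composition change is the summed absolute difference in A/C/G/U counts;
- criterion (ii) and the final SwitchFinder re-check are omitted.

'Weaken a stem' mutations would follow the same pattern, enumerating only one strand of the second stem.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def reverse_complement(seq):
    return "".join(COMPLEMENT[n] for n in reversed(seq))


def fraction_paired(seq, stem):
    """Fraction of a stem's pairs (u+k, y-k) that can still pair in seq."""
    (u, v), (_, y) = stem
    pairs = [(u + k, y - k) for k in range(v - u + 1)]
    return sum((seq[i], seq[j]) in CAN_PAIR for i, j in pairs) / len(pairs)


def composition_change(a, b):
    """Summed absolute difference in nucleotide counts (assumed metric)."""
    ca, cb = Counter(a), Counter(b)
    return sum(abs(ca[n] - cb[n]) for n in "ACGU")


def strengthen_stem(seq, stem1, stem2, max_fraction=0.6, top=5):
    """Rewrite stem1 as any sequence plus its reverse complement so that stem2 mostly cannot form."""
    (u1, v1), (x1, y1) = stem1
    candidates = []
    for letters in product("ACGU", repeat=v1 - u1 + 1):  # 4**L candidates
        left = "".join(letters)
        mutant = list(seq)
        mutant[u1:v1 + 1] = left
        mutant[x1:y1 + 1] = reverse_complement(left)  # keeps stem1 fully paired
        mutant = "".join(mutant)
        # skip the unchanged sequence, then apply criterion (i) to stem2
        if mutant == seq or fraction_paired(mutant, stem2) > max_fraction:
            continue
        candidates.append((composition_change(seq, mutant), mutant))
    candidates.sort()  # smallest composition change first
    return [m for _, m in candidates[:top]]
```

Exhaustive enumeration grows as 4^L in the stem length L. This is manageable for short stems but would call for pruning on longer ones.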

Cell culture

All cells were cultured in a humidified incubator at 37 °C and 5% CO2. HEK293 cells (purchased from ATCC, cat. no. CRL-3216) were cultured in DMEM high-glucose medium supplemented with 10% FBS, L-glutamine (4 mM), sodium pyruvate (1 mM), penicillin (100 units ml−1), streptomycin (100 μg ml−1) and amphotericin B (1 μg ml−1) (Gibco). The Jurkat cell line (purchased from ATCC, cat. no. TIB-152) was cultured in RPMI-1640 medium supplemented with 10% FBS, glucose (2 g l−1), L-glutamine (2 mM), HEPES (25 mM), penicillin (100 units ml−1), streptomycin (100 μg ml−1) and amphotericin B (1 μg ml−1) (Gibco). All cell lines were routinely screened for mycoplasma with a PCR-based assay.

Cryo-electron microscopy

Sample preparation and data collection

A total of 3.5 µl of target mRNA at an approximate concentration of 1.5 mg ml−1 was applied to gold, 300 mesh transmission electron microscopy grids with a holey carbon substrate of 1.2 µm and 1.3 µm spacing (Quantifoil). The grids were blotted with no. 4 filter papers (Whatman) and plunge frozen in liquid ethane using a Mark IV Vitrobot (Thermo Fisher), with blot times of 4–6 s and a blot force of −2, at 8 °C and 100% humidity. All grids were glow discharged in an easiGlo (Pelco) with rarefied air for 30 s at 15 mA, no more than 1 h before sample preparation. Duplicate wild-type and mutant RNA specimens were imaged under different conditions on several microscopes, as listed in Data File S8. All microscopes were equipped with K3 direct electron detector (DED) cameras (Gatan), and all data collection was performed using SerialEM (ref. 65). Detailed data collection parameters are listed in Data File S8.

Image processing

Dose-weighted and motion-corrected sums were generated from raw DED movies during data collection using University of California, San Francisco (UCSF) MotionCor2 (ref. 66). Images from super-resolution datasets were downsampled to the physical pixel size before further processing.
Estimation of the contrast transfer function (CTF) was performed in CTFFIND4 (ref. 67), followed by neural network-based particle picking in EMAN2 (ref. 68). Two-dimensional (2D) classification, ab initio three-dimensional (3D) classification and gold-standard refinement were performed in cryoSPARC (ref. 69). CTFs were then re-estimated in cryoSPARC, and particles were repicked using low-resolution (20 Å) templates generated from selected 3D classes. Extended datasets were pooled when appropriate, and particle processing was repeated through gold-standard refinement as before. All structure figures were created using UCSF ChimeraX (ref. 70). Further details are given in Data File S7 and Extended Data Fig. 5.

Reporter vector design and library cloning

First, an mCherry-P2A-Puro fusion was cloned into the BTV arbovirus backbone (Addgene, cat. no. 84771). The vector was then digested with MluI-HF and PacI restriction enzymes (NEB) with the addition of Shrimp Alkaline Phosphatase (NEB). The digested vector was purified with the Zymo DNA Clean and Concentrator-5 kit.

DNA oligonucleotide libraries (one for the functional screen and one for the massively parallel mutagenesis analysis) consisting of 7,500 sequences in total were synthesized by Agilent. The second strand was synthesized using Klenow Fragment (3ʹ→5ʹ exo−) (NEB). The double-stranded DNA library was digested with MluI-HF and PacI restriction enzymes (NEB) and run on a 6% TBE (Tris base, boric acid, EDTA) polyacrylamide gel. The band of the expected size was excised, and the gel slice was dissolved in DNA extraction buffer (10 mM Tris, pH 8, 300 mM NaCl, 1 mM EDTA). The DNA was precipitated with isopropanol. The digested DNA library and the digested vector were ligated with T4 DNA ligase (NEB). The ligation reaction was precipitated with isopropanol and transformed into MegaX DH10B T1R electrocompetent cells (Thermo Fisher). The library was purified with the ZymoPURE II Plasmid Maxiprep Kit (Zymo).
The representation of individual sequences in the library was verified by sequencing the resulting library on a MiSeq instrument (Illumina).

Massively parallel reporter assay

The DNA library was co-transfected with pCMV-dR8.91 and pMD2.G plasmids into HEK293 cells using TransIT-Lenti (Mirus), following the manufacturer's protocol. Virus was collected 48 h after transfection and passed through a 0.45 µm filter. HEK293 cells were then transduced overnight with the filtered virus in the presence of 8 µg ml−1 polybrene (Millipore). The amount of virus was optimized to give an infection rate of ~20%, as determined by flow cytometry. The infected cells were selected with 2 µg ml−1 puromycin (Gibco). Cells were collected at 90–95% confluency for sorting and analysis on a BD FACSAria II sorter, and the distribution of mCherry : GFP ratios was calculated. To sort the library into subpopulations, the population was gated into eight bins, each containing 12.5% of the total number of cells. A total of 1.2 million cells were collected per bin to ensure sufficient representation of each sequence, and sorting was performed in two replicates. For each subpopulation, genomic DNA and total RNA were extracted with the Quick-DNA/RNA Miniprep kit. gDNA was amplified by PCR with Phusion polymerase (NEB) using the primers CAAGCAGAAGACGGCATACGAGAT–i7–GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCACTGCTAGCTAGATGACTAAACGCG and AATGATACGGCGACCACCGAGATCTACAC–i5–ACACTCTTTCCCTACACGACGCTCTTCCGATCTGTGGTCTGGATCCACCGGTCC. Different i7 indexes were used for the eight bins, and different i5 indexes were used for the two replicates. RNA was reverse transcribed with Maxima H Minus Reverse Transcriptase (Thermo Fisher) using the primer CTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNTGGTCTGGATCCACCGGTCCGG.
The complementary DNA was amplified with Q5 polymerase (NEB) using the primers CAAGCAGAAGACGGCATACGAGAT–i7–GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCCTGCTAGCTAGATGACTAAACGC and CAAGCAGAAGACGGCATACGAGAT–i5–GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTACCCGTCATTGGCTGTCCA. Different i7 indexes were used for the eight bins, and different i5 indexes were used for the two replicates. The amplified DNA libraries were size-selected with the Select-a-Size DNA Clean and Concentrator MagBead Kit (Zymo). Deep sequencing was performed on the HiSeq 4000 platform (Illumina) at the UCSF Center for Advanced Technologies.

Adapter sequences were removed using cutadapt (ref. 71). For RNA libraries, the unique molecular identifier (UMI) was then removed from the reads and appended to the read names using UMI-tools (ref. 72). The reads were mapped to the library fragments using bwa mem and counted using featureCounts (ref. 73). The read counts were normalized using median-of-ratios normalization (ref. 74). For each sequence, a one-way chi-squared test was used to assess how much its distribution across the sorting bins differs from the null hypothesis of a uniform distribution. mRNA stability was estimated by comparing RNA and DNA read counts with MPRAnalyze (ref. 75).

Massively parallel mutagenesis analysis

Library design and measurement

For each candidate switch, two alternative conformations were identified using SwitchFinder. Each conformation is defined by a stem, ((u1, v1), (x1, y1)) or ((u2, v2), (x2, y2)), and these two stems conflict with each other.
The SwitchFinder mutation generation algorithm was used to design four mutations in each candidate switch sequence:

- A, a ‘strengthen a stem’ mutation favoring conformation 1, in which the regions (u1, v1) and (x1, y1) are altered while preserving their complementarity;
- B, a ‘weaken a stem’ mutation favoring conformation 1, in which either the region (u2, v2) or the region (x2, y2) is modified while the regions (u1, v1) and (x1, y1) are preserved;
- C, a ‘strengthen a stem’ mutation favoring conformation 2, in which the regions (u2, v2) and (x2, y2) are changed while maintaining their complementarity;
- D, a ‘weaken a stem’ mutation favoring conformation 2, in which either the region (u1, v1) or the region (x1, y1) is altered while the regions (u2, v2) and (x2, y2) remain intact.

The mutated sequences for the selected candidate RNA switches, along with their reference sequences, were then pooled into a single DNA oligonucleotide library. The impact of each sequence on reporter gene expression was evaluated in cells as outlined in the Massively parallel reporter assay section. Consequently, each candidate RNA switch in the library is represented by its reference sequence, two mutated sequences favoring conformation 1 (A and B) and two mutated sequences favoring conformation 2 (C and D).

Candidate RNA switch ranking

For each candidate RNA switch, its effect on reporter gene expression was assessed in cells following the protocol described in the Massively parallel reporter assay section. This resulted in 16 measurements, corresponding to normalized read counts in sorting bins 1 (lowest expression) to 8 (highest expression) across two replicates; these arrays of counts are referred to as ‘bin_counts’. Measurements were obtained for mutants A, B, C and D and for the reference sequence.
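Starting from raw bin read counts, the normalization and the correlation-based ranking score used in this section can be sketched as follows. The sketch assumes a library-wide count matrix with one row per sequence and 16 columns (8 sorting bins × 2 replicates). It applies median-of-ratios size factors computed from rows with no zero counts, then follows the score definition given in the text: correlations A–B and C–D minus correlations A–C and A–D. The helper names and the use of the sample standard deviation (ddof=1) for the mean + 1 s.d. cutoff are our assumptions.

```python
import numpy as np
from scipy.stats import pearsonr


def median_of_ratios(counts):
    """Divide each column (sample) by a median-of-ratios size factor.

    counts: (sequences x samples) raw read counts; only rows without zero
    counts contribute to the size factors.
    """
    counts = np.asarray(counts, dtype=float)
    log_counts = np.log(counts[np.all(counts > 0, axis=1)])
    log_ratios = log_counts - log_counts.mean(axis=1, keepdims=True)  # vs geometric mean
    size_factors = np.exp(np.median(log_ratios, axis=0))
    return counts / size_factors, size_factors


def switch_score(bin_counts):
    """bin_counts: dict mapping 'A', 'B', 'C', 'D' to 16-element normalized count arrays."""
    def r(p, q):
        return pearsonr(bin_counts[p], bin_counts[q])[0]  # Pearson correlation coefficient

    same = np.mean([r("A", "B"), r("C", "D")])
    opposite = np.mean([r("A", "C"), r("A", "D")])
    return same - opposite


def significant(scores):
    """Flag candidate switches whose score exceeds mean + 1 s.d."""
    scores = np.asarray(scores, dtype=float)
    return scores > scores.mean() + scores.std(ddof=1)
```

A switch whose same-conformation mutants behave alike (correlations near +1) and whose opposite-conformation mutants behave in opposite ways (correlations near −1) approaches the maximum score of 2.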
Correlations between the effects of mutations designed to favor the same or opposite conformations were computed as follows: correlation_same_1 = Pearsonr(bin_counts(mutant A), bin_counts(mutant B)); correlation_same_2 = Pearsonr(bin_counts(mutant C), bin_counts(mutant D)); correlation_opposite_1 = Pearsonr(bin_counts(mutant A), bin_counts(mutant C)); and correlation_opposite_2 = Pearsonr(bin_counts(mutant A), bin_counts(mutant D)), where Pearsonr denotes the Pearson correlation coefficient. The score of each candidate switch was then calculated as score = mean(correlation_same_1, correlation_same_2) − mean(correlation_opposite_1, correlation_opposite_2). Candidate switches were ranked by this score, and those with a score exceeding the mean + 1 s.d. were considered significant.

DMS-MaPseq

DMS-MaPseq was performed as described in ref. 54. In brief, cells were incubated in culture with 1.5% DMS (Sigma) at room temperature for 7 min, the medium was removed, and DMS was quenched with 30% β-mercaptoethanol (BME). Total RNA from DMS-treated and untreated cells was then isolated using TRIzol (Invitrogen). RNA was reverse transcribed using TGIRT-III reverse transcriptase (InGex) and target-specific primers. PCR was then performed to amplify the desired sequences and to add Illumina-compatible adapters. The libraries were sequenced on a HiSeq 4000 instrument (Illumina).

PEAR (v0.9.6) was used to merge the paired reads into a single combined read. The UMI was then removed from the reads and appended to the read names using UMI-tools (v1.0). The reads were reverse complemented (FASTX-Toolkit) and mapped to the amplicon sequences using bwa mem (v0.7). The resulting BAM files were sorted and deduplicated (umi_tools, with the method flag set to unique). The alignments were then parsed for mutations using the CLIP Tool Kit (CTK) software, and the mutation frequency at every position was reported. The signal was normalized using boxplot normalization (ref. 76).
The top 10% of positions with the highest mutation rates were considered outliers (ref. 77). Clustering of the DMS-MaPseq signal was performed with DRACO (ref. 28).

SHAPE chemical probing of RNAs

Chemical probing and mutate-and-map experiments were carried out as described previously (ref. 78). In brief, 1.2 pmol RNA was denatured at 95 °C in 50 mM Na-HEPES, pH 8.0, for 3 min and folded by cooling to room temperature over 20 min, followed by the addition of MgCl2 to a final concentration of 10 mM. RNA was aliquoted in 15 µl volumes into a 96-well plate and either mixed with nuclease-free H2O (control) or chemically modified in the presence of 5 mM 1-methyl-7-nitroisatoic anhydride (1M7; ref. 79) for 10 min at room temperature. Chemical modification was stopped by adding 9.75 µl of quench and purification mix (1.53 M NaCl, 1.5 µl washed oligo-dT beads (Ambion), 6.4 nM FAM-labeled reverse-transcription primer (/56-FAM/AAAAAAAAAAAAAAAAAAAAGTTGTTCTTGTTGTTTCTTT) and 2.55 M Na-MES). RNA in each well was purified by bead immobilization on a magnetic rack and two washes with 100 µl 70% ethanol. RNA was then resuspended in 2.5 µl nuclease-free water before reverse transcription.

RNA was reverse transcribed from the annealed fluorescent primer in a reaction containing 1× First Strand Buffer (Thermo Fisher), 5 mM dithiothreitol, 0.8 mM dNTP mix and 20 U SuperScript III Reverse Transcriptase (Thermo Fisher) at 48 °C for 30 min. RNA was hydrolyzed in the presence of 200 mM NaOH at 95 °C for 3 min, then placed on ice for 3 min and quenched with 1 volume of 5 M NaCl, 1 volume of 2 M HCl and 1 volume of 3 M sodium acetate. cDNA was purified on magnetic beads and eluted by incubation for 20 min in 11 µl Formamide-ROX350 mix (1,000 µl Hi-Di Formamide (Thermo Fisher) and 8 µl ROX350 ladder (Thermo Fisher)). Samples were then transferred to a 96-well plate in ‘concentrated’ form (4 µl sample + 11 µl ROX mix) and ‘dilute’ form (1 µl sample + 14 µl ROX mix) for saturation correction in downstream analysis.
Sample plates were sent to Elim Biopharmaceuticals for analysis by capillary electrophoresis.

Antisense oligonucleotide infection

Antisense oligonucleotides (ASOs) were purchased from Integrated DNA Technologies; Morpholino ASOs were purchased from Gene Tools LLC (see sequences in Data File S9). A total of 95,000 HEK cells were seeded into the wells of a 24-well cell culture-treated plate in a total volume of 500 µl. After 24 h, either 1 nmol Morpholino ASO together with 3 µl EndoPorter reagent (Gene Tools LLC) or 6 pmol of another ASO was added to each well. LNCaP, MCF-7 and LS174T cells were transfected with ASOs by nucleofection using the Lonza SE Cell Line 4D-Nucleofector X Kit S (cat. no. V4XC-1032) according to the manufacturer's protocol. After 48 h, mCherry and eGFP fluorescence was measured on a BD FACSCelesta Cell Analyzer, or RNA was isolated for RT-qPCR with the Zymo QuickRNA Microprep isolation kit with in-column DNase treatment according to the manufacturer's protocol.

CRISPRi screen

Reporter screens were conducted using established flow cytometry screen protocols (ref. 80; Horlbeck et al., 2016; Sidrauski et al., 2015). Jurkat cells with previously verified CRISPRi activity were used (Horlbeck et al., 2018). The CRISPRi-v2 sgRNA library (5 sgRNAs per TSS; Addgene, cat. no. 83969) was transduced into Jurkat cells at a multiplicity of infection of <0.3 (the percentage of blue fluorescent protein (BFP)-positive cells was ~30%). For the flow-based CRISPRi screen in Jurkat cells, the sgRNA library was transduced at an average of 500-fold coverage (day 0). Puromycin (1 µg ml−1) selection for transduced cells was performed at 48 h (day 2) and 72 h (day 3) after transduction. On day 11, cells were collected in PBS and sorted on a BD FACSAria Fusion cell sorter. Cells were gated into the 25% with the highest GFP : mCherry fluorescence intensity ratio and the 25% with the lowest ratio.
The screens were performed under two conditions: cells with a reference RORC element–GFP reporter and cells with a mutated 77-23 RORC element–GFP reporter. Each screen was performed in duplicate. After sorting, genomic DNA was collected (Macherey-Nagel Midi Prep kit) and amplified using NEBNext Ultra II Q5 master mix and primers containing TruSeq indexes for next-generation sequencing. Sample libraries were prepared and sequenced on a HiSeq 4000. Guides were then quantified with the published ScreenProcessing method (https://github.com/mhorlbeck/ScreenProcessing), and phenotypes were generated with an in-house processing pipeline, iAnalyzer (https://github.com/goodarzilab/iAnalyzer). In brief, iAnalyzer fits a generalized linear model to each gene. Coefficients from this generalized linear model were z-score normalized to the negative control guides, and the genes with the largest coefficients were analyzed as potential hits. For the comparison of gene phenotypes between the two cell lines, the DESeq2 ratio-of-ratios test was used (ref. 57).

CRISPRi-mediated and small interfering RNA-mediated gene knockdown

Jurkat cells expressing the dCas9–KRAB fusion protein were generated by lentiviral delivery of pMH0006 (Addgene, cat. no. 135448) and FACS isolation of BFP-positive cells.

Guide RNA sequences for CRISPRi-mediated gene knockdown were cloned into pCRISPRia-v2 (Addgene, cat. no. 84832) via the BstXI and BlpI sites. After transduction with sgRNA lentivirus, Jurkat cells were selected with 2 µg ml−1 puromycin (Gibco). eGFP and mCherry fluorescence was measured on a BD FACSCelesta Cell Analyzer.

For UPF1 siRNA-mediated knockdown, the TriFECTa DsiRNA Kit from Integrated DNA Technologies (cat. no. hs.Ri.UPF1.13) was used. LNCaP, MCF-7 and LS174T cells were transfected with siRNAs by nucleofection using the Lonza SE Cell Line 4D-Nucleofector X Kit S (cat. no. V4XC-1032) according to the manufacturer's protocol.
After 48 h, RNA was collected using the Zymo QuickRNA Microprep isolation kit with in-column DNase treatment according to the manufacturer's protocol.

Reporter cell line generation

Mutated or reference sequences of the RORC 3ʹUTR were cloned into the dual GFP–mCherry reporter using the MluI-HF and PacI restriction enzymes (NEB) as described above. The reporters were lentivirally delivered to HEK293 and Jurkat cells and analyzed by flow cytometry as described above.

Drug treatment

Jurkat cells were seeded at a density of 0.25 × 10⁷ cells per ml. Proteasome inhibitors (carfilzomib or bortezomib; Cayman Chemical) or a negative control (dimethyl sulfoxide, DMSO) were added at the indicated concentrations. After 24 h of incubation, eGFP and mCherry fluorescence was measured on a BD FACSCelesta Cell Analyzer.

MCF-7 cells were treated with either 50 µM NMDI14 (TargetMol) or DMSO for 24 h. The cells were then treated with DMS, and RNA was collected, as described above.

mRNA stability measurements

Jurkat cells were treated with 10 μg ml−1 α-amanitin (Sigma-Aldrich, cat. no. A2263) for 8–9 h before total RNA extraction. Total RNA was isolated using the Zymo QuickRNA Microprep isolation kit with in-column DNase treatment according to the manufacturer's protocol. mRNA levels were measured by RT-qPCR, using 18S ribosomal RNA (transcribed by RNA Pol I) as the control.

T-cell isolation, transduction and Th17 cell differentiation

Th17 cells were derived as described previously (ref. 34). Plates were coated with 2 µg ml−1 anti-human CD3 (UCSF Monoclonal Antibody Core, clone OKT-3) and 4 µg ml−1 anti-human CD28 (UCSF Monoclonal Antibody Core, clone 9.3) in PBS with calcium and magnesium, either for at least 2 h at 37 °C or overnight at 4 °C with the plate wrapped in parafilm.
Human CD4+ T cells were isolated from human peripheral blood using the EasySep human CD4+ T cell isolation kit (17952; STEMCELL) and stimulated in ImmunoCult-XF T-cell expansion medium (10981; STEMCELL) supplemented with 10 mM HEPES, 2 mM L-glutamine, 100 µM 2-MOE, 1 mM sodium pyruvate and 10 ng ml−1 transforming growth factor-β. At 24 h after T-cell isolation and initial stimulation in a 96-well plate, 7 µl lentivirus was added to each sample. After 24 h, the medium was removed from each sample without disturbing the cells and replaced with 200 µl fresh medium. After 48 h, cells were stimulated with 1.2 µM ionomycin, 25 nM propidium monoazide and 6 µg ml−1 brefeldin A, resuspended by pipetting, incubated for 4 h at 37 °C and collected for analysis. Half of each sample was stained for CD4, FOXP3, interleukin (IL)-13, IL-17A and interferon (IFN)-γ and analyzed on a BD LSRFortessa cell analyzer (see below). The other half was left unstained and analyzed for eGFP and mCherry expression on a BD LSRFortessa cell analyzer.

Cultured human T cells were collected, washed and stained with antibodies against cell surface proteins and transcription factors. Cells were fixed and permeabilized with the eBioscience Foxp3/Transcription Factor Staining Buffer Set or the Transcription Factor Buffer Set (BD Biosciences). Extracellular nonspecific binding was blocked with anti-CD16/CD32 antibody (clone 2.4G2; UCSF Monoclonal Antibody Core). Intracellular nonspecific binding was blocked with anti-CD16/CD32 antibodies and 2% normal rat serum. Dead cells were stained with Fixable Viability Dye eFluor 780 (eBioscience) or the Zombie Violet Fixable Viability Kit (BioLegend). Cells were stained with the following fluorochrome-conjugated anti-human antibodies: anti-CD4 (Invitrogen, cat. no. 17-0049-42), anti-FOXP3 (eBioscience, cat. no. 25-4777-61), anti-IL-13 (eBioscience, cat. no. 11-7136-41), anti-IL-17A (eBioscience, cat. no. 12-7179-42) and anti-IFNγ (BioLegend, cat. no. 502520). All antibodies were used at 1:200 dilution. Samples were analyzed on a BD LSRFortessa cell analyzer. Data were analyzed using FlowJo 10.7.1 and BD FACSDiva v9 software.

Analysis of capillary electrophoresis data with HiTRACE

Capillary electrophoresis runs from chemical probing and mutate-and-map experiments were analyzed with the HiTRACE MATLAB package (ref. 81). Lanes were aligned and bands were fitted to Gaussian peaks. Background was subtracted using the no-modification lane, and signals were corrected for attenuation and normalized to the internal hairpin control. The result of these steps is a numerical array of ‘reactivity’ values for each RNA nucleotide that can be used as weights in structure prediction.

UPF1 targeted CLIP-seq

Jurkat cells expressing RORC reporters (reference, 77-GA mutant variant or 116-CCCTAAG mutant variant) were collected and crosslinked by ultraviolet irradiation (400 mJ cm−2). Cells were then lysed with low-salt wash buffer (1× PBS, 0.1% SDS, 0.5% sodium deoxycholate, 0.5% IGEPAL). To probe preferential UPF1 binding to different reporters, lysates from 77-GA mutant cells were mixed with lysates from either wild-type or 116-CCCTAAG mutant cells at a 1:1 ratio before immunoprecipitation. Samples were treated separately with a high dose (1:3,000 RNase A and 1:100 RNase I) and with a low dose (1:15,000 RNase A and 1:500 RNase I) of RNase A and RNase I, and the treated samples were combined. To immunoprecipitate UPF1–RNA complexes, a UPF1 antibody (Thermo, cat. no. A301-902A) was first incubated with Protein A/G beads (Pierce) and then with the mixed cell lysates for 2 h at 4 °C. Immunoprecipitated RNA fragments were then dephosphorylated (T4 PNK, NEB), polyadenylated and end-labeled on beads with 3ʹ-azido-3ʹ-dUTP and IRDye 800CW DBCO Infrared Dye (LI-COR). SDS–PAGE was then performed to separate protein–RNA complexes, and RNA fragments were recovered from the nitrocellulose membrane by proteinase K digestion.
cDNA was then synthesized using Takara SMARTer small RNA sequencing kit reagents with a custom UMI-oligo-dT primer (CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTTTTTTTTTT). The RORC reporter locus was then amplified with a custom primer (ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGGGGTGATCCAAATACCACC), and sequencing libraries were prepared with SeqAmp DNA Polymerase (Takara). Libraries were sequenced on an Illumina HiSeq 4000 sequencer.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
