Systematic identification of post-transcriptional regulatory modules

Cell lines

All cells were cultured in a humidified incubator at 37 °C and 5% CO2. The 293T cells (ATCC CRL-3216) were cultured in DMEM high-glucose medium supplemented with 10% FBS, glucose (4.5 g/L), L-glutamine (4 mM), sodium pyruvate (1 mM), penicillin (100 units/mL), streptomycin (100 μg/mL) and amphotericin B (1 μg/mL) (Life Technologies Corporation 15290026). The K562 cell line (ATCC CCL-243) was cultured in RPMI-1640 medium supplemented with 10% FBS, glucose (2 g/L), L-glutamine (2 mM), 25 mM HEPES, penicillin (100 units/mL), streptomycin (100 μg/mL) and amphotericin B (1 μg/mL) (Life Technologies Corporation 15290026). All cell lines were routinely screened for mycoplasma with a PCR-based assay.

BioID2-RBP fusion cell line generation

Fifty RBPs were selected based on three criteria: (i) availability of ENCODE eCLIP data for the RBP, (ii) presence of the RBP in the ORFeome entry clone library (ref. 76), and (iii) representation of diverse RNA metabolic processes. To construct cell lines stably expressing BioID2-RBP fusion proteins, we first cloned the open reading frame of the BioID2 enzyme (ref. 17), followed by a linker (YPAFLYKVVYGGGGSGGGGSGGGGS) and an attR-flanked ccdB counterselection marker for Gateway cloning, into the pWPI backbone (Addgene #12254). The resulting backbone, pWPI_GW_BioID2_T2A_Blast, is available on Addgene (#214831). We then used the Gateway LR Clonase II Enzyme mix (Thermo Fisher 11791020) to clone the open reading frames of the RBPs of interest (from the ORFeome entry clone library; ref. 76) into the destination vector. The lentiviral constructs were co-transfected with the pCMV-dR8.91 and pMD2.G plasmids into 293T cells (ATCC CRL-3216) using NanoFect (ALSTEM NF100), following the manufacturer’s protocol. The virus was harvested 48 hours post-transfection and passed through a 0.45 μm filter.
K562 cells (ATCC CCL-243) were then transduced with the filtered virus for 2 h while centrifuging at 800 rpm, in the presence of 8 μg/mL polybrene (Millipore C788D57). Cells were selected with 20 μg/mL blasticidin (Gibco A1113903) for 5 days. The expression of the fusion protein was validated by western blotting.

Western blotting

Cell lysates were prepared by lysing cells in ice-cold RIPA buffer (25 mM Tris-HCl pH 7.6, 0.15 M NaCl, 1% IGEPAL CA-630 (Sigma-Aldrich 9002-93-1), 1% sodium deoxycholate, 0.1% SDS) (Sigma-Aldrich SIAL-R0278-50ML) containing 1X protease inhibitors (Thermo Fisher Scientific PI78410). Lysates were cleared by centrifugation at 20,000 × g for 10 min at 4 °C. Samples were denatured for 10 min at 70 °C in 1X LDS loading buffer (Invitrogen/Fisher Scientific NP0007) with 50 mM DTT (Scientific Laboratory Supplies Ltd NAT1068). Proteins were separated by SDS-PAGE (Invitrogen/Fisher Scientific P2325) using 4–12% Bis-Tris NuPAGE gels (Thermo Fisher Scientific NP0321BOX), transferred to nitrocellulose (Millipore WP2HY315F5), blocked using 5% BSA (VWR International 97064-340), and probed using target-specific antibodies. Bound antibodies were detected using dye-conjugated secondary antibodies according to the manufacturer’s instructions. Antibodies: HA (BioLegend 901533), eIF3I (BioLegend 646701), beta-tubulin (Proteintech 10094-1-AP), GAPDH (Proteintech 10494-1-AP). The uncropped images of western blots are provided in the Source Data File.

Biotin treatment and pulldown

The pulldown was performed as described in ref. 17. Cells were incubated in biotin-depleted medium (biotin-free RPMI-1640 medium supplemented with 10% dialyzed FBS, glucose (2 g/L), L-glutamine (2 mM), 25 mM HEPES, penicillin (100 units/mL), streptomycin (100 μg/mL) and amphotericin B (1 μg/mL)) for 72 h before analysis. For BioID2 pulldown, 12 × 10^6 cells per replicate were incubated with 50 μM biotin for 24 h.
For the negative control samples, 12 × 10^6 cells per replicate were incubated with DMSO. After three PBS washes, the cells were lysed in 1 mL of lysis buffer containing 50 mM Tris, pH 7.5, 150 mM NaCl, 1 mM EDTA, 1 mM EGTA, 1% Triton X-100, 1% sodium deoxycholate, 0.1% SDS, 1 × Complete protease inhibitor (Halt Phosphatase Inhibitor Cocktail; Thermo Fisher Scientific 78420), and 250 units benzonase (EMD Millipore 706643). The lysates were passed through a 25 G needle 10 times and cleared by centrifugation for 10 min at 14,000 × g at 4 °C. The protein concentration was measured with the BCA Protein Assay Kit (Thermo Scientific A55865); the lysate was diluted to a concentration of 2 μg/mL. 500 μl of lysate was incubated with 125 μl of Dynabeads (MyOne Streptavidin C1; Thermo Fisher Scientific 65001) overnight with rotation at 4 °C. Beads were collected using a magnetic stand and washed twice with 2% (wt/vol) SDS, twice with wash buffer containing 50 mM Tris, pH 7.5, 500 mM NaCl, 1 mM EDTA, 1 mM EGTA, 1% Triton X-100, 0.1% SDS, and twice with wash buffer containing 50 mM Tris, pH 7.5, 150 mM NaCl, 1 mM EDTA, 1 mM EGTA, 1% Triton X-100, 0.1% SDS, then boiled for 5 min in 50 μl of elution buffer containing 2% SDS, 100 mM DTT (Scientific Laboratory Supplies Ltd NAT1068), and Tris-HCl pH 7.5. The supernatant was collected and saved for mass spectrometry analysis.

Mass spectrometry analysis

Eluted BioID samples were reduced by the addition of 100 mM DTT (Scientific Laboratory Supplies Ltd NAT1068) and boiling at 95 °C for 10 min before being subjected to Filter Aided Sample Preparation (FASP) to generate tryptic peptides, as described previously (Dermit et al., Dev Cell, 2020). Briefly, samples were diluted 7-fold in UA buffer (8 M urea, 100 mM Tris-HCl pH 8.5) (Sigma-Aldrich U1250-5KG), transferred to Vivacon 500 Hydrosart centrifugal filters with a molecular weight cut-off of 30 kDa (Sartorius), and concentrated by centrifugation at 14,000 × g for 15 min.
Filters were then washed twice by adding 0.2 mL of UA buffer (Sigma-Aldrich U1250-5KG) to the filter tops and re-concentrating. Reduced cysteine residues were then alkylated by the addition of 100 µL of 50 mM iodoacetamide (VWR International Ltd 786-228) dissolved in UA buffer (Sigma-Aldrich U1250-5KG) and incubation at room temperature in the dark for 30 min. The iodoacetamide solution was then removed by centrifugation at 14,000 × g for 10 min, and samples were washed twice with 0.2 mL of UA buffer (Sigma-Aldrich U1250-5KG) as before. Urea was then removed from the samples by performing three washes with 0.2 mL of ABC buffer (0.04 M ammonium bicarbonate) (Sigma-Aldrich A64141-500G). Filters were then transferred to fresh collection tubes, and proteins were digested by the addition of 0.3 µg of MS-grade trypsin (Sigma-Aldrich T6567-1MG) dissolved in 50 µL of ABC buffer (Sigma-Aldrich A64141-500G) and overnight incubation in a thermo-mixer at 37 °C with gentle shaking (600 rpm). The resulting peptides were eluted from the filters by centrifugation at 14,000 × g for 10 min. Residual peptides were further eluted by the addition of 100 µL of ABC buffer (Sigma-Aldrich A64141-500G) to the filter tops and centrifugation. This was repeated once, and the combined eluates were dried by vacuum centrifugation (no heating) and reconstituted in 2% acetonitrile (ACN) (VWR International Ltd 9012.1000GL), 0.2% trifluoroacetic acid (TFA) (Life Technologies Ltd Invitrogen Division 85183), followed by desalting using C18 StageTips (Rappsilber et al., Nat Protoc, 2007). The desalted peptides were dried again by vacuum centrifugation (45 °C) and re-suspended in A* buffer (2% ACN, 0.5% acetic acid (Fisher Scientific UK Ltd 10171460), 0.1% TFA in water) before LC-MS/MS analysis. One third of each sample was analyzed on a Q-Exactive Plus Orbitrap mass spectrometer coupled with a nanoflow UltiMate 3000 RSLCnano HPLC platform (Thermo Fisher).
Samples were resolved at a flow rate of 250 nL/min on an Easy-Spray 50 cm × 75 μm RSLC C18 column with 2 µm particle size (Thermo Fisher), using a 123 min gradient of 3% to 35% of buffer B (0.1% formic acid in ACN) against buffer A (0.1% formic acid in water), and the separated peptides were infused into the mass spectrometer by electrospray (1.95 kV spray voltage, 255 °C capillary temperature). The mass spectrometer was operated in data-dependent positive mode, with 1 MS scan followed by 15 MS/MS scans (top 15 method). The scans were acquired in the mass analyzer over the 375–1500 m/z range, with a resolution of 70,000 for the MS and 17,500 for the MS/MS scans. A 30-s dynamic exclusion of fragmented peaks was applied to limit repeated fragmentation of the same ions.

Perturb-seq

68 RBPs were chosen for Perturb-seq analysis based on the clustering analysis of the ENCODE eCLIP dataset and the DeepBind dataset (ref. 77). The Perturb-seq experiment was performed as previously described (ref. 78). Briefly, a library of 205 sgRNAs (5 non-targeting sgRNAs and 200 sgRNAs targeting 100 genes, 2 sgRNAs per gene) was ordered as a pooled oligonucleotide library from Twist Bioscience with the following design:

[ATCTTGTGGAAAGGACGAAACACCG]-[Protospacer Sequence]-[GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGC]

The library was PCR-amplified using Q5 Hot Start High-Fidelity 2X Master Mix (NEB VWR International: 102500-140) with the following primers: 5’-ATCTTGTGGAAAGGAC-3’ and 5’-GCCTTATTTTAACTTGCTA-3’. To clone the library into the CROPseq-Guide-Puro vector (Addgene #86708), the starting vector was digested with BsmBI (Fisher Scientific FERER0451) following the protocol outlined in ref. 79. The library was cloned into the digested backbone using the Gibson Assembly method (ref. 80). The reaction product was transformed into Takara Stellar competent cells according to the manufacturer’s recommendations, grown overnight in 100 mL LB with ampicillin, and purified using the ZymoPURE II Plasmid Midiprep Kit (Zymo Research D4200).
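The oligonucleotide design above can be checked with a few lines of code. The following is a minimal sketch, not the script used in this study: it assembles a library oligo from the two flanking sequences given above and confirms that the two amplification primers match the oligo ends (the reverse primer via its reverse complement). The protospacer shown is a hypothetical placeholder, and the function names are illustrative.

```python
# Flanking sequences and primers as given in the library design above.
LEFT_FLANK = "ATCTTGTGGAAAGGACGAAACACCG"
RIGHT_FLANK = "GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGC"
FWD_PRIMER = "ATCTTGTGGAAAGGAC"
REV_PRIMER = "GCCTTATTTTAACTTGCTA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_oligo(protospacer: str) -> str:
    """Concatenate left flank, protospacer, and right flank."""
    protospacer = protospacer.upper()
    if not protospacer or set(protospacer) - set("ACGT"):
        raise ValueError("protospacer must be a non-empty A/C/G/T string")
    return LEFT_FLANK + protospacer + RIGHT_FLANK


def primers_match(oligo: str) -> bool:
    """True if the forward primer matches the 5' end of the oligo and the
    reverse primer is the reverse complement of its 3' end."""
    return oligo.startswith(FWD_PRIMER) and oligo.endswith(
        reverse_complement(REV_PRIMER)
    )


# Hypothetical 20-nt protospacer, used only for illustration.
EXAMPLE_PROTOSPACER = "GACCGGAACGATCTCGCGTA"
oligo = build_oligo(EXAMPLE_PROTOSPACER)
print(oligo, primers_match(oligo))
```

Because both primers anneal within the constant flanks, the same primer pair amplifies every oligo in the pool regardless of its protospacer.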
K562 cells (ATCC CCL-243) were infected with the plasmid library at a low multiplicity of infection to minimize double infection. The infected cells were selected with 2 µg/mL puromycin (Gibco A1113802) for 3 days. Live cells were isolated on a flow cytometer (FACSAria II) by propidium iodide staining (Thermo Fisher Scientific P1304MP). Approximately 5000 live cells were captured on a 10X Chromium Controller using Chromium Single Cell 3’ Reagent Kits v2. Sample preparation was performed according to the manufacturer’s protocol. Samples were sequenced on a NovaSeq 6000 using the following configuration: Read 1: 28, i7 index: 8, i5 index: 0, Read 2: 98.

To facilitate sgRNA assignment, sgRNA-containing transcripts were additionally amplified by PCR using a modification of a previously published approach (ref. 81). The following primers were used for amplification: 5’-AATGATACGGCGACCACCGAGATCTACAC-3’ and 5’-CAAGCAGAAGACGGCATACGAGATTACGACAGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTggactatcatatgcttaccgtaacttgaaag-3’. The PCR product was cleaned up with 1.0x SPRI beads (SPRIselect; Beckman Coulter B23317). Samples were sequenced using paired-end 150 bp sequencing on an Illumina MiSeq sequencer.

CRISPRi-mediated gene knockdown

K562 cells (ATCC CCL-243) expressing the dCas9-KRAB fusion protein were constructed by lentiviral delivery of pMH0006 (Addgene #135448) and FACS isolation of BFP-positive cells.

The lentiviral constructs were co-transfected with the pCMV-dR8.91 (Andwin Scientific NC2092494) and pMD2.G (Addgene #12259) plasmids into 293T cells (ATCC CRL-3216) using TransIT-Lenti (Mirus 75814-982), following the manufacturer’s protocol. The virus was harvested 48 hours post-transfection and passed through a 0.45 µm filter. Target cells were then transduced overnight with the filtered virus in the presence of 8 µg/mL polybrene (Millipore C788D57).

Guide RNA sequences for CRISPRi-mediated gene knockdown were cloned into pCRISPRia-v2 (Addgene #84832) via BstXI-BlpI sites.
After transduction with sgRNA lentivirus, K562 cells (ATCC CCL-243) were selected with 2 µg/mL puromycin (Gibco A1113802). Knockdown of target genes was assessed by RT-qPCR using PerfeCTa SYBR Green SuperMix (QuantaBio 95054-500) per the manufacturer’s instructions. HPRT1 was used as the endogenous control.

RNA isolation

Total RNA for RNA-seq and RT-qPCR was isolated using the Zymo QuickRNA isolation kit (Zymo Research R1054) with in-column DNase treatment per the manufacturer’s protocol.

RNA treatment with α-amanitin

K562 (ATCC CCL-243) and K562 TAF15 knockdown cell lines were seeded at a density of 1 × 10^6 cells/mL in 2 replicates. Cells were treated with 10 μg/mL α-amanitin (Sigma-Aldrich A2263) for 8–9 h prior to total RNA extraction. Total RNA for downstream RNA-seq was isolated using the Zymo QuickRNA Microprep isolation kit (Zymo Research R1050) with in-column DNase treatment per the manufacturer’s protocol.

RNA-seq

RNA-seq libraries were prepared using the SMARTer Stranded Total RNA-Seq Kit v2 – Pico Input Mammalian (Takara 634411) and sequenced on an Illumina NovaSeq 6000 instrument in a PE150 (paired-end, 150 cycles) setting at Novogene Corporation.

Ribosome profiling

Ribosome profiling was performed as previously described (ref. 82). Briefly, approximately 10 × 10^6 cells were lysed in ice-cold polysome buffer (20 mM Tris pH 7.6, 150 mM NaCl, 5 mM MgCl2, 1 mM DTT (Scientific Laboratory Supplies Ltd NAT1068), 100 µg/mL cycloheximide) supplemented with 1% v/v Triton X-100 and 25 U/mL Turbo DNase (Thermo Fisher Scientific AM2238). The lysates were triturated through a 27 G needle and cleared for 10 min at 21,000 × g at 4 °C. The RNA concentration in the lysates was determined with the Qubit RNA HS kit (Thermo Fisher Q32852). Lysate corresponding to 30 µg RNA was diluted to 200 µl in polysome buffer and digested with 1.5 µl RNaseI (Epicenter VWR International 101228-268) for 45 min at room temperature.
The RNaseI was then quenched with 10 µl SUPERaseIN (Thermo Fisher Scientific AM2696).

Monosomes were isolated using MicroSpin S-400 HR (Cytiva) columns, pre-equilibrated with 3 mL polysome buffer per column. 100 µl of digested lysate was loaded per column (two columns were used per 200 µl sample) and centrifuged for 2 min at 600 × g. The RNA from the flow-through was isolated using the Zymo RNA Clean and Concentrator-25 kit (Zymo Research R1017). In parallel, total RNA from undigested lysates was isolated using the same kit.

Ribosome-protected footprints (RPFs) were gel-purified from 15% TBE-Urea gels (Life Technologies EC6875BOX) as 17–35 nt fragments. RPFs were then end-repaired using T4 PNK (NEB M0201S), and pre-adenylated barcoded linkers were ligated to the RPFs using T4 Rnl2(tr) K227Q (NEB M0351S). Unligated linkers were removed from the reaction by yeast 5’-deadenylase (NEB M0331S) and RecJ nuclease (NEB M0264S) treatment. RPFs ligated to barcoded linkers were pooled, and rRNA depletion was performed using riboPOOLs (siTOOLs) as per the manufacturer’s recommendations. Linker-ligated RPFs were reverse transcribed with ProtoScript II RT (NEB M0368S) and gel-purified from 15% TBE-Urea gels. cDNA was then circularized with CircLigase II (Epicentre) and used for library PCR. First, a small-scale library PCR supplemented with 1X SYBR Green and 1X ROX (Thermo Fisher Scientific K0221) was run in a qPCR instrument. Then, a larger-scale library PCR was run in a conventional PCR instrument, using the number of cycles that gave half-maximal signal intensity during qPCR. The library PCR product was gel-purified from 8% TBE gels and sequenced on a SE50 run on an Illumina HiSeq4000 instrument at the UCSF Center for Advanced Technologies.

ATAC-seq

The assay for transposase-accessible chromatin using sequencing (ATAC-seq) was performed according to the optimized Omni-ATAC protocol (refs. 83, 84).
Briefly, samples containing 50,000 cells as input were pelleted, lysed, washed, and re-pelleted using the lysis and wash buffers specified in the Omni-ATAC protocol. A transposition mix containing Tn5 was then added to the samples, and the transposition reaction was carried out for 30 min at 37 °C in a thermomixer with 1000 rpm mixing. After transposition, the transposed DNA was purified using the DNA Clean & Concentrator-5 Kit (Zymo Research D4014). The samples underwent two PCR steps. First, a pre-amplification was performed for 3 cycles to attach unique barcoded adapters to the transposed DNA sample. The concentration of each pre-amplified sample was quantified via qPCR using the NEBNext Library Quant Kit (New England Biolabs E7630). Afterward, samples underwent a second PCR amplification step to obtain the desired DNA concentration of 8 nM in 20 µl. DNA cleanup and qPCR quantification were performed again, and the final libraries were diluted down to exactly 8 nM using sterile water. Samples were sequenced using paired-end 75-bp sequencing on an Illumina NextSeq sequencer.

ChIP-qPCR

ChIP-qPCR was performed as described in ref. 85. Human chronic myelogenous leukemia K562 cells (ATCC CCL-243) were grown at 37 °C and 5% CO2 in RPMI-1640 medium supplemented with 10% FBS, glucose (2 g/L), L-glutamine (2 mM), 25 mM HEPES, penicillin (100 units/mL), streptomycin (100 μg/mL) and amphotericin B (1 μg/mL) (Gibco). 20 million cells per sample were washed with PBS (in duplicate), pelleted, and cross-linked with 1% paraformaldehyde (Fisher Scientific AC416780010) for 10 min at room temperature. Glycine (Sigma-Aldrich 9002-93-1) was added to the samples at a final concentration of 125 mM and incubated at room temperature for 5 min to quench the paraformaldehyde (Fisher Scientific AC416780010). Samples were washed with PBS, pelleted, flash-frozen, and stored at −80 °C.
Samples were thawed, lysed in 200 µl Membrane Lysis Buffer (10 mM Tris-HCl pH 8.0, 10 mM NaCl, 0.5% IGEPAL CA-630, 1X protease inhibitors), and incubated on ice for 10 min. Samples were centrifuged at 4 °C at 2500 × g for 5 min, resuspended in 200 µl Nuclei Lysis Buffer (50 mM Tris pH 8.0, 10 mM EDTA, 0.32% SDS, 1X protease inhibitors), and incubated on ice for 10 min. 120 µl of IP Dilution Buffer (20 mM Tris-HCl pH 8.0, 2 mM EDTA, 150 mM NaCl, 1% Triton X-100, 1X protease inhibitors) was added to the samples, and the samples were sonicated using the Bioruptor UCD-200 sonicator for three rounds of 7 min with 30 s on/off intervals. Samples were centrifuged at 4 °C at 21,000 × g for 5 min to clear the lysate, and the supernatant containing the chromatin was stored at −80 °C.

230 µl of IP Dilution Buffer was added to 270 µl of chromatin along with 3 µg of ZNF800 or QKI antibody or same-species IgG, and the samples were incubated overnight at 4 °C. The next day, the ChIP samples were spun down at 4 °C at 16,000 × g for 5 min, and the supernatant was transferred onto 20 µl of washed Protein A/G beads (Fisher Scientific 88802). Samples were incubated for 2 h at 4 °C.

The ChIP material was washed once with 500 µl of cold FA lysis low salt buffer (50 mM Hepes-KOH pH 7.5, 150 mM NaCl, 2 mM EDTA, 1% Triton X-100, 0.1% sodium deoxycholate), twice with cold NaCl high salt buffer (50 mM Hepes-KOH pH 7.5, 500 mM NaCl, 2 mM EDTA, 1% Triton X-100, 0.1% sodium deoxycholate), once with cold LiCl buffer (100 mM Tris-HCl pH 8.0, 500 mM LiCl, 1% IGEPAL CA-630, 1% sodium deoxycholate), and twice with cold TE (10 mM Tris, 1 mM EDTA, pH 8.0). Samples were eluted in 300 µl of Proteinase K reaction mix (20 mM Tris pH 8, 300 mM NaCl, 10 mM EDTA, 5 mM EGTA, 1% SDS, 60 µg Proteinase K) and incubated at 65 °C for 1 h.
The supernatant was transferred to phase lock tubes (VWR), purified via phenol-chloroform extraction, and eluted in 30 µl of 10 mM Tris pH 8.0.

qPCR was performed using PerfeCTa SYBR Green SuperMix (QuantaBio) per the manufacturer’s instructions. HPRT1 was used as the endogenous control.

Crosslinking and immunoprecipitation

K562 cells (ATCC CCL-243) were harvested and crosslinked with ultraviolet radiation (400 mJ/cm^2). Cell lysates were then treated separately with a high dose (1:3000 RNase A and 1:100 RNase I) and a low dose (1:15000 RNase A and 1:500 RNase I) of RNase A (Thermo Fisher Scientific EN0531) and RNase I (Thermo Fisher Scientific EN0601), and combined after treatment. Antibodies to TAF15 (Thermo MA3-078, dilution according to the manufacturer’s recommendation) or ZC3H11A (Abcam ab241612, dilution according to the manufacturer’s recommendation) were first conjugated to protein A/G beads (Pierce) and then added to cell lysates to immunoprecipitate protein-RNA complexes. This was followed by on-bead dephosphorylation, polyadenylation, and end labeling of the immunoprecipitated RNA fragments with IRDye® 800CW DBCO Infrared Dye (LI-COR 929-50000). RNA-protein complexes were then resolved by SDS-PAGE and visualized on a nitrocellulose membrane. Membranes were then cut and treated with proteinase K to release RNA. We then used Takara SMARTer small RNA sequencing kit reagents with a custom UMI-oligo dT primer (CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTTTTTTTTTT) to synthesize cDNA. Sequencing libraries were then prepared with SeqAmp DNA Polymerase (Takara 638509) and sequenced on an Illumina HiSeq 4000 sequencer.

Immunofluorescence assay

K562 cells were seeded and grown on Poly-D-Lysine (MP Biomedical 0215017550) coated chamber slides (SPL 30108). Cells were fixed with 4% paraformaldehyde (PFA) (Fisher Scientific AC416780010) for 5 min at room temperature, followed by permeabilization with 0.5% PBST for 10 min and blocking with 4% BSA for 1 h.
Primary antibodies were diluted according to the manufacturers’ recommendations and incubated overnight at 4 °C. Cells were then stained with a standard amount of fluorescent secondary antibody for 1 h at room temperature. Samples were then mounted in ProLong™ Gold Antifade Mountant with DAPI (Life Technologies P36941) and imaged with a Zeiss LSM 780 confocal microscope (courtesy of the Cardiovascular Research Institute at UCSF).

Computational tools

Reanalysis of enhanced CLIP ENCODE data

To reliably identify RNA targets of RBPs in K562 cells (ATCC CCL-243), we started with the raw eCLIP FASTQ files of ‘released’ K562 experiments for 120 RBPs that were available in the ENCODE database. The analysis was performed as follows: (1) the reads were preprocessed in the same way as in ref. 18, including adapter trimming with cutadapt (v1.18; ref. 86), (2) preprocessed reads were mapped to the hg38 genome assembly with GENCODE v31 comprehensive annotation using hisat2 (v.2.1.0; ref. 87), (3) the aligned reads were deduplicated using the barcodecollapsepe.py script (https://github.com/YeoLab/eclip/tree/master/bin) as in ref. 18, (4) properly paired and uniquely mapped second reads were extracted using samtools (v.1.9, with -f 131 -q 60 parameters; ref. 88), (5) gene-level read counts were obtained with plastid (v.0.4.8; ref. 89) by counting 5’ ends of the reads, (6) analysis of specific enrichment against size-matched control experiments was performed with edgeR (v.3.18.1; ref. 90) for each RBP separately, considering only genes passing 2 cpm in at least 2 of 3 samples. Reliable RNA targets of each RBP were defined as those passing 5% FDR and log2(Fold Change) > 0.5 (see Supplementary Data File 8). eCLIP target scores (TSs) used in dataset integration were estimated as -log10(P)·sign(log2FC) for every “RBP-gene” pair separately.

MS data analysis (BioID2 mass spectrometry data)

Data were quantified and queried against a UniProt human database (January 2013) using the MaxLFQ option of MaxQuant (ref. 91).
Data normalization was performed in Perseus (version 1.6.2.1; ref. 92). For batch correction, Brent Pedersen’s implementation (ref. 93) of the ComBat function from the sva package (ref. 94) was used. The protein abundances in “experiment” (biotin +) and “control” (biotin −) samples were compared using a t test for each protein individually.

Perturb-seq analysis

Cell Ranger (version 3.0.1, 10X Genomics) with default parameters was used to align reads and generate digital expression matrices from single-cell sequencing data. To assign cell genotypes, a bwa (ref. 95) database containing all guide sequences present in the library was created using the bwa index command. The barcode-enrichment libraries were mapped to this database to establish the guide identities. To detect the cell barcodes, the barcode correction scheme of Cell Ranger was used: the mapping of uncorrected to corrected barcodes was extracted from the Cell Ranger analysis run of the whole-transcriptome libraries and then applied to the reads of the barcode-enrichment libraries. UMI correction was performed by merging UMIs within a Hamming distance of 1 from each other. For each UMI, the guide was assigned by choosing the guide sequence most represented among the reads containing that UMI. To make the final assignment of a guide to a cell barcode, we only considered barcodes that were represented by at least 5 different UMIs, with > 80% of UMIs representing the same guide.

Data filtering was performed using the scanpy package (ref. 96). Data were denoised using a modification of the scvi autoencoder (ref. 97) with a loss function penalizing the similarity between cells having different RBPs knocked down.
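The guide-assignment rule described above (UMI merging, per-UMI majority vote, then the 5-UMI and 80% thresholds) can be sketched as follows. This is an illustrative reconstruction rather than the code used in this study: the greedy merge that collapses each UMI into the most abundant UMI within Hamming distance 1 is an assumption, since the merging strategy is not specified beyond the distance threshold, and the function and constant names are hypothetical.

```python
from collections import Counter, defaultdict

MIN_UMIS = 5        # minimum number of distinct (merged) UMIs per cell barcode
MIN_FRACTION = 0.8  # the top guide must account for more than this UMI fraction


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def merge_umis(umi_reads: Counter) -> dict:
    """Map every UMI to a representative within Hamming distance 1.

    Assumption: UMIs are visited from most to least abundant, and each one is
    merged into the first already-accepted representative it is close to.
    """
    representatives = []
    mapping = {}
    for umi, _ in sorted(umi_reads.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in representatives:
            if len(rep) == len(umi) and hamming(rep, umi) <= 1:
                mapping[umi] = rep
                break
        else:
            representatives.append(umi)
            mapping[umi] = umi
    return mapping


def assign_guides(reads) -> dict:
    """reads: iterable of (cell_barcode, umi, guide) tuples, one per read.

    Returns {cell_barcode: guide} for barcodes passing both thresholds.
    """
    per_cell = defaultdict(list)
    for cell, umi, guide in reads:
        per_cell[cell].append((umi, guide))

    assignments = {}
    for cell, pairs in per_cell.items():
        mapping = merge_umis(Counter(umi for umi, _ in pairs))
        # Count reads per guide within each merged UMI.
        guides_per_umi = defaultdict(Counter)
        for umi, guide in pairs:
            guides_per_umi[mapping[umi]][guide] += 1
        # Each UMI votes for its most represented guide.
        votes = Counter(c.most_common(1)[0][0] for c in guides_per_umi.values())
        n_umis = sum(votes.values())
        top_guide, top_count = votes.most_common(1)[0]
        if n_umis >= MIN_UMIS and top_count / n_umis > MIN_FRACTION:
            assignments[cell] = top_guide
    return assignments
```

Cell barcodes that fail either threshold are simply absent from the returned dictionary, which keeps ambiguous or doubly infected cells out of downstream comparisons.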
The distance between transcriptome profiles of individual RBP knockdowns was calculated by applying the t test to individual gene counts across the cells that were assigned the respective guide sequence.

Dataset integration

The functional similarity of RBPs was estimated by joint analysis of eCLIP, BioID, and Perturb-seq data (Fig. 1 and Supplementary Fig. S1). First, TS Z-scores were calculated for every gene across RBPs separately for each type of experimental data (eCLIP, BioID, or Perturb-seq) in the same way as for the preys of the BioID data (see “Functional annotation of RBPs” and Supplementary Fig. S1(1)). Next, cosine distance was computed for all 7140 pairs of different RBPs, followed by ranking and calculation of empirical p-values, defined as the fraction of RBP pairs with a cosine distance less than that of the tested pair (see Supplementary Fig. S1(2)). The empirical p-values were aggregated with the logitp function from the metap R package (v.1.4; ref. 98) (see Supplementary Figs. S1(3, 4)) for 4005 RBP-RBP pairs (90 proteins in total) with at least 2 out of 3 available data types. The heatmap.2 function of the gplots R package (v.3.1.1) with cosine distance and Ward’s (ward.D2) clustering was used to generate the integration heatmap shown in Fig. 2.

The STRING-based RBP interaction heatmap was generated using protein links’ combined scores (STRING v.11.5) and the same clustering method as in the dataset integration procedure (ref. 19). To test the overlap between the integrated interaction map and external databases, we also downloaded significant protein-protein interactions from the OpenCell (ref. 5) and hu.MAP (ref. 22) databases. With these data, we estimated the fractions of the interactions found in STRING with a combined score over 400 (medium-confidence STRING interactions), in OpenCell, in hu.MAP with a score over 0.02 (medium-confidence hu.MAP interactions) and in Zanzoni et al. 
with at least 150 complexes shared between RBPs, among the RBP-RBP pairs with the integrated distance passing a selected quantile threshold (Supplementary Fig. S2E). We also considered the fractions of OpenCell- and hu.MAP-based interactions among the pairs not included in STRING medium-confidence interactions. To estimate the significance of the intersection, we performed the same procedure with 10^4 random shuffles of the IRIM. Finally, the empirical p-values were estimated and corrected for multiple testing using the Benjamini-Hochberg (FDR) adjustment. To estimate the consistency of the results depending on the datasets used for distance integration, we additionally performed the procedure described above using distances integrated from eCLIP and Perturb-seq (2278 RBP-RBP pairs with both datasets available, p-value < 10^-4), BioID and Perturb-seq (378 pairs, p-value = 0.028), and BioID and eCLIP (1225 pairs, p-value < 10^-4; Supplementary Fig. S2E).

To evaluate the stability of protein-protein interactions within the IRIM, varying fractions of the columns of the 90 × 90 matrix (1, 5, 10, 25, and 50%) were shuffled to observe alterations in inter-RBP distances and matrix topology. For each of the original 90 RBPs, cosine distances were calculated from its distance vector to the respective vectors in the partially shuffled matrix, focusing on the minimal, median, and maximal distances to other RBPs. This procedure was repeated 10 times, generating 900 estimates for each group and percentage of shuffled columns. To compare the stability of inter-RBP distances between the IRIM and STRING, OpenCell, or hu.MAP, the same procedure was applied to the respective binary interaction matrices 10 times for each shuffling percentage.
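The shuffle-based significance estimate and the Benjamini-Hochberg adjustment described above can be illustrated with a short sketch. It is not the code used in this study and rests on stated assumptions: shuffling is modeled as permuting distance values across RBP pairs, the empirical p-value uses a +1 pseudo-count (a common convention that the text does not specify), and all function names are hypothetical.

```python
import random


def overlap_fraction(distances: dict, known: set, quantile: float) -> float:
    """Fraction of known interactions among the closest pairs.

    distances: {pair: integrated distance}; known: set of interacting pairs.
    Only the `quantile` fraction of pairs with the smallest distances is used.
    """
    ranked = sorted(distances, key=distances.get)
    k = max(1, int(len(ranked) * quantile))
    return sum(pair in known for pair in ranked[:k]) / k


def shuffled_null(distances, known, quantile, n_shuffles, seed=0):
    """Null distribution of the overlap fraction.

    Assumption: a shuffle reassigns the distance values to pairs at random.
    """
    rng = random.Random(seed)
    pairs = list(distances)
    values = list(distances.values())
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(values)
        null.append(overlap_fraction(dict(zip(pairs, values)), known, quantile))
    return null


def empirical_p(observed: float, null_values) -> float:
    """One-sided empirical p-value with a +1 pseudo-count (assumption)."""
    hits = sum(v >= observed for v in null_values)
    return (1 + hits) / (1 + len(null_values))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

With 10^4 shuffles, the smallest p-value attainable under this convention is 1/(10^4 + 1), which is in line with values reported as < 10^-4.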
For STRING and hu.MAP, interactions were considered valid if the STRING combined score was > 400 or the hu.MAP score was > 0.02, respectively, and for RBPs, protein-protein interactions were assumed if the distance was below the 25% quantile of the IRIM. Moreover, 90 RBPs were randomly chosen from the STRING, OpenCell, and hu.MAP interaction matrices before shuffling to make their sizes comparable to the IRIM. This procedure was repeated 5 times.

Transcript types enrichment analysis of RBP RNA targets

A joint set of 22471 genes detected at 2 counts per million (cpm) in at least two samples of one eCLIP experiment was used as the background for further analysis. The preference of RBPs for binding RNAs of a particular type was assessed using a one-sided Fisher’s exact test. The following types of RNAs were selected based on GENCODE annotation: miRNA, lncRNA, protein_coding, snRNA, snoRNA, and rRNA. For each RBP separately, the p-values were adjusted for multiple testing using FDR correction for the number of tested RNA types. Visualization of the eCLIP, RNA-seq, and ATAC-seq profiles generated using bedtools genomecov (v.2.27.1) was performed with svist4get (v.1.2.24) (refs. 99, 100).

Functional annotation of RBPs

To annotate the RBPs based on preys identified in BioID experiments, target scores (TSs) were estimated as -log10(P)·sign(log2FoldChange) for every bait-prey pair separately. Next, for each prey, TSs were converted to Z-scores by estimating the mean and standard deviation across baits. The preys were ranked by Z-scores, and the fgsea R package (v.1.12.0) was applied to perform gene set enrichment analysis with 100000 permutations and three GO term annotation sets (BP, MF, and CC, each taken separately) (ref. 57). The annotation sets were generated with the go.gsets function of the gage R package (v.2.36.0; ref. 101). Lists of 2865 Entrez IDs of preys were used in the fgsea analysis for each RBP of the total set of fifty. GO terms with NES > 2 for at least one RBP were considered when plotting Fig. 
3 and Supplementary Fig. S3 (related GO terms were merged manually); negative NES values were set to zero for clarity and easier interpretation of the subsequent clustering (see complete data in Supplementary Data File 3). Ward.D2 clustering along with cosine distance (1 – cosine similarity) was used to generate the heatmaps using the heatmap.2 function of the gplots R package (v.3.1.1; ref. 102).

To check the consistency between predicted and known RBP annotations, the same procedure was performed excluding the Z-scoring step to avoid penalizing common generic GO terms, e.g., “organelle”, “cell”, etc. The resulting GSEA p-values and NESs were used to calculate the <RBP, GO term> scores as -log10(P)·sign(NES) for each RBP and GO term separately. RBPs’ “true” annotations were extracted from the same GO BP, CC, or MF annotation set as used in GSEA. Finally, all data were merged to generate the ROC curve with the PRROC (v.1.3.1) roc.curve function (ref. 103).

Alternative splicing analysis

We used MISO (ref. 104) for alternative splicing analysis, as this tool is known for its consistent performance and wide use (ref. 105).
Specifically, RNA-seq data were processed as follows: (1) to fulfill MISO requirements (see below), the reads obtained with different sequencing lengths were truncated to 75 bp with the cutadapt (v.2.10) -l option, (2) the truncated reads were mapped to the human hg38 genome assembly with GENCODE v38 comprehensive gene annotation using STAR (v.2.7.9; ref. 106) with the options --outFilterScoreMinOverLread and --outFilterMatchNminOverLread both set to 0.25, (3) non-unique alignments were filtered out, and the replicates were merged, (4) the insert size distribution was estimated for each merged bam file separately using pe_utils --compute-insert-len from MISO (v.0.5.4), and constitutive exons were retrieved using exon_utils with --get-const-exons and --min-exon-size 1000 (ref. 104), (5) alternative splicing events were identified using miso --run with --read-len set to 75 and --paired-end set to the previously estimated insert size parameters. Finally, only events with non-zero numbers of exclusion and inclusion reads, and with the sum of these reads ≥ 10 in at least one sample, were retained and are shown in Fig. 4.

Ribosome profiling analysis

The Ribo-seq reads were first trimmed using cutadapt (v2.3) to remove the linker sequence AGATCGGAAGAGCAC. The fastx_barcode_splitter script from the Fastx toolkit was then used to split the samples based on their barcodes. Since the reads contain unique molecular identifiers (UMIs), they were collapsed to retain only unique reads. The UMIs were then removed from the beginning and end of each read (2 and 5 Ns, respectively) and appended to the name of each read. Bowtie2 (v2.3.5) was then used to remove reads that map to ribosomal RNAs and tRNAs, and the retained reads were then aligned to mRNAs (we used the isoform with the longest coding sequence for each gene as the representative). After alignment, umitools (v0.3.3) was used to deduplicate reads.

The quality check and downstream processing of the processed reads were performed using Ribolog v0.0.0.9 (ref. 14).
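The UMI-handling step described above (removing 2 nt from the start and 5 nt from the end of each read and appending them to the read name) can be sketched as follows. This is a minimal illustration and not the script used in this study; the underscore separator between read name and UMI and the function name are assumptions.

```python
UMI_5P = 2  # UMI bases at the start of each read
UMI_3P = 5  # UMI bases at the end of each read


def extract_umi(name: str, seq: str, qual: str):
    """Move the split UMI from the read sequence into the read name.

    Returns (new_name, trimmed_seq, trimmed_qual), or None if the read is too
    short to contain both UMI parts plus at least one footprint base.
    Assumption: the UMI is appended to the name after an underscore.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) <= UMI_5P + UMI_3P:
        return None
    umi = seq[:UMI_5P] + seq[-UMI_3P:]
    return f"{name}_{umi}", seq[UMI_5P:-UMI_3P], qual[UMI_5P:-UMI_3P]


# Example: a 14-nt read carrying a 7-nt footprint between the UMI parts.
record = extract_umi("read1", "ACGGGTTTTCCAAT", "IIIIIIIIIIIIII")
print(record)
```

Keeping the UMI in the read name lets it survive alignment, so that reads mapping to the same position can later be deduplicated by UMI.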
To distinguish stalling peaks from stochastic sequencing artifacts, we followed a multi-step procedure. We calculated P-site offsets and identified the codon at the ribosomal A-site for each RPF read using the riboWaltz package. A loess smoother was used to de-noise codon-wise RPF counts. The loess span parameter varied depending on the transcript length and allowed borrowing information from ~5 codons on either side of the A-site. We calculated an excess ratio at each codon position by dividing the loess-smoothed count by the transcript's background translation level (the median of non-zero loess-smoothed counts). After median normalization of the corrected counts and removal of transcripts with 0 counts, ribosome occupancy testing was performed using logistic regression in Ribolog.

ATAC-seq analysis

The ENCODE ATAC-seq pipeline107 with default parameters was used for sequencing data processing and analysis. The differentially accessible peaks were identified with the DESeq2 package108 and annotated with the ChIPseeker package109. To perform a comparison against published ChIP-Seq data, the processed ChIP-exo results were downloaded from GEO (GSE151287)68. The data consisted of bed files containing 33 and 181 QKI peaks (two replicates) and a bigWig file with ZNF800 ChIP-exo signal (no ChIP-exo peaks were reported for ZNF800). In total, 234564 and 222350 ATAC-seq peaks for QKI and ZNF800, respectively, had coverage of at least 10 reads in more than one replicate and were used in the following tests. For QKI, the bed files with ChIP-exo peaks were merged and transferred to the hg38 genome assembly with UCSC liftOver. The numbers of differentially accessible (or not differentially accessible) QKI-KD ATAC-seq peaks that intersect (or do not intersect) ChIP-exo peaks were then calculated using bedtools intersect (v.2.26.0)99,110, followed by a one-sided ('greater') Fisher's exact test on the resulting 2 × 2 contingency table.
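The one-sided ('greater') Fisher's exact test on a 2 × 2 table can be sketched in plain Python from the hypergeometric distribution. This is a generic illustration, not the authors' code; the function name and the assignment of cells (a: differentially accessible and intersecting a ChIP-exo peak, b: differentially accessible and not intersecting, c: not differentially accessible and intersecting, d: neither) are assumptions.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """P-value of the one-sided ('greater') Fisher's exact test for [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables with the same margins
    whose top-left cell is at least as large as the observed value a.
    """
    row1 = a + b          # e.g. all differentially accessible peaks (assumed layout)
    col1 = a + c          # e.g. all peaks intersecting a ChIP-exo peak
    n = a + b + c + d     # all tested peaks
    max_a = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, max_a + 1)
    )
    return tail / comb(n, col1)
```

Exact integer arithmetic keeps the result accurate for tables with hundreds of thousands of peaks, although a statistics library is typically faster when the tail contains many terms.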
For ZNF800, bigWig files were converted to bed using UCSC bigWigToWig (v.377) and wig2bed from BEDOPS (v.2.4.38)111,112, followed by UCSC liftOver to the hg38 genome assembly. The resulting regions were intersected with differentially accessible and not differentially accessible ZNF800-KD ATAC-seq peaks using bedtools intersect, followed by a comparison of the ChIP-exo signal distribution in these two sets using the non-parametric Mann-Whitney U test.

Mass Spectrometry data analysis (TAF15 KD proteomic quantification)

Quantitative analysis of the TMT experiments was performed simultaneously with protein identification using Proteome Discoverer 2.5 software. The precursor and fragment ion mass tolerances were set to 10 ppm and 0.6 Da, respectively; the enzyme was trypsin with a maximum of 2 missed cleavages; and the UniProt Human proteome FASTA file and a common contaminant FASTA file were used in SEQUEST searches. The impurity correction factors obtained from Thermo Fisher Scientific for each kit were included in the search and quantification. The following settings were used to search the data: dynamic modifications: Oxidation / +15.995 Da (M), Deamidated / +0.984 Da (N, Q), Acetylation / +42.011 Da (N-terminus); static modifications: TMT6plex / +229.163 Da (N-terminus, K), MMTS / +45.988 Da (C).

Scaffold Q+ (version Scaffold_5.0.1, Proteome Software Inc., Portland, OR) was used to quantitate TMT-based peptide and protein identifications. Peptide identifications were accepted if they could be established at greater than 78.0% probability to achieve an FDR less than 1.0% by the Percolator posterior error probability calculation113. Protein identifications were accepted if they could be established at greater than 5.0% probability to achieve an FDR less than 1.0% and contained at least 1 identified peptide. Protein probabilities were assigned by the Protein Prophet algorithm114.
Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony. Proteins sharing significant peptide evidence were grouped into clusters. Channels were corrected by the matrix [0.000,0.000,0.931,0.0689,0.000]; [0.000,0.000,0.933,0.0672,0.000]; [0.000,0.00750,0.931,0.0619,0.000]; [0.000,0.0113,0.929,0.0593,0.000]; [0.000,0.0121,0.934,0.0532,0.000934]; [0.000,0.0148,0.923,0.0499,0.0120]; [0.000,0.0251,0.931,0.0438,0.000]; [0.000,0.0206,0.936,0.0431,0.000]; [0.000,0.0291,0.937,0.0337,0.000]; [0.000,0.0776,0.892,0.0303,0.000] in all samples according to the algorithm described in i-Tracker115. Normalization was performed iteratively (across samples and spectra) on intensities, as described in Statistical Analysis of Relative Labeled Mass Spectrometry Data from Complex Samples Using ANOVA116. Means were used for averaging. Spectra data were log-transformed, pruned of those matched to multiple proteins, and weighted by an adaptive intensity weighting algorithm. Of 22889 spectra in the experiment at the given thresholds, 20372 (89%) were included in quantitation. Differentially expressed proteins were determined by applying a t test with an unadjusted significance level of p-value < 0.05, corrected by the Benjamini-Hochberg procedure.

Statistics & reproducibility

Statistical parameters are reported in the figures and figure legends, including the definitions and experimental measures depicted either as bar charts representing the mean and dot plots representing exact values, or as boxplots representing the median, the 25th and 75th percentiles (boxes), and 5% and 95% confidence intervals (error bars). For the BioID-based RBP annotation procedure, statistical significance is indicated by an asterisk (*) if the GSEA FDR-adjusted p-value < 0.05.
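The Benjamini-Hochberg correction referred to above for the differential protein analysis, and FDR adjustment of p-values more generally, can be sketched as follows. This is a generic textbook implementation written for illustration, not the routine used by Scaffold Q+ or by the authors; the function name is an assumption.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A protein would then be called differentially expressed when its adjusted value falls below the chosen FDR threshold.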
Pairwise comparisons of qPCR results and log-transformed MS intensity ratios were performed using a one-sided t test (for testing alternative splicing) or a Wilcoxon rank sum test (for testing protein levels and mRNA relative stability). Exact p-values are depicted above the corresponding bar charts. For TAF15 mRNA target enrichment analysis, GSEA statistics, including p-values and enrichment scores, are depicted in the figure. To test the intersection of different TAF15 regulons, p-values were calculated using one-sided Fisher's exact tests, with statistical significance indicated by asterisks (*, p-value < 0.05; **, p-value < 10^-5). Pairwise comparisons of the QKI and ZNF800 target expression levels and chromatin accessibility were performed using a one-sided Wilcoxon rank sum test, with exact p-values depicted above the boxplots.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
