Integrative residue-intuitive machine learning and MD Approach to Unveil Allosteric Site and Mechanism for β2AR

The proposed identification pipeline is illustrated by Fig. 1. Here, extensive gaussian accelerated molecular dynamics (GaMD) simulations are first performed to enhance sampling in order to construct a sufficient conformation space (Fig. 1a). With the conformation space, a residue-intuitive hybrid machine learning (RHML) framework is constructed, which is composed of an unsupervised clustering and an interpretable convolutional neural network (CNN) based multi-classifier (Fig. 1b). By using RHML, we can determine the optimal number of clusters (labels) and the conformation state with the opening of the allosteric site (Fig. 1c). Then, the allosteric site is identified by FTMap coupled with the LIME interpreter of RHML (Fig. 1c). The potential allosteric modulators are screened from two compound datasets, based on the identified allosteric site (vide Fig. 1d). As illustrated by Fig. 1e, the regulation effect of the allosteric site/drug and its regulation pathway are further probed by conventional MD (cMD), binding energy analysis, structural analysis, and regulation pathway analysis. Finally, experimental validation is performed by cAMP accumulation assays, β-arrestin recruitment assay and site-directed mutagenesis experiments (vide Fig. 1f). In total, this work involves six systems, 15-μs GaMD simulations and 22.5-μs cMD simulations. Supplementary Table 1 lists detailed MD information for each system. Fig. 1: Overview of investigation framework.a Conformation space generated by GaMD. b Residue-intuitive hybrid machine learning (RHML). RHML has consisted of an unsupervised clustering and an interpretable CNN-based multi-classification model. c Identification of key conformation and allosteric site with the aid of the LIME interpreter of RHML and FTmap. d Virtual screening to the allosteric modulator. e Assessment of the allosteric effect and mechanism for the allosteric site/modulator. f Experimental validation.Construction of the residue-intuitive hybrid machine learning (RHML)In order to capture the conformation state with a cryptic allosteric site from the vast unknown conformation space, we need to perform two tasks. One is to first classify the conformations according to structure differences between conformational classes. The other is to determine the important residue fluctuations of which class is associated with the function region, as the allostery is a functional mechanism driven by the residue conformation transition. To this end, an unsupervised classification task was first utilized to label the conformation categories in the unknown trajectory space generated by GaMD. Herein, we selected unsupervised clustering, which has been widely served as an auto-labeling strategy25,26. With the labels obtained, we further trained a supervised classification model to identify the conformation state with the opening of the allosteric site. In the two tasks, we need to address two main technical challenges. One is how to determine the optimal number of clusters in the unsupervised clustering, and the other is how to obtain the residue-level interpretability in the supervised classification model. Thus, a residue-intuitive hybrid machine learning framework (RHML in Fig. 1b) was constructed by combining an unsupervised clustering and a supervised classification model. Herein, the k-means algorithm was adopted, as it has been considered as a popular and effective method for MD trajectory clustering27. For the supervised model, an interpretable CNN-based multi-classification model was exploited to achieve accurate classification with the capacity to identify important residues deciding the classification result (vide Fig. 2). In the deep learning model, the pixel map representation was proposed to avoid hand-feature engineering with the risk of information loss in conformation representation. Accordingly, the convolution neural network with powerful learning capacity on the image was utilized to realize accurate classification in terms of the category labels inferred from the unsupervised clustering (Fig. 2a). More importantly, we explored an interpreter based on the locally linear approximation paradigm (named as LIME interpreter) to address the black-box limitation of deep learning in interpretability. Based on the interpreter, we could further identify key residues deciding the classification result, through which the conformation state with the putative allosteric site can be captured (Fig. 2b). Technical details regarding the interpretable CNN-based multi-classification model are described in Methods.Fig. 2: Architecture of the interpretable CNN-based multi-classification model proposed by the work.a A CNN-based multi-classifier. Each conformation is represented by a pixel map through a matrix transformation, and the CNN-based classification model is trained based on the pixel maps and their category labels inferred from the clustering analysis. b LIME interpreter for the CNN model. Based on a locally linear approximation paradigm, a LIME interpreter is developed to identify important residues deciding the CNN classification result. In the picture of “local linear approximation,” salmon, blue, and yellow backgrounds represent three classification classes, while salmon crosses, blue triangles, and yellow stars represent the conformation samples of the three different classes, respectively. For example, the star highlighted in red line represents a conformation sample being explained, around which the perturbed dataset represented by yellow stars is generated by adding perturbations. The star sizes denote the proximity measure between the perturbed sample and the sample being explained. The gray dotted line represents the local linear model that is trained on the perturbed dataset. LIME matrixes are then generated for each class to evaluate the importance of each pixel in deciding the specific class. By projecting the important pixels into the corresponding atoms, important residues can be identified.To ensure a reliable classification result and a rational interpretation, a high prediction accuracy is required for the supervised classifier. Thus, the accuracy of the classification model can serve as a feedback metric to determine the optimal number of categories for the unsupervised clustering (Fig. 1b). The optimal number will act as the final labels of the classification model to identify the key residues with the aid of the LIME interpreter.A conformation ensemble with putative allosteric site is identified by RHML for β2ARTo cover the opening of cryptic pockets, five independent 3-μs GaMD simulations were carried out for the inactive β2AR bound by the agonist norepinephrine (NE) to generate an extensive ensemble of receptor conformations, through which 150,000 conformations from the five trajectories were extracted to construct data sets for machine learning (see “Methods” for more details). The conformations of every trajectory were first clustered based on the root mean square deviation (RMSD) of the receptor backbone atoms excluding the highly flexible ICL3 region (residue numbers: 231–262). In the text, we only present the result from one trajectory (labeled as traj1) with a putative allosteric site. The other four trajectory results and related discussion are placed in Supplementary Figs. 1, 2 which do not involve new allosteric sites. To determine the optimal number of clusters k, which is also an open and challenging problem for unsupervised clustering, we considered three clustering evaluation indices (SSR/SST, pSF, DBI) to initially estimate k values from 2 to 7 (Fig. 3a). SSR/SST represents the explained variance, and the value closer to 1 indicates better clustering. As reflected by the green line in Fig. 3a, SSR/SST increases gradually with increasing k, but the rise becomes weak after a critical value. Using the elbow method28 to identify the point of maximum curvature in the curve, the optimal number of clusters can be determined to be k = 3. pSF, a metric measuring separation between all the clusters, suggests that larger values correspond to better clustering. The red line in Fig. 3a shows the rise of pSF with increasing k values, favoring k = 7 or higher as the best choice. DBI measures similarity within and between clusters. Smaller DBI values imply better clustering. It turns out that k = 2 has the smallest DBI (DBI = 0.522), while k = 3 (DBI = 0.526) is very close to k = 2. In other words, both k = 2 and k = 3 can be taken as reasonable choices (yellow line in Fig. 3a). The inconsistency of the optimal cluster number between different clustering metrics is a common phenomenon, as there may not be a definite optimal k value for complex data. Thus, the choice of the k value depends on balancing different validity indices and considering specific research purpose29. Herein, we referenced the classification accuracy of the CNN-based model. Figure 3b shows the CNN-based classification accuracy for different numbers of clusters. We first excluded k = 6 and k = 7 due to low classification accuracy (<0.8), which would drop the reliability of our LIME interpreter. Similarly, k = 5 was not considered due to its poor performance in the DBI index. Compared with k = 2 and k = 4, both SSR/SST and DBI indices favor k = 3. Furthermore, our DL-based classification model also achieves a prediction accuracy of 0.903 ± 0.004 at k = 3. Taken together, we used the labels of the three categories to train the interpretable CNN-based classifier, in which the LIME interpreter identified important residues deciding the classification result, as shown in Fig. 3c–h.Fig. 3: Performance of the RHML model and important residues identified by the LIME interpreter for traj1.a Three evaluation metrics of clustering for different numbers of clusters (k = 2–7). SSR/SST represents the explained variance wherein values closer to 1 indicate better clustering. Pseudo-F statistic (pSF) measures the cluster separation, and Davies-Bouldin Index (DBI) assesses the cluster similarity. Higher pSF values and lower DBI values indicate better results. b Prediction performance of the interpretable CNN-based multi-classifier for different numbers of clusters (k values). Data are presented as the mean (points) ± standard deviation (error bars) derived from the five-fold cross-validation. c–e Scores of the top 20 important residues identified by the LIME interpreter in deciding the three-classification result such as cluster0 (c), cluster1 (d), and cluster2 (e). f–h Distribution of the top 20 important residues in the 3D structure of β2AR for cluster0 (f), cluster1 (g), and cluster2 (h). The important residues are highlighted in spheres.It can be seen from Fig. 3f, h that the important residues deciding the cluster0 and cluster2 mainly distribute at the extracellular end of TM6 and TM7 as well as the extracellular loops (ECL2 and ECL3). These regions were already revealed to involve an allosteric site of β2AR reported23, demonstrating the effectiveness of our RHML for identifying the allosteric site. Since our objective is to discover additional allosteric sites, the reported site was not considered for further investigation. Interestingly, for cluster1, important residues identified distributed in the middle and near the intracellular end of TM6 and TM7, including key residues such as F2826.44, N3187.45, N3227.49, and P3237.50, which were revealed to be molecular switches in the receptor activation (vide Fig. 3d, g). The observation suggests that the cluster1 undergoes specific conformational changes in these important regions associated with the activity of β2AR, implying its functional potential. Given that allostery is a functional mechanism involving the GPCR activity, we selected representative conformations from cluster1 to further identify allosteric sites by using FTMap.One additional allosteric site identified by FTMap coupled with the RHML interpreterFTMap is an energy-based method for identifying binding sites, which has been accepted as an effective tool to predict potential allosteric sites within the helical regions of GPCRs30. We selected two representative conformations (labeled as Conf1 and Conf2) from cluster1, which can account for 70% of conformations, to perform the site mapping by means of FTMap. In the two conformations, FTMap detects more than ten consensus sites (CSs), as shown in Supplementary Fig. 3. In fact, how to effectively identify the sites with potential allosteric function has been a difficult task for various pocket identification tools. Since the cryptic allosteric sites are exposed to conformational changes, it is reasonable to assume that the pockets near the important residues, which decide the conformational states and associate with function regions, will have a high possibility of acting as the allosteric site. Thus, the result from the LIME interpreter of our RHML framework was utilized to facilitate the allosteric site identification. It was found that one allosteric site in Conf1 identified by FTMap is comprised of CS0, CS1, CS2, and CS6, while the identified site in Conf2 consists of CS0, CS3, and CS4. These probe clusters are in proximity to some important residues revealed by our LIME interpreter (vide Supplementary Fig.3). Table 1 lists the allosteric site residues identified for Conf1 and Conf2. It can be seen that the majority of residues are the same for the two conformations and only several residues are different due to the flexibility of the site. Thus, they represent the same binding site despite some differences in shape and size, as reflected in Fig. 4a, b. The binding site is located in the middle of the protein helical bundle and near to the sodium binding site (see Supplementary Fig. 4 for details). In addition, the pocket includes I1213.40, F2826.44, and S3197.46, which play important roles in regulating the activity of β2AR23. Taken together, it can be expected that drugs targeting the binding site most probably modulate the β2AR signaling and function, implying high potential as an allosteric site. It is noted that the allosteric site does not open in active and inactive crystal structures of β2AR, as evidenced by Supplementary Fig. 5. More interestingly, the cryptic allosteric site has not been reported for other GPCRs31,32,33,34,35.Table 1 The allosteric site residues predicted the two representative conformations (Conf1 and Conf2) of the cluster1 categoryFig. 4: Allosteric site and drug screening strategy.a, b Allosteric site identified by a combination of FTMap and the LIME interpreter of RHML for two representative conformations (Conf1 (a) and Conf2 (b)). The predicted allosteric sites are shown as surface and important residues identified by the LIME interpreter are highlighted in blue. c Virtual screening workflow employed in the work and structures of four hit compounds screened. The binding energies between the receptor and the four ligands screened are highlighted in red, which is derived from the MM/GBSA calculations based on the last 10-ns trajectories of 120-ns short cMD simulations. d Binding modes of the four hit compounds with β2AR. Ligands are represented by stick (ZINC5042, salmon; ZINC252008995, cyan; ZINC4213962, wheat; ZINC11681534, skyblue). The receptor is represented as cartoon (yellow for Conf1 and green for Conf2). The polar interactions are shown as black dashed lines.Screening potential allosteric modulator by virtual screening and MM/GBSAVirtual screening has been successfully employed to identify allosteric modulators, including those targeting GPCRs36,37. As accepted, protein flexibility is crucial for structure-based drug design, and it was reported that multi-conformational virtual screening with two or three conformations of the target could improve the final enrichment and chemical diversity of the hit compounds38. As outlined above, the putative allosteric sites identified exhibit some differences in shape and size between Conf1 and Conf2 due to pocket flexibility. Therefore, we conducted the multi-conformational virtual screening based on the two conformations, as depicted in Fig. 4c. The ligand set is composed of two datasets (Diverse-lib and Drugs-lib) obtained from MTiOpenScreen39. Diverse-lib consists of 99,288 chemically diverse molecules suitable for screening novel drug scaffolds. Drugs-lib contains 4574 purchasable approved drug molecules, which facilitates drug repurposing with the advantages of reduced time and costs in drug development. In total, the ligand set contains 103,862 ligand molecules after a series of operations, for example, removing redundancy, evaluating drug-likeness, filtering toxic groups, and analyzing chemical diversity. Then, a preliminary screening was performed using the MTiOpenScreen platform. The top 3000 molecules from each conformation underwent further docking evaluations using Autodock 4.2. The docking score (AD4.2 energy) was used for ranking, along with visual inspection (see Supplementary Information for details) to exclude potential high-ranked false positives. Finally, four hit compounds (Fig. 4c) were selected from the top 20 compounds. Figure 4d shows predicted binding modes for the four ligands with β2AR. Interestingly, the compound ZINC5042 from Drugs-lib exhibits good scores both in the two conformations, as evidenced by Supplementary Table 2. As the MD simulation combined with the MM/GBSA calculation can improve the binding affinity prediction of poses obtained from docking protocols40, we selected the conformation with the best docking score for each of the four ligands to perform a short 120-ns cMD simulation and used the last 10 ns of each MD trajectory to conduct MM/GBSA binding free energy calculation. As shown in Fig. 4c, the MM/GBSA binding energies indicate that the four hit compounds all can stably bind to β2AR. Practically, ZINC11681534, with the weakest binding energy (− 33.35 kcal/mol), was already reported to be a β2AR antagonist23, implying that the other three compounds with better binding affinity may have potential efficacies. Herein, we selected ZINC5042 with the highest affinity as a representation of the promising compounds to verify our design strategy.Interaction mode between the allosteric ligand ZINC5042 and the receptorTo more reliably estimate the interaction of ZINC5042 with the receptor, we extended the simulation time of the β2AR-ZINC5042 complex to three independent 1.5-μs cMD simulations (see Supplementary Fig. 6 for RMSD). Based on the last 100-ns equilibrium trajectory, we calculated their binding free energies by using MM/GBSA and decomposed the energy into the corresponding residues. As reflected by Fig. 5a, the hotspot residues to the ZINC5042 binding distribute in TM2, TM3, TM6, and TM7 of β2AR. Figure 5b illustrates detailed interactions between ZINC5042 and the hotspot residues. It can be seen that ZINC5042 is bound deep within the transmembrane helix bundle mainly through polar and van der Waals interactions. Notably, the residue D792.50 contributes significantly to the ligand binding by forming an essential salt bridge with the polar head of ZINC5042. Another hotspot residue, S3197.46, forms hydrogen bonding with the alcohol hydroxyl group of the ligand’s polar head, further stabilizing the ligand-receptor complex. The chrysene moiety of ZINC5042 occupies a hydrophobic pocket composed of several hydrophobic hotspot residues (L752.46, L1243.43, I2786.40, F2826.44, N3227.49, and P3237.50), forming extensive van der Waals interactions, in turn contributing to the overall stability of the complex. It is noted that some functional residues of the allosteric pocket revealed above, such as D792.50, S3197.46, F2826.44, N3227.49, and P3237.50, devote important contribution to binding ZINC5042, implying a functional potential of ZINC5042.Fig. 5: Hotspot residues for the interaction of β2AR with the ZINC5042 ligand in the β2AR-ZINC5042 system and binding energies of β2AR with three ligands in five systems.The binding free energies are calculated by the MM/GBSA method based on the last 100-ns trajectory of three independent 1.5-μs cMD simulations, which are presented by mean ± standard error of the mean (SEM). a Important residues contributing to the ZINC5042 binding with the absolute value of binding energy greater than 0.5 kcal/mol in all three independent trajectories. Error bars represent SEM. b Overall structure of β2AR (blue) bound by ZINC5042 (green). The inset illustrates the detailed interactions between ZINC5042 (green stick) and the hotspot residues (blue sticks). Polar interactions are highlighted as black dotted lines. c Binding free energies of the two orthosteric agonists (NE and ALE) and the allosteric modulator ZINC5042 to β2AR in different systems.The allosteric modulator weakens the binding of orthosteric agonistIn order to estimate the allosteric potency of ZINC5042, we first examined its impact on the orthosteric ligand and the receptor. To this aim, two endogenous agonists of β2AR with to some extent differences in signaling, i.e., NE and L-epinephrine (ALE)41, were considered in the work in order to provide more sufficient evidence. Three independent 1.5-μs cMD simulations (See Supplementary methods for simulation details) were carried out for each of the five complex systems, including β2AR-ZINC5042, β2AR-NE, β2AR-ALE, β2AR-NE-ZINC5042, and β2AR-ALE-ZINC5042. RMSD values show that the five systems reach equilibrium, as reflected by Supplementary Fig. 6. MM/GBSA was used to calculate their ligand-receptor binding free energies based on the last 100 ns trajectory. As shown in Fig. 5c, without binding the allosteric modulator ZINC5042, the binding free energies between the orthosteric ligands and the receptor are − 21.34 ± 2.19 kcal/mol for NE and − 31.73 ± 2.77 kcal/mol for ALE. The agonist ALE exhibits a higher affinity than NE, consistent with the experimental results of the inhibitory constant (Ki)41. However, after the allosteric modulator ZINC5042 is bound, the binding energies between the two agonists and the receptor are weakened to − 14.58 ± 3.65 kcal/mol for NE and − 13.07 ± 0.85 kcal/mol for ALE. These results clearly indicate that the allosteric modulator ZINC5042 significantly reduces the affinity of the orthosteric agonists to the receptor. Besides, it is found that the two agonists also weaken the binding of ZINC5042 to the receptor, as evidenced by a comparison between β2AR-ZINC5042 (− 56.46 ± 1.68 kcal/mol), β2AR-NE-ZINC5042 (− 48.17 ± 4.02 kcal/mol) and β2AR-ALE-ZINC5042 (− 50.18 ± 3.06 kcal/mol) in Fig. 5c. The observation clearly reveals negative cooperativity of the binding energy between the orthosteric agonist and the allosteric modulator, implying a negative allosteric potency of ZINC5042.The allosteric modulator drives the receptor to the inactive conformationTo further estimate the effect of ZINC5042 on the activity of β2AR, we compared the structural differences upon binding ZINC5042 by superposing β2AR-ALE-ZINC5042 and β2AR-NE-ZINC5042 with the inactive and active crystal structures of β2AR. Since the structures are similar between the two systems (β2AR-ALE-ZINC5042 and β2AR-NE-ZINC5042), we only presented the superposition result of β2AR-ALE-ZINC5042 with the two crystal structures in the text, while the structural superposition of β2AR-NE-ZINC5042 was provided in Supplementary Fig. 7. As reflected by Fig. 6a, most regions of the receptor in the β2AR-ALE-ZINC5042 complex system resemble those of the inactive receptor, in particular for the activation region at the intracellular side. In the active state of β2AR, the intracellular ends of TM5 and TM6 typically exhibit outward movement, creating an open intracellular cavity for downstream protein binding. However, in the β2AR-ALE-ZINC5042 complex, the intracellular ends of TM5 and TM6 remain closed, exhibiting an inactive-like conformation that occludes downstream protein coupling (Supplementary Fig. 8).Fig. 6: The β2AR structural comparison between β2AR-ALE-ZINC5042 (blue), the crystal structures of inactive (PDB ID: 2RH1, light yellow) and active states (PDB ID: 3SN6, salmon).The blue arrows display the movement direction of the β2AR-ALE-ZINC5042 system. a The comparison of the overall structure for the three systems. For clarity, components except receptors are omitted. b Structural comparison of the microswitches in the sodium ion binding site (D2.50-S3.39-S7.46) and the PIF motif (P5.50-I3.40-F6.44) in the three different systems. c Comparison of four important residues of the allosteric site identified in the three different systems, in which ZINC5042 is represented by green spheres. d Distributions (histograms) of the distance (d1) of the OG atoms between S3.39 and S7.46 during the three 1.5-μs simulations for the β2AR-ALE-ZINC5042 system (blue). The yellow and salmon dashed lines represent the distance (d1) in the inactive and active crystal structures, respectively. e Distributions (histograms) of the distance (d2) of the Cα atoms between P5.50 and F6.44 during the three 1.5-μs simulations for the β2AR-ALE-ZINC5042 system (green). The yellow and salmon dashed lines denote the distance (d2) in the inactive (PDB ID: 2RH1) and active (PDB ID: 3SN6) crystal structures, respectively.As accepted, GPCR activation is an allosteric process initiated by perturbations in the extracellular binding pocket, and transmitted to the intracellular region for the downstream protein binding through activating molecular switches. These molecular switches include residues of the PIF motif (P2115.50-I1213.40-F2826.44) and three residues (D792.50-S1203.39-S3197.46) of the sodium ion binding site42, which are located in the middle of the transmembrane helix bundle and close to the allosteric pocket. As shown in Fig. 6b, the PIF motif in the active crystal structure undergoes rearrangement with respect to the inactive crystal structure. The rearrangement facilitates the outward movement of the cytoplasmic end of TM6, which has been considered to be necessary for the GPCR activation43. In addition, D792.50, S1203.39and S3197.46 in the active crystal structure move closer to each other than the inactive crystal structure, leading to the disruption of the sodium ion pocket. Consequently, the inward movement of TM7 is promoted, which is another characteristic of the GPCR activation43.In contrast to the activation features above, the β2AR-ALE-ZINC5042 structure exhibits outward movement of the PIF motif residues (P2115.50, I1213.40, F2826.44) and two sodium ion binding pocket residues (S1203.39, S3197.46), as evidenced by Fig. 6b. The outward movement mainly results from the steric hindrance between ZINC5042 and the four residues of the allosteric site (D792.50, S1203.39, F2826.44 and S3197.46), as reflected by Fig. 6c. To further observe conformational change in the sodium ion binding pocket induced by the ZINC5042 binding, we compared the distance between S1203.39 and S3197.46 (labeled as d1) of the sodium ion binding pocket with those of the active and inactive crystal structures, as shown in Fig. 6d. It can be seen that the collapse of the sodium ion pocket upon activation causes d1 to decrease from 6.3 Å in the inactive state to 4.5 Å in the active state. In the β2AR-ALE-ZINC5042 system, d1 is always greater than 4.5 Å, indicating that the ZINC5042 binding inhibits the collapse of the sodium ion pocket. Similarly, the distance between P2115.50 and F2826.44 (labeled as d2) of the PIF motif is used to characterize the conformation of the PIF motif (Fig. 6e). Upon activation, the inward movement of P2115.50 causes d2 to decrease from 11.1 Å in the inactive crystal structure to 9.8 Å in the active crystal. In the β2AR-ALE-ZINC5042 system, d2 is always greater than 9.8 Å, indicating that ZINC5042 binding inhibits the conformational rearrangement of the PIF motif induced by the agonist, thereby limiting the outward movement of the TM6 intracellular segment. Collectively, the binding of ZINC5042 to the receptor inhibits the transition of the receptor’s conformation to the active state, further suggesting the potential of ZINC5042 as a negative allosteric modulator (NAM).Allosteric regulation mechanism revealed by protein structure network (PSN)To gain insights into how the allosteric modulator regulates the orthosteric agonists, we employed PSN to calculate the shortest pathway with the highest frequency between the allosteric site and the orthosteric site for the β2AR-NE-ZINC5042 and β2AR-ALE-ZINC5042 systems. The shortest pathway is usually considered to be the most likely or biologically relevant pathway44. The residues of the allosteric site and the orthosteric site are shown in Supplementary Table 3. As reflected by Fig. 7a, the shortest pathways are F2826.44-W2866.48-F2906.52 for β2AR-NE-ZINC5042 and F2826.44-L2125.51-W2866.48-F2906.52 for β2AR-ALE-ZINC5042, suggesting the importance of these residues in the allosteric regulation. Both the two pathways include W2866.48 and F2826.44, which belong to the CWxP and PIF motif, respectively. The two regions are conserved motifs of the class A GPCRs45, which have been reported to play important roles in the GPCR activation. Their attendance in the shortest pathway implies that the allosteric regulation should influence the receptor activation.Fig. 7: Allosteric regulation pathways derived from PSN.Blue-shaded area: the orthosteric site. Green-shaded area: the allosteric site. Pink-shaded area: the intracellular activation region. a The shortest pathways with the highest frequency from the allosteric site to the orthosteric site in the β2AR-NE-ZINC5042 and β2AR-ALE-ZINC5042 systems. b The shortest pathway with the highest frequency from the orthosteric site to the intracellular activation region. PSN was calculated based on the last 1000-ns trajectory of three independent 1.5-μs cMD simulations for each system.To further understand how the allosteric modulator influences the receptor activation induced by the orthosteric agonists, we analyzed the shortest pathways between the orthosteric site and the intracellular activation region for the four systems, β2AR-NE, β2AR-ALE, β2AR-NE-ZINC5042 and β2AR-ALE-ZINC5042 (Fig. 7b), in which we selected residues of the orthosteric site and intracellular activation region as starting and ending nodes (see Supplementary Table 3 for details), respectively. Without binding the allosteric modulator, the shortest pathways are D1133.32-V862.57-M822.53-S3197.46-D792.50-N3227.49-Y3267.53 for β2AR-NE and F2906.52-W2866.48-F2826.44-I1213.40-M2155.54-M2796.41-Y2195.58 for β2AR-ALE. The two pathways include residues of the sodium ion binding site (S3197.46, D792.50) and the PIF motif (F2826.44, I1213.40), respectively, through which the agonist regulates the receptor activation.However, after binding ZINC5042, the two ternary complex (β2AR-NE-ZINC5042 and β2AR-ALE-ZINC5042) systems exhibit the same shortest pathway (F2906.52-W2866.48-F2826.44-M2796.41-Y2195.58), significantly different from the systems without binding ZINC5042. As reflected by the residues in the shortest pathway, the regulation from the orthosteric site to the activation region in the two ternary complex systems mainly rely on intra-helical structural communication of TM6 and only contain one inter-helical structural communication (M2796.41-Y2195.58) at the end of the pathway. In contrast, for the system only binding the orthosteric agonists, the shortest pathways exhibit extensive inter-helical structure communication between TM2, TM3 and TM7 in β2AR-NE and between TM3, TM5 and TM6 in β2AR-ALE, as evidenced by Fig. 7b. Inter-helical communication beneficial to the conformational changes are considered crucial for the receptor activation46, while the intra-helical interactions often stabilize the existing conformations42.Taken together, it can be assumed that the allosteric ligand binding would decrease the inter-helical structure communication, thus disfavoring the activation signaling stimulated by the agonist. Moreover, it is noted that the important residue F2826 .44 of the allosteric pocket participates in both the regulation pathway from the allosteric site to the orthosteric site and that from the extracellular orthosteric site to the intracellular activation region, highlighting the importance of F2826.44 for the allosteric signaling of ZINC5042.Experimental validation on the pharmacological property of ZINC5042 and the allosteric siteTo confirm the pharmacological property of ZINC5042, we first measured the efficacy of ZINC5042 alone for β2AR activation by using a cell-based function assay. Our result indicates that ZINC5042 fails to activate β2AR (Fig. 8a). Next, we investigated the allosteric effect of ZINC5042 on the two orthosteric agonists induced G-protein signaling for β2AR by the Glosensor-based cAMP assay. It is observed that ZINC5042 antagonizes both agonists NE and ALE-induced cAMP accumulation, as evidenced by a gradual decrease of the orthosteric agonists NE and ALE-induced receptor activation with increasing ZINC5042 concentration in a dose-dependent manner (Fig. 8b, c). These observations clearly confirm that ZINC5042 displays negative cooperativity with NE (log αβ = −0.82; αβ = 0.15) as well as ALE (log αβ = −1.15; αβ = 0.07) in a cell-based assay, strongly supporting our computational results. β2AR was reported to be engaged in the Gs signaling pathway and recruitment of β-arrestin. Given that biased allosteric modulators that exert pathway-specific effects have given rise to new frontiers in GPCR drug discovery47, we further conducted the NanoBiT β-arrestin recruitment assay to test whether ZINC5042 would influence the β-arrestin recruitment. Firstly, our results show that ZINC5042 cannot activate β2AR mediated the recruitment ability of β-arrestin alone, compared with NE (Fig. 8d). Interestingly, ZINC5042 exhibits the ability inhibiting NE-induced β-arrestin2 recruitment via dose-dependently manner (Fig. 8e), behaving as a negative allosteric modulator (NAM) of β2AR on β-arrestin signaling. To evaluate the biased character of ZINC5042 on the β2AR signaling, we further explored the favorable signaling of ZINC5042 by NanoBiT β-arrestin recruitment and Glosensor cAMP accumulation assays. Compared with NE alone, the β-arrestin2 recruitment ability of β2AR is reduced to approximately 54% by additional 80 μM ZINC5042 (Fig. 8f). In contrast, G protein activation is reduced to approximately 10% (Fig. 8f). These findings suggest that the efficacy of Gs activation is more significantly attenuated than the β-arrestin recruitment in the presence of NE and ZINC5042, indicating that ZINC5042 acts as a G protein-biased NAM of β2AR. Similar to other β2AR negative allosteric modulators reported20,48,49, ZINC5042 also presents allosteric activity at micromolar concentrations. However, the target selectivity indicates (see Supplementary Fig. 9 for details) that ZINC5042 is highly selective for β2AR in its pharmacological function. In addition, we also tested the cell-based functional assays for the other three hit compounds (ZINC11681543, ZINC4213962, and ZINC252008995) screened. As evidenced by Supplementary Fig. 10, the three hit compounds all exhibit negative allosteric modulator (NAM) effects, also supporting the potential of our screening strategy in practical application.Fig. 8: Experimental validation for the ZINC5042’s potency and the allosteric site.a Representative curve for concentration-dependent activation of β2AR in response to NE or ZINC5042 stimulation examined by Glosensor-based cAMP assay. ZINC5042 fails to activate β2AR. Data are presented as the mean ± standard error of the mean (SEM) of three independent experiments performed in triplicate. Error bars represent SEM. b, c Negative allosteric effect of ZINC5042 on NE-induced cAMP accumulation (b) and on ALE-induced cAMP accumulation mediated by β2AR (c). Data are presented as mean ± SEM from three independent experiments (n = 3) performed in triplicates. d, e Dose-response curves of β2AR in response to stimulation with different ligands by β-arrestin2 recruitment assay. Values are mean ± SEM from three independent experiments (n = 3) performed in triplicates. f The activation efficacy of β2AR in response to NE in the presence of 80 μM ZINC5042. Data are presented as mean ± SEM from three independent experiments (n = 3) performed in triplicates. p-values were obtained by Dunnett’s multiple comparison test. Gs protein: Norepinephrine vs. Gs protein: Norepinephrine+80 μM ZINC5042, ***p < 0.001; β-arr2: Norepinephrine vs. β-arr2: Norepinephrine + 80 μM ZINC5042, ***p < 0.001; Gs protein: Norepinephrine+80 μM ZINC5042 vs. β-arr2: Norepinephrine + 80 μM ZINC5042, ###p < 0.001. g The allosteric effect of ZINC5042 on β2AR WT and mutations. Bars represent differences in each mutation relative to WT for β2AR after calculating the ratio of the maximum effect efficacy (Emax) of NE in the presence and absence of 80 μM ZINC5042. Data are presented as mean ± SEM from three independent experiments (n = 3) performed in triplicates. All data were obtained by one-way analysis of variance with Dunnett’s multiple comparison test to determine significance (compared with WT, from left to right, ***p < 0.001, < 0.001, <0.001 < 0.001). h Schematic diagram illustrating the interactions between key allosteric residues (blue sticks) and ZINC5042 (green sticks) derived from the computational analysis of binding mode. ZINC5042 forms a salt bridge with D792.50, hydrogen bonding with the N3187.45 and S3197.46 side chains, and the π–π stacking interactions with F2826.44. Polar interactions are highlighted as black dotted lines, and π-π stacking is represented by magenta dotted lines.To validate the allosteric site of ZINC5042 in β2AR predicted by our computational framework, we performed site-directed mutagenesis studies and cell-based function assays in the presence of the orthosteric agonist NE, with and without ZINC5042. The result indicates that most of the mutations on the residues of the predicted allosteric binding site reduce the role of ZINC5042 in inhibiting NE-induced responses. Specifically, the Glosensor-based cAMP assay results show that D792.50A, F2826.44A, N3187.45A, and S3197.46A significantly reduce the antagonistic effect of ZINC5042 (Fig. 8g) while these residues are almost identified as the key residues for binding ZINC5042 by the binding energy analysis above. For example, D792.50 is revealed to contribute significantly to the ZINC5042 binding by an essential salt bridge (Fig. 8h). S3197.46, as a hotspot residue, forms hydrogen bonding with the alcohol hydroxyl group of the ZINC5042’s polar head. It can be seen from Supplementary Fig. 11 that S3197.46 forms hydrogen bonding with ZINC5042 more frequently than N3187.45 in the three independent simulations, which rationalizes the observation that S3197.46 is identified as the hotspot residue for ZINC5042 binding in the MM/GBSA analysis above, rather than N3187.45. It should be due to the rotation of the hydroxyl group of ZINC5042, leading to the situation that ZINC5042 alternately forms hydrogen bonding with S3197.46 or the adjacent N3187.45 (Fig. 8h), indicating that N3187.45 also plays a role in stabilizing ZINC5042 despite of a shorter duration of hydrogen bonding than S3197.46. The computational result rationalizes the experimental observation that the mutations on S3197.46 and N3187.45 all give rise to the drop in the antagonistic effect of ZINC5042, with a greater decrease by the S3197.46A mutation than the N3187.45A (Fig. 8g).For F2826.44, the binding mode analysis above already reveals that it interacts with chrysene moiety via π–π stacking interactions (also vide Fig. 8h). Moreover, F2826.44 is revealed above to serve as a regulatory residue in the two allosteric pathways (i.e., one from the allosteric site to the orthosteric site and one from the orthosteric site to the intracellular activation domain). Thus, the mutation F2826.44A exerts the most pronounced antagonistic effect on the downstream signaling, which also supports the allosteric regulation mechanism revealed above. Collectively, the pharmacological and site-directed mutagenesis experiments are completely in line with our computations, strongly validating the reliability of our computational framework for the allosteric site, allosteric effect, and allosteric mechanism.

Integrative residue-intuitive machine learning and MD Approach to Unveil Allosteric Site and Mechanism for β2AR

Who cares about lab rodents? | Science

John I. Brauman (1937–2024) | Science

Hot and cold Earth through time | Science

KRAS takes the road to destruction | Science

Psychedelic research at a crossroads | Science

Hot Topics

Who cares about lab rodents? | Science

John I. Brauman (1937–2024) | Science

Hot and cold Earth through time | Science

Related Articles

Balancing Act: Pregnancy and Bipolar Disorder

Cohesion at the cellular level: flexible yet stable

Gut bacteria influence responses to immunotherapy in patients with asbestos related cancer

Quick Links

Must Read

Who cares about lab rodents? | Science

John I. Brauman (1937–2024) | Science

Hot and cold Earth through time | Science

KRAS takes the road to destruction | Science

Popular Articles

Who cares about lab rodents? | Science

John I. Brauman (1937–2024) | Science

Hot and cold Earth through time | Science