Symmetry-adapted Markov state models of closing, opening, and desensitizing in α 7 nicotinic acetylcholine receptors

Initial pathway generationSix pathways were generated using Climber v. 1.036 by conducting forward and reverse interpolations between closed (PDB ID 7KOO), open (PDB ID 7KOX), and desensitized (PDB ID 7KOQ) states14. A total of 150 seeds were strategically chosen to provide a sparse sampling across the gating space. It is important to note that the initial pathways only need to adequately cover the necessary range, as we rely on unrestrained simulations for sampling and the MSMs to connect the states and modify the path.Molecular dynamics simulationsSimulation parameters are summarized in Supplementary Information. All seeds were inserted into 1-palmitoyl-2-oleoylphosphatidylcholine (POPC) lipid bilayers using MemProtMD62. They were then solvated with water (TIP3P63) and supplemented with 0.15 M NaCl to simulate physiological conditions. In the case of the cholesterol (CHOL) system, five cholesterol molecules were initially placed within the intersubunit binding sites for all seeds. Subsequently, seeds were embedded into bilayers composed of 33% cholesterol and 67% POPC, which mimics relevant physiological conditions.For the simulations, ~400 POPC lipids were utilized for the apo system, while a combination of 300 POPC lipids and 150 cholesterol molecules were employed for the CHOL system. The simulation boxes had dimensions of 12 × 12 × 18 nm3. To describe the system, the CHARMM36m forcefield (July 2020 version)64 was employed. Atomistic forcefields are often preferred over coarse-grained to capture detailed conformational dynamics; for this protocol, the accuracy of the lipid parameters has also been previously validated65.To ensure system stability, relaxation was performed in several steps. First, energy minimization was performed using the steepest descent algorithm for 5000 steps. Subsequently, a three-stage NPT relaxation was carried out, with each stage lasting 10 ns. Initially, restraints were placed on all heavy atoms, followed by only backbone atoms, and finally only on C-α atoms of the protein. The docked cholesterol molecules were also restrained during the first two rounds of relaxation to maintain their positions, but unrestrained in the final stage.To maintain the temperature at 300 K independently for the protein, lipid, and solvent components, the v-rescale thermostat66 was employed. Additionally, the c-rescale barostat67 was utilized to semi-isotropically maintain the pressure at 1 bar. Bond lengths were constrained using the LINCS68 algorithm.For production simulations, all restraints were removed, and systems simulated for 1–2 μs using GROMACS-202169, while employing the same thermostats and barostats as during equilibration. To enhance sampling along pathways between the closed and open states in the apo system, an additional 50 simulations were performed. These simulations were conducted along the pathway to improve conformational space coverage in less sampled regions. In total, the apo system underwent 200 μs of production simulations, while the CHOL system was simulated for 277 μs. These production simulations provided the necessary data for subsequent analyses, with data collection occurring at 1 ns intervals.Feature extractionFor the initial model from each seed, including the starting models, a contact map was constructed by considering the interactions between all Cα atoms. A distance cutoff of 1 nm was applied to determine if two Cα atoms were in contact.Contacts with a peak-to-peak value (range) versus mean ratio of less than 0.2 were deemed insignificant and removed from the analysis. This helps filter out contacts that do not exhibit substantial variation in their interaction. After filtering, a total of 1610 significant contacts remained. These contacts were organized into blocks to maintain invariance to the permutation of subunits (see next section). The block arrangement ensures that the order of contacts remains the same regardless of how the subunits are arranged. For example, the first block of contacts would encompass both intrasubunit contacts within subunit A and intersubunit contacts between subunits A and B. Reciprocal transformations were performed on all distances prior to featurization. This approach of using inverse distances makes the related tICAs less prone to domination by rare large fluctuations.Symmetry and augmentation in feature spaceThe α 7 nicotinic receptors are homopentameric proteins composed of five identical subunits arranged in C5 symmetry around the central pore. This arrangement gives rise to rotational invariance, allowing us to understand the protein dynamics as a combination of five coupled subsystems.For example, a simplified two-state (C, O) model can be used to describe the conformational states of the subsystems. The model can be represented as follows: (C, C, C, C, C) ⇌ (C, C, C, C, O) ⇌ (C, C, C, O, O) ⇌ (C, C, O, O, O) ⇌ (C, O, O, O, O) ⇌ (O, O, O, O, O). It is important to note that within this model, there exist various degenerate asymmetric states, including configurations like (C, C, C, C, O) and (C, C, C, O, C), which share identical energy levels. Additionally, special semi-degenerate states, for instance (C, C, C, O, O), are also present, which can manifest in two distinct arrangements: (C, C, C, O, O) and (C, C, O, C, O). In constructing the model’s features, we grouped such semi-degenerate states under a single category, although theoretically, they can be differentiated afterward based on their unique sequential subsystem combinations.Considering the feature space, the observables that describe the dynamics of the system exhibit equivariance with respect to the symmetry group Z5 〈r∣r5 = e〉. In other words, under the group actions, such as \({{{{\mathcal{F}}}}}_{perm}=\{\,f:{r}_{{p}\, {\circ}}\, f=f({x}_{A},{x}_{B},{x}_{C},{x}_{D},{x}_{E})=f({x}_{B},{x}_{C},{x}_{D},{x}_{E},{x}_{A})\}\), the Koopman propagation (\({{{\mathcal{K}}}}\)) of general observable functions (f) commutes46:$${r}_{{p}\,{\circ }}\,({{{\mathcal{K}}}}f)(x)={{{\mathcal{K}}}}({r}_{{p}\,{\circ }}\,f)(x)$$
(1)
The simulation sampling, thus, can be augmented by applying permutations to the original feature array from simulations when they are grouped based on the system’s symmetry, which effectively improves the simulation sampling fivefold. Note that features in this system are organized into five subsystem blocks, rather than subunit blocks. The first block encompasses contacts within subunit 1, as well as contacts between subunit 1 and other subunits (primarily subunit 2) taking care to avoid double counting of the contacts.Symmetry-adapted time-lagged independent component analysis (SymTICA)Time-lagged independent component analysis (TICA) is a dimensionality reduction technique utilized to extract the slowest dynamics present in a given feature space37. It operates on a sequence of time series data denoted as \({{{{\bf{X}}}}}_{{{{\bf{t}}}}}={\{{x}_{1},{x}_{2},…,{x}_{N}\}}_{t}\) and computes the mean-free covariance matrix C0 and the time-lagged covariance matrix Cτ given a specific lag time τ. Under the assumption of reversible dynamics in TICA, the symmetry of Cτ is enforced numerically during the estimation process. The symmetric Koopman matrix, which encodes the kinetic information of the system, is then obtained. This matrix is further decomposed into a spectrum of eigenvalues λ and corresponding eigenvectors ν. These eigenvalues and eigenvectors provide insights into the dominant slow modes or collective motions of the system, allowing for a reduced-dimensional representation of the dynamics.In the presence of symmetry within the dataset, the covariance matrix of the Koopman matrix exhibits a distinct block form. For instance, in the case of 5-fold symmetry, the covariance matrix (N × N) can be represented as:$$\left[\begin{array}{ccccc}{{{{\bf{C}}}}}_{d}&{{{{\bf{C}}}}}_{o1}&{{{{\bf{C}}}}}_{o2}&{{{{\bf{C}}}}}_{o2}^{\top }&{{{{\bf{C}}}}}_{o1}^{\top }\\ {{{{\bf{C}}}}}_{o1}^{\top }&{{{{\bf{C}}}}}_{d}&{{{{\bf{C}}}}}_{o1}&{{{{\bf{C}}}}}_{o2}&{{{{\bf{C}}}}}_{o2}^{\top }\\ {{{{\bf{C}}}}}_{o2}^{\top }&{{{{\bf{C}}}}}_{o1}^{\top }&{{{{\bf{C}}}}}_{d}&{{{{\bf{C}}}}}_{o1}&{{{{\bf{C}}}}}_{o2}\\ {{{{\bf{C}}}}}_{o2}&{{{{\bf{C}}}}}_{o2}^{\top }&{{{{\bf{C}}}}}_{o1}^{\top }&{{{{\bf{C}}}}}_{d}&{{{{\bf{C}}}}}_{o1}\\ {{{{\bf{C}}}}}_{o1}&{{{{\bf{C}}}}}_{o2}&{{{{\bf{C}}}}}_{o2}^{\top }&{{{{\bf{C}}}}}_{o1}^{\top }&{{{{\bf{C}}}}}_{d}\end{array}\right]$$
(2)
Here, Cd (N/5 × N/5) represents the diagonal block of the covariance matrix; Co1 (N/5 × N/5) represents the off-diagonal block of the covariance matrix between features in subsystem i and i+1; Co2 (N/5 × N/5) represents the off-diagonal block of the covariance matrix between features in subsystem i and i + 2.Similarly, the Koopman matrix (N × N) follows the same block structure, and it can be expressed as:$$\left[\begin{array}{ccccc}{{{{\bf{K}}}}}_{d}&{{{{\bf{K}}}}}_{o1}&{{{{\bf{K}}}}}_{o2}&{{{{\bf{K}}}}}_{o2}^{\top }&{{{{\bf{K}}}}}_{o1}^{\top }\\ {{{{\bf{K}}}}}_{o1}^{\top }&{{{{\bf{K}}}}}_{d}&{{{{\bf{K}}}}}_{o1}&{{{{\bf{K}}}}}_{o2}&{{{{\bf{K}}}}}_{o2}^{\top }\\ {{{{\bf{K}}}}}_{o2}^{\top }&{{{{\bf{K}}}}}_{o1}^{\top }&{{{{\bf{K}}}}}_{d}&{{{{\bf{K}}}}}_{o1}&{{{{\bf{K}}}}}_{o2}\\ {{{{\bf{K}}}}}_{o2}&{{{{\bf{K}}}}}_{o2}^{\top }&{{{{\bf{K}}}}}_{o1}^{\top }&{{{{\bf{K}}}}}_{d}&{{{{\bf{K}}}}}_{o1}\\ {{{{\bf{K}}}}}_{o1}&{{{{\bf{K}}}}}_{o2}&{{{{\bf{K}}}}}_{o2}^{\top }&{{{{\bf{K}}}}}_{o1}^{\top }&{{{{\bf{K}}}}}_{d}\end{array}\right]$$
(3)
As a result, when solving the eigenvalue problem of the Koopman matrix Kν = λν, the N-dimensional eigenvector ν can be decomposed as a concatenation of νA, νB, νC, νD, νE (N/5-dimensional). The first component of the equation becomes \({{{{\bf{K}}}}}_{d}\cdot {{{\nu }}}_{{{{\bf{A}}}}}+{{{{\bf{K}}}}}_{o1}\cdot {{{\nu }}}_{{{{\bf{B}}}}}+{{{{\bf{K}}}}}_{o2}\cdot {{{\nu }}}_{{{{\bf{C}}}}}+{{{{\bf{K}}}}}_{o2}^{\top }\cdot {{{\nu }}}_{{{{\bf{D}}}}}+{{{{\bf{K}}}}}_{o1}^{\top }\cdot {{{\nu }}}_{{{{\bf{E}}}}}=\lambda {{{\nu }}}_{{{{\bf{A}}}}}\).To enhance data efficiency and reduce estimation error in a large kinetic model (see the toy model system in Supplementary Information), we aim to understand the system dynamics under degeneracy. This involves seeking a symmetry-adapted mapping that remains invariant under the symmetry group. In this case, we have νA = νB = νC = νD = νE, and the eigenvalue problem can be simplified to KsumνA = λνA, where \({{{{\bf{K}}}}}_{{{{\bf{sum}}}}}={{{{\bf{K}}}}}_{d}+{{{{\bf{K}}}}}_{o1}+{{{{\bf{K}}}}}_{o2}+{{{{\bf{K}}}}}_{o2}^{\top }+{{{{\bf{K}}}}}_{o1}^{\top }\).The features xi can be projected onto the dominant eigenspace (independent components, ICs) by summing over all five subsystems (subICs):$${{{{\bf{x}}}}}_{{{{\bf{i}}}}}=\left[\begin{array}{c}{{{{\bf{x}}}}}_{{{{\bf{iA}}}}}\\ {{{{\bf{x}}}}}_{{{{\bf{iB}}}}}\\ {{{{\bf{x}}}}}_{{{{\bf{iC}}}}}\\ {{{{\bf{x}}}}}_{{{{\bf{iD}}}}}\\ {{{{\bf{x}}}}}_{{{{\bf{iE}}}}}\end{array}\right],\quad {{{\bf{subICs}}}}=\left[\begin{array}{c}{{{{{\bf{v}}}}}_{{{{\bf{A\cdot x}}}}}}_{{{{\bf{iA}}}}}\\ {{{{{\bf{v}}}}}_{{{{\bf{A\cdot x}}}}}}_{{{{\bf{iB}}}}}\\ {{{{{\bf{v}}}}}_{{{{\bf{A\cdot x}}}}}}_{{{{\bf{iC}}}}}\\ {{{{{\bf{v}}}}}_{{{{\bf{A\cdot x}}}}}}_{{{{\bf{iD}}}}}\\ {{{{{\bf{v}}}}}_{{{{\bf{A\cdot x}}}}}}_{{{{\bf{iE}}}}}\end{array}\right],\quad$$
(4)
$${{{\bf{ICs}}}}={{{\bf{v}}}_{{{\bf{A }}}} \cdot x}_{{{{\bf{iA}}}}}+{{{\bf{v}}}_{{{\bf{A }}}} \cdot x}_{{{{\bf{iB}}}}}+{{{\bf{v}}}_{{{\bf{A }}}} \cdot x}_{{{{\bf{iC}}}}}+{{{\bf{v}}}_{{{\bf{A }}}} \cdot x}_{{{{\bf{iD}}}}}+{{{\bf{v}}}_{{{\bf{A }}}} \cdot x}_{{{{\bf{iE}}}}}$$
(5)
Meanwhile, symmetry can be evaluated by analyzing the differences between each subICs, such as the standard deviation. Alternatively, symmetry-aware MDS70 can be employed. Here, the distance matrix between samples in each macrostate was calculated from the minimum Euclidean distances of picked subIC(s) across all five possible permutations. It ensures that similar asymmetric conformations are embedded similarly, regardless of which subsystem differs. The distance matrix is then decomposed into a low-dimensional representation using classical MDS. The symmetry-aware MDS was performed using the scikit-learn package71 and is available in the Zenodo  repository72.
Algorithm 1
Get distance between two samples for five-fold symmetric systems
Si ← (SiA, SiB, SiC, SiD, SiE)
Sj ← (SjA, SjB, SjC, SjD, SjE)
D ← inf
for i ← 1 to 5 do
   Sj ← roll(Sj, 1)
   \(D\leftarrow \min (D,\,{\mbox{Euclidean}}\,({S}_{i},{S}_{j}))\)
end for
return D
Markov state modelThe SymTICA analysis was conducted using a lag time of 50 ns, and the first two independent components (ICs) that separate the three starting states were selected. Other faster dimensions with a single Gaussian shape distribution in their sampling were excluded from further analysis73.After cross-validation based on the VAMP-2 score74, 1000 microstates were clustered using the k-means algorithm75 with k-means++76 initialization, implemented in the Deeptime package77. Bayesian MSMs78 were estimated from the discrete microstate trajectories with a lag time of 100 ns after the Markovian property was validated using the corresponding implied timescales38 ranging from 50 ns to 500 ns.Subsequently, a five-state coarse-grained MSM was constructed using the PCCA+ algorithm79,80, which partitions the microstates into kinetically distinct sets. The MSM was validated using the Chapman-Kolmogorov test38.All MSM analyses were performed using either the Deeptime77 or PyEMMA81 packages, and the results were visualized using seaborn82, Matplotlib83, or prettypyplot84.Relevant feature analysisThe contact features, as well as other geometric features, were computed using custom scripts available at the GitHub repository: https://github.com/yuxuanzhuang/msm_a7_nachrs; the scripts utilized MDAnalysis v. 2.8.085 and ENPMDA72.To calculate membrane thickness, the lipophilic package86 was employed. The thickness was determined by selecting phosphorus (P) atoms from POPC molecules and measuring the distance between the upper and lower leaflets of the membrane.Pore hydration around residue \({9}^{{\prime} }\) or \({2}^{{\prime} }\) was defined as the number of water molecules within a cylindrical region centered at the specific residue and spanned ± 2 Å along the pore axis. The tilts of the TMD helices were determined using the HELANAL algorithm87. The tilt angle between each helix and the membrane normal was calculated, providing information about the orientation of the helices relative to the membrane.All distributions plotted were reweighted by the stationary probability of each microstate.Cholesterol density calculation with symmetry-corrected alignmentTo evaluate the cholesterol density around the protein, conventionally, a single reference structure is used to first align all the trajectories based on the minimum root-mean-square deviation (RMSD) metric. The density is calculated with a grid-based approach. However, this approach is not suitable for asymmetric proteins from multiple trajectories because different trajectories may sample different asymmetric conformations but still belong to the same state. Therefore, we developed a symmetry-corrected alignment method to align the trajectories based on the symmetry of the protein. This method is an extension of the RMSD-based structure alignment algorithm implemented in MDAnalysis85,88,89 to obtain the minimum RMSD value for each permutation and the corresponding rotation matrix.Reporting summaryFurther information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Hot Topics

Related Articles