CryoTRANS: predicting high-resolution maps of rare conformations from self-supervised trajectories in cryo-EM

Generation of simulated datasets

Simulated density maps with varying resolutions were generated, and morphing from the initial map to the simulated density maps was performed using CryoTRANS. Four biological macromolecules were selected for generating simulated datasets, each containing two conformations resolved by experimental approaches: the heterodimeric ABC exporter TmrAB (PDB 6RAH and PDB 6RAI), the peptidase-containing ABC transporter 1 (PDB 7T55 and PDB 7T57), thermostabilized human prestin (PDB 7V73 and PDB 7V75), and Mm-cpn (PDB 3IYF and PDB 3J03). The atomic model of each conformation was converted to a density map using the molmap function of Chimera. The density map of one conformation was low-pass filtered to 3 Å, serving as the initial high-quality map for morphing with CryoTRANS. The density map of the other conformation was treated as the target density map and was low-pass filtered to resolutions of 3, 5, and 7 Å, respectively.

Mathematical principles of CryoTRANS

Basic formulation of CryoTRANS self-supervised training

CryoTRANS trains a flow map, \(\vec{\Phi}(\vec{x},t)\in\mathbb{R}^{3},\ t\in[0,1]\), in a self-supervised manner between the high-resolution density map f0 and the low-resolution density map f1. The high-resolution prediction of f1 is given by \(f_0(\vec{\Phi}^{-1}(\cdot,1))\). The flow map \(\vec{\Phi}(\vec{x},t)\) is determined by an ODE model:

$$\frac{\mathrm{d}\vec{\Phi}(\vec{x},t)}{\mathrm{d}t}=\vec{V}\big(\vec{\Phi}(\vec{x},t)\big),\quad t\in[0,1],\qquad \vec{\Phi}(\vec{x},0)=\vec{x}.$$
(1)
The velocity field \(\vec{V}\) is formulated as a multi-layer perceptron (MLP) with two hidden layers:

$$\vec{V}(\vec{x};\theta)=W_3\,\sigma\big(W_2\,\sigma(W_1\vec{x}+b_1)+b_2\big)+b_3,$$
(2)
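As a concrete illustration, Eq. (2) can be sketched in a few lines of NumPy. This is a minimal sketch, not the CryoTRANS implementation (which is a PyTorch module): the names `init_params` and `velocity` are illustrative, and initializing the biases to zero is an assumption of this sketch.

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    # sigma in Eq. (2): leaky ReLU with a negative-side slope of 0.01
    return np.where(z > 0, z, slope * z)

def init_params(n=100, rng=None):
    # Weights drawn from a zero-mean normal with variance 0.01, as described
    # for CryoTRANS; zero biases are an assumption of this sketch.
    rng = np.random.default_rng(0) if rng is None else rng
    std = 0.1  # variance 0.01
    return {
        "W1": rng.normal(0.0, std, (n, 3)), "b1": np.zeros(n),
        "W2": rng.normal(0.0, std, (n, n)), "b2": np.zeros(n),
        "W3": rng.normal(0.0, std, (3, n)), "b3": np.zeros(3),
    }

def velocity(x, p):
    # Eq. (2): V(x; theta) = W3 sigma(W2 sigma(W1 x + b1) + b2) + b3
    h = leaky_relu(p["W1"] @ x + p["b1"])
    h = leaky_relu(p["W2"] @ h + p["b2"])
    return p["W3"] @ h + p["b3"]
```

The map sends a 3-D position to a 3-D velocity; in the actual model these operations are differentiable so that θ can be trained by backpropagation.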
where σ is the activation function, and \(W_1\in\mathbb{R}^{n\times 3}\), \(b_1\in\mathbb{R}^{n}\), \(W_2\in\mathbb{R}^{n\times n}\), \(b_2\in\mathbb{R}^{n}\), \(W_3\in\mathbb{R}^{3\times n}\), and \(b_3\in\mathbb{R}^{3}\) are learnable parameters, collectively denoted θ. In this study, the hidden-layer width n is set to 100 by default, and σ is the leaky ReLU function with a negative slope of 0.01.

To learn the velocity field, the loss is defined as the squared Wasserstein distance between \(f_0(\vec{\Phi}^{-1}(\cdot,1))\) and f1. The high-resolution density map is obtained by solving the following optimization model:

$$\min_{\theta}\ \mathcal{W}_2\big(\hat{f}_1,f_1\big)^2\quad\text{subject to}\quad \frac{\mathrm{d}\vec{\Phi}(\vec{x},t)}{\mathrm{d}t}=\vec{V}\big(\vec{\Phi}(\vec{x},t);\theta\big),\ t\in[0,1],\qquad \vec{\Phi}(\vec{x},0)=\vec{x},\qquad \hat{f}_1(\vec{x})=f_0\big(\vec{\Phi}^{-1}(\vec{x},1)\big).$$
(3)
Here, \(\mathcal{W}_2\) denotes the Wasserstein-2 distance49. After solving this optimization problem, the velocity field \(\vec{V}\) and the corresponding flow \(\vec{\Phi}(\vec{x},t)\) are obtained by solving Eq. (1).

Discretization of the neural ODE

Optimizing the neural ODE described in Eq. (1) requires discretization. Without loss of generality, the computational domain is chosen as \([0,1]^3\) and is discretized using uniform grids:

$$\vec{x}_{ijk}=\left(\frac{i}{N},\frac{j}{N},\frac{k}{N}\right),\quad i,j,k\in\{0,1,\ldots,N-1\}.$$
(4)
The ODE (1) is discretized using the forward Euler method:

$$\vec{\Phi}(\vec{x}_{ijk},t_{m+1})=\vec{\Phi}(\vec{x}_{ijk},t_m)+\tau\,\vec{V}\big(\vec{\Phi}(\vec{x}_{ijk},t_m);\theta\big),\qquad \vec{\Phi}(\vec{x}_{ijk},0)=\vec{x}_{ijk},$$
(5)
where \(t_m=\frac{m}{M},\ m\in\{0,1,\ldots,M-1\}\) and \(\tau=\frac{1}{M}\). A typical choice for M is 10. Then \(\hat{f}_1\) is obtained as follows:

$$\hat{f}_1\big(\vec{\Phi}(\vec{x}_{ijk},1)\big)=f_0(\vec{x}_{ijk}).$$
(6)
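The grid construction of Eq. (4) and the forward Euler iteration of Eq. (5) can be sketched as follows. This is a minimal NumPy illustration rather than the CryoTRANS implementation: the `velocity` callable stands in for the trained MLP, and the interpolation step of Eq. (6) is omitted.

```python
import numpy as np

def make_grid(N):
    # Eq. (4): uniform grid points x_ijk = (i/N, j/N, k/N) over [0, 1]^3
    idx = np.arange(N) / N
    I, J, K = np.meshgrid(idx, idx, idx, indexing="ij")
    return np.stack([I, J, K], axis=-1).reshape(-1, 3)   # shape (N^3, 3)

def integrate_flow(points, velocity, M=10):
    # Eq. (5): forward Euler with step tau = 1/M and Phi(x, 0) = x.
    # `velocity` maps a (P, 3) array of positions to (P, 3) velocities.
    tau = 1.0 / M
    phi = points.copy()
    trajectory = [phi.copy()]
    for _ in range(M):
        phi = phi + tau * velocity(phi)
        trajectory.append(phi.copy())
    return trajectory   # positions Phi(x_ijk, t_m) for m = 0, ..., M
```

With a constant velocity field the final positions are simply the grid shifted by that velocity, which makes the scheme easy to sanity-check.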
In the computation, the target density map f1 is evaluated on the uniform grids \(\vec{x}_{ijk}\). To compute \(\mathcal{W}_2(\hat{f}_1,f_1)\), \(\hat{f}_1\) also needs to be evaluated over \(\vec{x}_{ijk}\), which is done using bilinear interpolation (see Supplementary Note VII).

Implementation of CryoTRANS training

The Wasserstein distance \(\mathcal{W}_2(\hat{f}_1,f_1)\) is computed using the Sinkhorn algorithm50, an efficient, GPU-friendly algorithm that employs entropic regularization and allows efficient gradient backpropagation (detailed in Supplementary Note VIII). To reduce computational overhead, only the first 5 iterations of the Sinkhorn algorithm are performed in each round, and the penalty coefficient of the entropic regularization is set to 0.0001. Gradients are computed by automatic differentiation in PyTorch (see Supplementary Note VII). The weights of the neural network are initialized from a normal distribution with zero mean and a variance of 0.01. The model is trained using the ADAM optimizer with a learning rate of 0.001. A multi-scale algorithm is also introduced during training, yielding a 5- to 10-fold acceleration across different datasets (Supplementary Note IX).

Pseudo-trajectory inference after CryoTRANS training

Once the neural velocity field is trained with optimal parameters θ*, Eq. (5) is further iterated with θ* to obtain an optimal moving trajectory for each voxel \(\vec{x}_{ijk}\). All trajectories obtained in this process are gathered, and bilinear interpolation is used to generate the final trajectory of the density map:

$$\hat{f}_{t_m}=f_0\big(\vec{\Phi}^{-1}(\cdot,t_m)\big),\quad m=0,1,\ldots,M.$$
(7)
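For intuition, a minimal dense Sinkhorn iteration for the entropy-regularized squared Wasserstein distance is sketched below on a small 1-D support. This is an assumption-laden toy, not the CryoTRANS implementation: the real code runs on 3-D voxel grids on the GPU with ε = 0.0001 and 5 iterations per round, whereas the toy defaults to a larger ε to avoid numerical underflow without log-domain stabilization.

```python
import numpy as np

def sinkhorn_w2(a, b, x, eps=0.01, iters=5):
    # Entropy-regularized squared Wasserstein distance between discrete
    # densities a and b supported on points x (1-D here for illustration).
    # Densities are normalized to probability vectors, as CryoTRANS requires.
    a = a / a.sum()
    b = b / b.sum()
    C = (x[:, None] - x[None, :]) ** 2        # squared-distance cost matrix
    K = np.exp(-C / eps)                       # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):                     # Sinkhorn scaling iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]            # approximate transport plan
    return float(np.sum(P * C))
```

The returned value is the transport cost of the regularized plan; moving a density to a shifted copy of itself costs more than transporting it onto itself, which is the behaviour the loss in Eq. (3) exploits.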
The density map \(\hat{f}_1\) corresponds to the generated density map at the end of the trajectory.

Density map partition strategy with rigidity constraints during CryoTRANS training

The procedure described above is used for training on simulated data. When training CryoTRANS on real data (A2M, Arp2/3, SARS-CoV-2), where different domains follow different velocity fields, a modified approach is employed: two velocity fields with rigidity constraints, based on a manual partition of the density map, which enhances the model's representation capability. Specifically, the whole volume is manually divided into two disjoint parts,

$$\Omega=[0,1]^3=\Omega_a\sqcup\Omega_b,$$
(8)
and f0 is consequently divided into \(f_0^a\) and \(f_0^b\), where

$$f_0^a(\vec{x})=\begin{cases}f_0(\vec{x}), & \text{if }\vec{x}\in\Omega_a\\ 0, & \text{if }\vec{x}\in\Omega_b\end{cases},\qquad f_0^b(\vec{x})=\begin{cases}0, & \text{if }\vec{x}\in\Omega_a\\ f_0(\vec{x}), & \text{if }\vec{x}\in\Omega_b\end{cases}.$$
(9)
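The partition of Eq. (9) amounts to masking the volume with an indicator of Ω_a and its complement; a minimal sketch (the function name and the boolean-mask representation of Ω_a are illustrative choices, not the CryoTRANS interface):

```python
import numpy as np

def split_map(f0, mask_a):
    # Eq. (9): restrict f0 to the two disjoint regions. mask_a is a boolean
    # array marking Omega_a; its complement plays the role of Omega_b.
    f0_a = np.where(mask_a, f0, 0.0)
    f0_b = np.where(mask_a, 0.0, f0)
    return f0_a, f0_b
```

By construction the two pieces sum back to the original map, so the combined deformed map in the loss below accounts for all of the density.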
Subsequently, two separate neural velocity fields, characterized by different parameters θa and θb, are employed to generate trajectories on these individual partitions. The trajectory for each partition can be described as follows:

$$\vec{\Phi}^a(\vec{x}_{ijk},t_{m+1})=\vec{\Phi}^a(\vec{x}_{ijk},t_m)+\tau\,\vec{V}\big(\vec{\Phi}^a(\vec{x}_{ijk},t_m);\theta^a\big),\qquad \vec{\Phi}^a(\vec{x}_{ijk},0)=\vec{x}_{ijk};$$
(10)
$$\vec{\Phi}^b(\vec{x}_{ijk},t_{m+1})=\vec{\Phi}^b(\vec{x}_{ijk},t_m)+\tau\,\vec{V}\big(\vec{\Phi}^b(\vec{x}_{ijk},t_m);\theta^b\big),\qquad \vec{\Phi}^b(\vec{x}_{ijk},0)=\vec{x}_{ijk}.$$
(11)
The generated maps for each partition, \(\hat{f}_1^a\) and \(\hat{f}_1^b\), are obtained as follows:

$$\hat{f}_1^a\big(\vec{\Phi}^a(\vec{x}_{ijk},1)\big)=f_0^a(\vec{x}_{ijk}),\qquad \hat{f}_1^b\big(\vec{\Phi}^b(\vec{x}_{ijk},1)\big)=f_0^b(\vec{x}_{ijk}).$$
(12)
To employ rigidity constraints, the loss function is modified to consist of two components: the squared Wasserstein distance between the combined deformed map \(\hat{f}_1^a+\hat{f}_1^b\) and the target density map f1, and the local rigidity constraint Lrigid(⋅) on the two velocity fields20 (Supplementary Note X). Consequently, the total loss of the model is

$$\mathcal{W}_2\big(\hat{f}_1^a+\hat{f}_1^b,f_1\big)^2+\lambda^a L_{\mathrm{rigid}}\big(\vec{V}(\theta^a)\big)+\lambda^b L_{\mathrm{rigid}}\big(\vec{V}(\theta^b)\big).$$
(13)
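The exact form of Lrigid follows ref. 20 and Supplementary Note X; the sketch below uses a hypothetical stand-in that penalizes how far the one-step deformation departs from a local rigid motion, measured by \(\|J^{\top}J-I\|_F^2\) of a finite-difference Jacobian. Both the penalty form and the function names are assumptions of this sketch, not the CryoTRANS definition.

```python
import numpy as np

def rigidity_penalty(velocity, points, tau=0.1, h=1e-3):
    # Hypothetical stand-in for L_rigid: penalize deviation of the one-step
    # deformation x -> x + tau * V(x) from a local rigid motion by measuring
    # ||J^T J - I||_F^2, with the Jacobian J estimated by central differences.
    eye = np.eye(3)
    pen = 0.0
    for x in points:
        J = np.empty((3, 3))
        for d in range(3):
            e = np.zeros(3)
            e[d] = h
            fwd = (x + e) + tau * velocity(x + e)
            bwd = (x - e) + tau * velocity(x - e)
            J[:, d] = (fwd - bwd) / (2.0 * h)
        pen += np.sum((J.T @ J - eye) ** 2)
    return pen / len(points)

def total_loss(w2_sq, pen_a, pen_b, lam_a, lam_b):
    # Eq. (13): data term plus weighted rigidity penalties on both fields.
    return w2_sq + lam_a * pen_a + lam_b * pen_b
```

A pure translation incurs (numerically) zero penalty while a scaling field does not, which is the qualitative behaviour a local rigidity constraint should have.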
Here, λa and λb are hyperparameters controlling the degree of rigidity of the velocity fields. The selection of λa and λb, as well as the partitioning process, is described in detail in Supplementary Note XI. The model is trained using ADAM optimization with the same parameters as before. After obtaining the optimal parameters θa* and θb*, the final optimal trajectory is generated as

$$\left\{f_0,\ \hat{f}_{t_1}^a+\hat{f}_{t_1}^b,\ \ldots,\ \hat{f}_{t_M}^a+\hat{f}_{t_M}^b\right\},$$
(14)
where

$$\hat{f}_{t_m}^a=f_0^a\big((\vec{\Phi}^a)^{-1}(\cdot,t_m)\big),\qquad \hat{f}_{t_m}^b=f_0^b\big((\vec{\Phi}^b)^{-1}(\cdot,t_m)\big).$$
(15)
Metrics for density map quality

The model-to-map FSC is used to measure the similarity between the CryoTRANS-generated density and the atomic model. Unmasked FSC curves are reported in this study, because CryoTRANS requires density maps to be represented as probability distributions, necessitating that each voxel value be positive; during preprocessing, any negative values in the density map are reset to zero, which effectively acts as a mask applied to the density map itself. The Q-score of the CryoTRANS-generated density map is compared with that of the target map. To further evaluate the accuracy of CryoTRANS, map-to-model in Phenix is used to build new atomic models from both the generated map and the target map. The TM-score between each reconstructed model from Phenix and the corresponding reference model from the PDB is then calculated, serving as an additional measure of structural accuracy. Chain comparison in Phenix is also used to compute the backbone coverage, providing further insight into the accuracy of the generated structures.

Statistics and reproducibility

The model is described in detail through its mathematical formulation, and both the Python code and raw data are provided. The experiments mentioned in the article can be reproduced using the provided code and data. Because the neural network of CryoTRANS employs random initialization, experimental results exhibit some randomness, which may cause fluctuations in the model-to-map FSC curves of the generated densities. The displayed results are the best outcomes from three repeated experiments.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
