An isotopically-labelled temporal mass spectrometry imaging data analysis workflow to reveal glucose spatial metabolism patterns in bovine lens tissue

The data analysis pipeline consists of three main modules: spatial overlay, dimensionality reduction, and a combined metabolite annotation/pathway mapping step (Fig. 2). The resulting output consists of annotated m/z features and ion images that represent both unlabelled (endogenous) and SIL (exogenous) metabolites grouped by metabolic pathway. Each module utilises existing R scripts to curate the data for optimised and accurate annotation and is presented in turn.Figure 2Schematic diagram of the computational pipeline. (i) Lens coordination position alignment and rescaling; (ii) Lens-to-lens hexagonal binning; (iii) Data table reshaping; (iv) t-SNE and UMAP analysis; (v) Principle component analysis and top loading feature selection; (vi) K means segmentation; (vii) SIL metabolite database construction; (viii) 1st round metabolomic annotation and pathway enrichment; (ix) Feature scoring and FDR controlled filtering; (x) 2nd round metabolomic annotation and pathway enrichment within filtered features; (xi) Feature visualization (ion images and kinetic images).Module 1: Spatial alignment and data reshapingThe time-series iso-imaging experiment comprised three biological replicates for each of the ten incubation time points, totalling thirty sampled lens tissue sections. Despite using a consistent, stable supply of tissue, there was variability in the size and shape of both the incubated lenses and the analysed tissue sections. This necessitated a spatial alignment process to allow the subtly different shapes of lens sections to be overlaid for metabolism to then be monitored in spatially consistent tissue regions. Raw pixel data and coordinates could not be used for lens-to-lens alignment since the different sizes of lens sections gave a variable number of pixels in each sample section.To address this, all tissue sections were first standardised to fit a common x and y coordinate frame. The length of the equatorial poles on the lens sections were normalised to a reference lens section shape while preserving the original aspect ratios. The lens tissue section data from each time point were then loaded into a shared coordinate frame. Then, a lens-to-lens hexagonal binning (hexbin) approach was employed. Hexbins were chosen based on their ability to tessellate and form a grid across any polygon26. In this approach, hexbins were uniformly applied to the coordinate frame for spatial alignment with a median pixel count of 9 in each hexagon. These hexbins followed a consistent indexing approach across all samples. This systematic procedure guaranteed that hexbins occupying the same raster position always share the same index and closely matched the same anatomical position throughout the entire time series. Additionally, the indexes probed spatial coordinates, indicating the location on the lens sections. Once aligned, an average spectrum was computed for each hexbin, resulting in a dataset combining the spatial indexes of the hexbin with average mass spectra representing biological replicates and incubation timepoints. By averaging raw mass spectra into each hexbin spectrum, more stable spectral signals were produced for each hexbin, which can mitigate the effects of noise or variability in the MALDI data. Each hexbin was also assigned to an anatomical lens region (anterior, posterior, equator, inner cortex and core) based on conventions in the lens field (see Supplementary Fig. 1). These manually assigned anatomical labels have previously been used to quantify SIL metabolite signal levels over time19, and allowed temporal signal intensity patterns to be plotted for each region (see Supplementary Fig. 2). In addition, this approach allowed preliminary assessment of the performance of our automated annotation pipeline (see Module 2, Fig. 4a,b).At this point in the workflow, output data was structured such that each row represents a detected m/z feature, and each column represents a spatial position (i.e. hexbin index) from a single MS scan. However, the inclusion of spatial and temporal data in the columns of a general dataset poses a challenge in distinguishing whether changes in signals arise from spatial positioning or evolve over time during the incubation process. Consequently, to identify the metabolites responsible for the variations in spatial metabolic patterns, the dataset underwent a transformation into a reshaped data table where each column contains the full time profile data for a single hexbin.Module 2: Dimensionality reductionWhile the reshaped data table accommodated changes to m/z features due to incubation time and spatial position, it led to a substantial number of m/z features within each hexbin index, because the m/z features in this dataset are associated with time points. This implies a tenfold increase in the number of features (due to the 10 incubation timepoints), resulting in a high-dimensional dataset that necessitates dimensionality reduction for improved data interpretability.t-SNE (t-Distributed Stochastic Neighbour Embedding) and UMAP (Uniform Manifold Approximation and Projection) have been used in metabolomics analysis to map the high-dimensional data onto a 2D space while preserving the local structure and relationships between the data points27,28. In our study, the time profile data underwent separate executions of these two algorithms. The outputs were then projected to a standardised lens shape to visualise the spatial variation profiles within the reduced data space (Fig. 3a,b). A symmetric pattern was observed in t-SNE_1 and UMAP_1, while spatial differences between anterior and posterior are evident in the other dimensions (t-SNE_2, UMAP_2). PCA (Principal Component Analysis) was then applied to the reshaped data table and the first 5 principal components (PCs) were used to project to the standardised lens shape to visualise the spatial variation profile (Fig. 3c). The lens projections indicate that PC1 (11.51%), PC2 (3.57%), and PC3 (1.81%) exhibit a closely symmetric profile, whereas PC4 (1.41%) and PC5 (1.33%) clearly display an asymmetric pattern between the anterior and posterior lens.Figure 3Single dimension projection of t-SNE (a), UMAP (b) and PCA (c) to bovine lens based on the transformed coordinates of each hexbin in the lower-dimensional spaces. The PCA results are PC1 to PC5 from left to right. All lens diagrams are orientated with anterior surface to the right. The Blue-White-Red continuous color scale was employed to show the hexbin loading image from the lowest (blue) to the highest (red) in the dimension.To further investigate whether the reduced data space could profile the difference between the metabolic patterns of the lens anterior and posterior regions, biplots were used to show the combined effects of the dimensionality (Fig. 4, Supplementary Fig. 5). Since these data reduction methods (t-SNE, UMAP and PCA) are all unsupervised methods, the temporal hexbin data could undergo dimensionality reduction regardless of the lens anatomical labels that were previously assigned. To aid visualisation of the lens anatomical regions that were grouped by the algorithm in biplots, outer cortical lens regions were shrunk to 30% of their original sizes (Fig. 4a). These manual segmentation labels were added to the data to assist in interpretation of the clustering of the data. From the t-SNE and UMAP in Fig. 4b, the core (blue) and equator (purple) clusters can be well separated, while the anterior (red) and posterior (green) clusters do not have a very clear boundary but could be resolved in one dimension (t-SNE_2, UMAP_2). From the PCA biplot, the subplot of PC1 versus PC2 explained most of the variation within all the components, where the equatorial region (purple colour) and core region (blue colour) were well classified. While hexbins labelled ‘anterior’ (red colour) and ‘posterior’ (green colour) were highly overlapped in PC1 versus PC2, they were more tightly clustered in PC4 versus PC5 (Fig. 4). Together these results suggested that data reduction methods could be used to automatically highlight m/z signals that changed over time and were potentially derived from (U–13C6) glucose introduced during lens incubations.Figure 4Visualisation of the spatial variation of the lens MALDI data at reduced data space. (a) Schematic diagram indicating the lens anatomical labels to enhance the interpretability of the dimensionality reduced data. The lens data is classified mainly into 5 groups, including anterior (red), posterior (green), equator (purple), core (blue). Regions coloured with grey are not defined. (b) Dimensionality reduction for visualising the kinetic lens MALDI data using t-SNE, UMAP and PCA with predefined lens anatomical labels. (c) Unsupervised machine learning algorithm K means to automatically segment the dimensionality reduced lens data based on specification of k = 6 (colours show individual segments). (d) Mapping K-means clusters onto the standardized lens to visualize the spatial clustering pattern.Previously, manual labelling was used to group spectra into anatomical regions (see Supplementary Fig. 1b) for m/z feature time profile generation and interpretation. However this approach can introduce biases due to the arbitrary nature of the labelling. Therefore, segmentation of the dimensionality-reduced data was used to enhance the accuracy of the spatially-grouped spectra. The automatically generated spatial clusters can then be projected onto the lens to compare with the manually assigned anatomical regions (see Supplementary Fig. 1). For this procedure, the K-means algorithm was employed to spatially cluster the reduced data space (Fig. 4c). Importantly, this process utilised only the feature space and did not directly incorporate hexbin spatial information. The value of K in the algorithm was carefully considered, and K = 6 was an appropriate value that ensured that spatial variations could be discovered while minimising over-segmentation (see Supplementary Fig. 3). Projection of spatial K-means analysis onto each individual lens section (n = 30) following hexbin application showed very similar clustering (see Supplementary Fig. 4), suggesting that this approach was able to align lens anatomical regions across all timepoints.From the lens projections, the anterior and posterior lens regions were separated in t-SNE, UMAP and PCA dimensions PC4 versus PC5, but not PC1 versus PC2 (Fig. 4d). The primary variability in the data (represented by PC1 and PC2) showed a symmetric concentric ring pattern, with the lens outer cortex, inner cortex, and core regions evident. This matches known lens anatomical structures and patterns of endogenous small molecules, such as primary metabolites and phospholipids29. In contrast, PC4 versus PC5 fit the expected asymmetric pattern that was previously observed in lenses incubated in SIL glucose19. While SIL glucose signal derived from ex vivo incubation and the changes over time did not contribute a large proportion of the total variability in the dataset, the automated segmentation of the reduced data space aligns with both manual segmentation and the anticipated results derived from the anatomical structure of the lens and ex vivo incubation. Together, these results gave confidence that the spatial alignment approach was valid and that the observed asymmetry in the data-reduced space could be utlised in subsequent signal annotation.t-SNE and UMAP do not explicitly consider information on the relative contribution of each feature to the resulting dimensionality-reduced data. Therefore, while they may not be ideal for tasks such as feature selection and pathway enrichment, these methods consider full variation in the data to effectively capture the most prominent patterns or information present in the data (e.g. symmetric pattern from lens natural development and asymmetric pattern led by ex vivo SIL glucose incubation). This results in a more complete overview of variation reduction results where PCA shows a limited proportion of variations in each PC. Compared to PCA, t-SNE and UMAP are more straightforward to demonstrate the spatial variation, which provides insights for the interpretation of PCA results. The reliability of PCA results with similar patterns is enhanced with corroboration by t-SNE and UMAP support. Apart from the ability of dimensionality reduction, PCA also highlights the feature contribution in each PC. This characteristic facilitates the identification of key m/z features that underly the spatial segmentation of the lens, which could therefore be utilised in Module 3 for annotation and metabolic pathway analysis. In the case of the lens, there is an expectation of metabolic differences, leading to spatial clustering differences between the anterior and posterior regions. Therefore, the m/z features that significantly influenced PC4 and PC5 were likely the features responsible for the observed metabolic asymmetry between the anterior and posterior lens regions. This suggests that our automated pipeline is able to identify distinct metabolic activity regions in lens tissue. We leveraged the m/z features that contribute the most to the observed variation PC4 and PC5 in the following module to automatically annotate metabolites and determine enriched metabolic pathway activity in a spatial context.Module 3: Metabolite annotation, pathway enrichment, and visualizationIn an iso-imaging experiment, the biological tissue can simultaneously consume both SIL and naturally occurring metabolites, resulting in a mixture of labelled and unlabelled metabolites whose levels change over time and can therefore be key features of PCA dimensions. Therefore, to implement the automated metabolic activity analysis pipeline, the top 30 features from every dimension underwent the mummichog algorithm using a library that included both labelled and unlabelled metabolites that we previously generated (see “Metabolite library construction” in “Methods”). The mummichog algorithm is very sensitive while performing metabolite annotation and pathway enrichment, and led to many false positives following the first round of the annotation (Table 1). To remove false annotations in the 1st round annotation list, feature scoring and FDR-controlled filtering were employed to select only high-quality m/z features and assign them to potentially enriched pathways. Under this restriction, there is more confidence in the pathways that were identified as enriched since they are based on high-quality m/z feature annotations (Table 2). A comparison of annotations and pathways resulting from both rounds of annotation are shown in Supplementary Fig. 6. The number of enriched pathways was reduced from 14 to 5, which resulted in more biologically accurate information being produced. Finally, the automated pipeline allows for generation and display of the individual annotated ion images that are determined to change over time in a metabolic pathway context (see Fig. 5, for other pathway ion images see Supplementary Figs. 8–11).Table 1 1st round pathway enrichment result.Table 2 2nd round pathway enrichment result (All m/z features passed 10% FDR).Figure 5Time-series metabolite maps for annotated metabolites in starch and sucrose metabolism. All lenses are orientated with anterior to the right. PC = the principal component number(s) that the annotated metabolite shows significant loading in.One prominent pathway that was identified in the dataset was “Starch and sucrose metabolism”. The m/z images that were grouped in this pathway are shown in Fig. 5 showing it is most active in the lens outer cortex. While MALDI images of the m/z features from this metabolic pathway were relatively uniform, with most abundance observed in the lens cortex relative to the core in PC 1, additional MALDI images that represented SIL metabolites were significant in PC 4. These SIL MALDI images were found to change substantially over time and had spatial distributions that in several circumstances were asymmetrical (see SIL Glucose, SIL G6P and SIL F16BP). These MALDI images have previously been manually selected and their identities validated by GC–MS19, showing that our spatial metabolomics pipeline is able to return results similar to manually curated data.Another display available using the pipeline is via regional analysis. To achieve this, K means segmentations were manually assigned with a lens anatomical region (i.e. anterior/posterior/equator/inner cortex/core). The signal intensities of hexbins within each cluster were summarised, and intensities plotted against time (Fig. 6a). With the perfusion of SIL glucose, the anterior region exhibited a relatively quicker signal increase than the posterior region. A notable increase in SIL S7P intensity was observed at the initial stage of the incubation, with higher signal levels in the anterior compared to the posterior region. SIL glucose level in the anterior region declined from two hours to 8 h, while the posterior region showed a delayed response in comparison. SIL G6P in anterior region fluctuated after 1 h while the SIL F16BP constantly increased. In the posterior region, SIL G6P did not have a sharp drop but the rate of increase was not as quick as that observed in anterior.Figure 6Plotted time profiles of target SIL molecules (a) and kinetic MALDI images (b) to show appearance of SIL metabolites over time. SIL glucose ([M + Cl]− = m/z 221.0521), SIL sorbitol ([M + Cl]− = m/z 223.0679), SIL G6P ([M – H]− = m/z 265.0425), SIL S7P ([M – H]− = m/z 295.0538, threshold = 0.12), SIL F16BP ([M – H]− = m/z 345.0090), and UDP-glucose ([M – H]− = m/z 571.0667). The line plots are automatically generated using the pipeline. The metabolic pathways annotation here are mapped to KEGG pathway database.In addition to data visualisation in metabolic pathway and tissue regional formats, the dynamics of the metabolites of interest can also be depicted through kinetic images of specific labelled compounds (Fig. 6b). In this study, SIL glucose is primarily metabolised to F16BP and S7P through G6P. After a few hours, SIL sorbitol signal is also detected. These observations reflect the main metabolic pathways active in the lens30, which are represented in the pathway enrichment analysis (see Table 2). Additionally, this method enables the level of background noise to be determined for each metabolite by applying a signal threshold filter (Supplementary Fig. 7). Signal thresholding allows the display of real signal only in the kinetic images. Together, this pipeline supplies a comprehensive set of tools for automatic assignment of naturally-occurring and labelled metabolites, and a variety of display tools to better understand tissue function.

An isotopically-labelled temporal mass spectrometry imaging data analysis workflow to reveal glucose spatial metabolism patterns in bovine lens tissue

Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Poisoning medical knowledge using large language models

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis

Chemistry wordoku #062 | Puzzle

Multi-output prediction of dose–response curves enables drug repositioning and biomarker discovery

Hot Topics

Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Poisoning medical knowledge using large language models

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis

Related Articles

Balancing Act: Pregnancy and Bipolar Disorder

Cohesion at the cellular level: flexible yet stable

Gut bacteria influence responses to immunotherapy in patients with asbestos related cancer

Quick Links

Must Read

Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Poisoning medical knowledge using large language models

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis

Chemistry wordoku #062 | Puzzle

Popular Articles

Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Poisoning medical knowledge using large language models

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis