Long-term trends of pH, alkalinity, and hydrogen ion concentration in an upwelling-dominated coastal ecosystem: Ría de Vigo, NW Spain

Data

The data used in this study consist of two distinct blocks: one to train the neural networks and another to make predictions. The database chosen for training is ARIOS (Acidification in the Rías and the Iberian Continental Shelf)20. It was compiled from several oceanographic cruises conducted over four decades, from June 1976 to September 2018, by the Instituto de Investigaciones Marinas (IIM), part of the Consejo Superior de Investigaciones Científicas (CSIC). The ARIOS database is a compilation of biogeochemical properties, with discrete measurements of temperature, salinity, oxygen, nutrients, alkalinity, pH and chlorophyll. It was selected for its reliability for long-term analysis. The cruises were conducted along the Galician coast, with a special focus on the Ría de Vigo. For training, this study selected data from the area between latitudes $42^{\circ}$ N and $42.35^{\circ}$ N and longitudes $8.6^{\circ}$ W and $9.11^{\circ}$ W, in the upper 50 m of the water column (5755 data points for pH and 3850 for alkalinity). The temporal and data coverage is irregular, which prevents the generation of continuous time series. However, it provides sufficient data for the neural networks to establish relationships between the drivers and pH and alkalinity. A full description of the ARIOS database can be found in ref. 20, where the sampling, analytical and quality control techniques are described in detail.

Figure 1. Map showing the six stations across the Ría de Vigo (V1 to V6). The color bar represents depth in meters.

The data fed to the trained neural networks to obtain the outputs come from measurements by the Instituto Tecnolóxico para o Control do Medio Mariño (INTECMAR). Temperature, salinity, phosphate, nitrate, silicate and dissolved oxygen were measured weekly in three depth ranges: 0–5 m, 5–10 m and 10–15 m.
These variables were obtained at six stations across the Ría de Vigo (Fig. 1) from 1995 to 2020. Further details of this database, including sampling techniques, can be found in refs. 21 and 22.

Neural networks

Alkalinity and pH are the target variables. The neural network architecture and the method used to obtain results, along with their errors, were similar for both. Thus, one ensemble of trained neural networks predicted alkalinity, and another, with its own weights, predicted pH. During training, the weights were adjusted so that the outputs matched the target values as closely as possible. If this process continues without constraints, the model adapts excessively to the peculiarities of the data, a phenomenon known as overfitting, and loses its ability to generalize. To avoid this, the data were randomly split into a training set (90%) and a test set (the remaining 10%). Performance was assessed on the test set, which is independent and did not affect the training process.

The robustness of a neural network model can be improved by combining the results of individual networks, in what is called a committee or ensemble model17,18,23,24. Each of the ten members of the ensemble was created first, and their results were then averaged to obtain the final output. In this case the individual networks share the same architecture; because training is stochastic, their results are expected to differ slightly, so averaging them mitigates the error.

Each network is a multilayer perceptron: two hidden layers, with 28 neurons in the first and 10 in the second, for pH, and one hidden layer with 40 neurons for alkalinity. This configuration was chosen after several trials aimed at minimizing the error. Bayesian regularization was used.
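The hold-out split and the committee averaging described above can be sketched in a few lines. This is only an illustration in Python, not the authors' Matlab implementation: the function names `split_train_test` and `ensemble_predict` are hypothetical, and the ensemble members are represented by arbitrary callables standing in for trained networks.

```python
import random
from statistics import fmean


def split_train_test(samples, test_fraction=0.1, seed=0):
    """Randomly split samples into (train, test) lists.

    test_fraction=0.1 mirrors the 90 % / 10 % split described in the text;
    the seed is an assumption added only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def ensemble_predict(members, x):
    """Average the predictions of all committee members for one input x.

    Each member is any callable mapping an input vector to a number
    (a stand-in for one trained network).
    """
    return fmean([member(x) for member in members])
```

Because the members differ only through the randomness of their training, their individual errors partly cancel when averaged, which is the motivation the text gives for using a committee.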
Bayesian regularized neural networks are robust and difficult to overtrain or overfit: training stops when necessary, and weights that are not relevant are effectively turned off25. The Matlab Neural Network Toolbox and the “trainbr” algorithm were used for this implementation.

To evaluate the performance of the model, the retrieved results were compared with the corresponding observations. Several statistical indices were used: mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE) and the coefficient of determination ($r^2$). These statistics were computed on the test set to evaluate the ability of the model to generalize.

The input variables chosen for the networks were latitude, longitude, depth, temperature, salinity, phosphate, nitrate, silicate, year and week. They were selected for their influence on the targets26. The periodicity of the week input was represented by its sine and cosine. Although in most cases the dissolved oxygen concentration mirrors the seasonal cycle of pH, oxygen was not used as an input because of its low reliability in the INTECMAR database. Although strong biological activity is the main driver of pH changes20, the other variables chosen were expected to account for it. This assumption is supported by precedents that rely on different combinations of inputs to predict carbonate chemistry parameters, such as ref. 17, which requires at least salinity and coordinate information.

Long-term trends

The pH results were transformed into the concentration of total hydrogen ions, in nanomoles per kg of seawater, defined as:

$$[\mathrm{H}^+] = 10^{9-\mathrm{pH}}$$
(1)
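As a small worked illustration of Eq. 1, the conversion and its inverse can be written as follows. This is a Python sketch with hypothetical function names, not the authors' code.

```python
import math


def ph_to_hplus_nmol(ph: float) -> float:
    """Eq. 1: hydrogen ion concentration in nmol per kg of seawater."""
    return 10.0 ** (9.0 - ph)


def hplus_nmol_to_ph(hplus_nmol: float) -> float:
    """Inverse of Eq. 1: pH from a concentration in nmol per kg."""
    return 9.0 - math.log10(hplus_nmol)


# A pH of 8.0 corresponds to 10 nmol/kg.
h_at_8 = ph_to_hplus_nmol(8.0)

# Ratio of concentrations for a 0.1 unit pH decrease (8.1 to 8.0).
ratio = ph_to_hplus_nmol(8.0) / ph_to_hplus_nmol(8.1)
```

Because the scale is logarithmic, a decrease of 0.1 pH units multiplies the concentration by $10^{0.1} \approx 1.26$ whatever the starting pH, while the absolute change in nmol/kg depends on the starting value.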
Thus, the pH obtained through the ensemble was transformed using Eq. 1 and then tested against the [$\mathrm{H}^+$] test set values (previously transformed from pH). Note that a change in pH represents a relative change in [$\mathrm{H}^+$] rather than an absolute change27. This transformation was motivated by evidence that expressing acidification trends in [$\mathrm{H}^+$] avoids the non-linearity of the logarithmic scale, and by the fact that seawater $\mathrm{pCO}_2$ has a considerably more linear (99.5%) relationship to [$\mathrm{H}^+$] than to pH28,29.

Since salinity is the main driver of alkalinity, its effect should be removed in order to analyse the underlying trend of alkalinity. The normalized total alkalinity (NTA) was calculated from total alkalinity (TA) using different methods, to compare them and find the most appropriate one for this specific region. The simplest method is based on a reference salinity of 35, for which the following equation was applied30:

$$NTA = TA \cdot \frac{35}{S}$$
(2)
where S is the salinity measured by INTECMAR for each projected TA value. However, this traditional normalization has been criticised, since it is usually unable to adjust surface TA for salinity variations31. For this reason, ref. 31 proposes the use of empirical relationships, such as the following equation:

$$NTA = TA + \alpha \cdot (35 - S)$$
(3)
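The two normalizations in Eqs. 2 and 3 can be compared with a short Python sketch. The function names are hypothetical, the sample values are synthetic (built as TA = 500 + 52 S purely for illustration, not taken from the study), and $\alpha$ is estimated here as the ordinary least-squares slope of TA against salinity.

```python
def nta_reference(ta, s, s_ref=35.0):
    """Eq. 2: scale TA to the reference salinity."""
    return ta * s_ref / s


def regression_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def nta_empirical(ta, s, alpha, s_ref=35.0):
    """Eq. 3: shift TA along the empirical TA-salinity slope alpha."""
    return ta + alpha * (s_ref - s)


# Synthetic, made-up samples following TA = 500 + 52 * S (units: umol/kg).
salinity = [30.0, 32.0, 34.0, 35.5]
ta = [2060.0, 2164.0, 2268.0, 2346.0]

alpha = regression_slope(salinity, ta)
nta_eq2 = [nta_reference(t, s) for t, s in zip(ta, salinity)]
nta_eq3 = [nta_empirical(t, s, alpha) for t, s in zip(ta, salinity)]
```

With these synthetic values Eq. 3 returns 2320 for every sample, whereas Eq. 2 returns about 2403 at S = 30. The difference arises because the synthetic TA-salinity line has a non-zero intercept, which a pure ratio to the reference salinity cannot account for.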
Here $\alpha$ is the slope of the linear regression of alkalinity against salinity. $\alpha$ was calculated for each station and depth, and also for all data together, as a global constant for the Ría de Vigo.

Outliers, defined as values deviating by more than three standard deviations, were removed. Long-term trends were obtained for alkalinity, pH and [$\mathrm{H}^+$]. A seasonal detrending was applied to each variable and station to remove the seasonality. The method follows ref. 32, in which an oscillatory function is fitted to the data:

$$y(t) = A \sin(\omega t + \phi) + Bt + C$$
(4)
where $A \sin(\omega t + \phi)$ is the seasonal component and the parameter B corresponds to the trend of the data. After removing the seasonal component, a standard linear regression was performed to obtain the trend, i.e., B. The confidence intervals, $r^2$ and p-value were obtained from this linear regression.
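A minimal Python sketch of fitting Eq. 4 is given below. It rests on two assumptions that the text does not state: time is expressed in decimal years, and $\omega$ is fixed to one cycle per year. With $\omega$ fixed, the identity $A \sin(\omega t + \phi) = a \sin(\omega t) + b \cos(\omega t)$, where $a = A\cos\phi$ and $b = A\sin\phi$, turns the fit into an ordinary linear least-squares problem. The function names are hypothetical and this is not the authors' implementation.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_seasonal_trend(t, y, omega=2.0 * math.pi):
    """Least-squares fit of y = A sin(omega t + phi) + B t + C, omega fixed.

    t is assumed to be in decimal years, so omega = 2*pi is an annual cycle.
    """
    rows = [[math.sin(omega * ti), math.cos(omega * ti), ti, 1.0] for ti in t]
    k = 4
    # Normal equations (X^T X) p = X^T y for p = (a, b, B, C).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    a, b, trend, offset = solve_linear(xtx, xty)
    return {"A": math.hypot(a, b), "phi": math.atan2(b, a),
            "B": trend, "C": offset}


def remove_seasonality(t, y, fit, omega=2.0 * math.pi):
    """Subtract the fitted seasonal component, leaving trend plus residual."""
    return [yi - fit["A"] * math.sin(omega * ti + fit["phi"])
            for ti, yi in zip(t, y)]
```

The deseasonalized series returned by `remove_seasonality` is what a subsequent standard linear regression would be applied to in order to obtain the trend B with its confidence interval, $r^2$ and p-value.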
