Machine learning approaches to detect hepatocyte chromatin alterations from iron oxide nanoparticle exposure

Textural analysis and ML model training were performed on a total of 2000 regions of interest (ROIs) of hepatocyte nuclei created on digital micrographs of mouse liver tissue. The tissue was obtained from previously conducted experiments on healthy, 16-week-old male C57BL/6 (C57 black 6, B6) mice in which the experimental group (N = 10) was exposed to 2 mg/kg/day i.p. of iron(II,III) oxide nanoparticles (80–100 nm, Hongwu International Group Ltd., HWNANO materials, Guangzhou, Guangdong, CN) for 3 days, while the controls (N = 10) received saline for the same period. The research was granted approval by the University of Belgrade's Faculty of Medicine Ethics Commission for the Protection and Welfare of Experimental Animals (approval number 229/2) as well as the Ministry of Agriculture, Forestry, and Water Management – Veterinary Division of the Republic of Serbia (approval number 323-07-07783/2020-05). The methodologies employed adhered to the standards set by the Universal Declaration of Animal Welfare (WSPA, London, 2000), the European Convention on the Protection of Vertebrates Used for Experimental and Other Scientific Purposes (1998), and a range of national and international guidelines that advocate for the ethical and humane treatment of experimental animals.

Liver tissue samples were preserved in Carnoy's solution, a mixture of 60% ethanol, 30% chloroform, and 10% glacial acetic acid. The samples were then embedded in Paraplast and tissue sections were prepared for glass slides. The sections were first stained with hematoxylin to color the nuclei blue and then with eosin to stain the cytoplasm and extracellular matrix pink. The staining protocol concluded with dehydration, clearing, and mounting of the tissue on slides.

Digital images of the liver tissue were captured using a Pro-MicroScan DEM 200 device (Oplenic Optronics, Hangzhou, CN) connected to an OPTIC900TH Trinocular Biological Microscope (COLO LabExperts, Novo Mesto, Slovenia). The micrographs were produced at a resolution of 1200 × 1600 pixels, a 24-bit depth, and a uniform horizontal and vertical resolution of 96 dpi (Fig. 1), and were saved in BMP format for run-length matrix and discrete wavelet transform analysis. The OPTIC900TH microscope features a trinocular head (20:80 split), a high eye-point plan eyepiece WF 10×/20, an infinity plan achromatic objective 40× (S) N.A. 0.70, an Abbe condenser 0.9/1.25, a Kohler illuminator with field diaphragm, and a 6 V 20 W tungsten-halogen lamp.

Fig. 1. Comparison of hepatocyte nuclei belonging to the group treated with IONPs (panels A–D) with intact nuclei (panels E–H). While they appear morphologically similar, significant differences are observed in the quantifiers of the run-length matrix and discrete wavelet transform.

A total of 1000 ROIs originating from the experimental group and 1000 ROIs from the controls were selected and analyzed in MaZda software (version 4.6), which was developed by researchers at the Institute of Electronics, Technical University of Lodz (TUL), Poland, during the COST B11 European project "Quantitative Analysis of Magnetic Resonance Image Texture" and the COST B21 European project "Physiological modelling of MR Image formation"32,33,34,35. ROIs were selected so as to minimize potential bias: to ensure representative sampling of the entire tissue section, ROIs were chosen at random from different areas of the tissue.
Each ROI was of uniform size to ensure consistency, and ROIs were selected to avoid regions with unclear boundaries or overlapping structures that could complicate the analysis. Observers involved in selecting ROIs were blinded to the experimental group assignments to prevent any unconscious bias in selecting ROIs from the experimental or control groups.

MaZda is a comprehensive tool designed for the calculation of texture indicators in digitized images. It enables region of interest (ROI) selection, feature selection and reduction techniques (e.g., Fisher coefficients, principal component analysis, linear discriminant analysis, and nonlinear discriminant analysis), preprocessing options, tools for data classification, and 3D data analysis. In a digital micrograph, up to 16 regions of interest can be defined and customized. For texture evaluation, the software can calculate parameters derived from histograms, co-occurrence matrices, run-length matrices, autoregressive models, and wavelet transforms32,33,34,35.

For each nuclear region of interest, we conducted run-length matrix (RLM) analysis and the discrete wavelet transform (DWT). In the RLM analysis, we assessed the variations in the intensities of resolution units within the ROIs, which is crucial for understanding the texture of the two-dimensional signal. This is achieved by measuring "runs", i.e., sequences of consecutive resolution units with the same gray-level intensity. During the RLM analysis, a specialized matrix that reflects texture anisotropy and other relational aspects within the ROI is created. In our study, particularly for machine learning applications, we focused on quantifying specific values such as Long Run Emphasis (LngREmph), Short Run Emphasis (ShrtREmph), and Run Length Nonuniformity (RLNonUni). We also focused on nuclear structures that, despite being morphologically similar, have distinct values of textural features (Fig. 1). In MaZda software, the number of bits per pixel used for image quantization can be selected; 5 bits per pixel are typically used for RLM parameters. We used the default normalization setting, which refers to the intensity range of the image under analysis, without applying the "± 3 Sigma" or "1–99%" options. Details on the mathematical and computational algorithms for quantification of the textural parameters can be found in previous publications32,33,34,35.

Long Run Emphasis quantifies the prominence of longer sequences (runs) of resolution-unit gray intensities in an image and is strongly related to the texture's coarseness. The formula for Long Run Emphasis is defined as follows:

$${\text{LngREmph}} = \left( \sum\limits_{i = 1}^{N_g} \sum\limits_{j = 1}^{N_r} j^2 \, p(i,j) \right)/C$$

In this formula, p(i,j) represents the frequency of runs with length j that have a gray value of i. Here, Ng is the total number of gray values, and Nr is the number of distinct run lengths analyzed in the study. The coefficient C in the equation is derived from these parameters as:

$$C = \sum\limits_{i = 1}^{N_g} \sum\limits_{j = 1}^{N_r} p(i,j)$$
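As a concrete illustration of how such a matrix can be obtained, the following is a minimal sketch that builds p(i,j) for horizontal runs from a quantized ROI, assuming 5-bit (32-level) quantization as used for the RLM parameters. The function name and the NumPy-based approach are illustrative assumptions and do not reproduce the MaZda implementation.

```python
import numpy as np

def run_length_matrix(roi, levels=32):
    """Build a gray-level run-length matrix p(i, j) for horizontal runs.

    roi    : 2D array of gray values (e.g. an 8-bit nuclear ROI)
    levels : number of quantization levels (32 = 5 bits per pixel)
    Returns p with shape (levels, max_run), where p[i, j-1] is the number
    of runs of length j whose quantized gray level is i.
    """
    roi = np.asarray(roi, dtype=np.float64)
    # Quantize the ROI to the requested number of gray levels
    rng = roi.max() - roi.min()
    q = np.floor((roi - roi.min()) / (rng + 1e-12) * levels).astype(int)
    q = np.clip(q, 0, levels - 1)

    max_run = q.shape[1]                      # longest possible horizontal run
    p = np.zeros((levels, max_run), dtype=np.int64)

    for row in q:                             # scan each image row
        run_val, run_len = row[0], 1
        for v in row[1:]:
            if v == run_val:                  # the run continues
                run_len += 1
            else:                             # the run ends, record it
                p[run_val, run_len - 1] += 1
                run_val, run_len = v, 1
        p[run_val, run_len - 1] += 1          # record the final run of the row
    return p
```

MaZda reports RLM parameters for several run directions (horizontal, vertical, and the two diagonals); only a single direction is sketched here for brevity.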
Short Run Emphasis, in contrast, highlights the occurrence of shorter runs of resolution-unit gray intensities and is influenced by the uniform, finer texture aspects of a two-dimensional signal. The calculation of Short Run Emphasis is distinct and can be presented as:

$${\text{ShrtREmph}} = \left( \sum\limits_{i = 1}^{N_g} \sum\limits_{j = 1}^{N_r} \frac{p(i,j)}{j^2} \right)/C$$

Run Length Nonuniformity captures the distribution of run lengths within the ROI and reflects the textural heterogeneity of the signal. It is quantified as follows:

$${\text{RLNonUni}} = \left( \sum\limits_{j = 1}^{N_r} \left( \sum\limits_{i = 1}^{N_g} p(i,j) \right)^2 \right)/C$$

Compared with Short Run Emphasis, Run Length Nonuniformity is generally of greater magnitude, since for each run length it sums the square of the total number of runs of that length. This results in larger numerical values, especially in images with varied textures and longer runs. The coefficient C is used consistently across the calculations of Long Run Emphasis, Short Run Emphasis, and Run Length Nonuniformity.

During the discrete wavelet transform, after the creation of subband images at various decomposition levels, we calculated the wavelet coefficient energies of each nuclear ROI. These energies are determined after multi-level decomposition of the two-dimensional signal with wavelet filters. In this research, we calculated the energies (EnHL and EnLH) of the wavelet coefficients (d) obtained with combinations of high-pass (H) and low-pass (L) filters as follows:

$$\text{En} = \frac{\sum\limits_{x,y \in \text{ROI}} \left( d_{x,y}^{\text{subband}} \right)^2}{n}$$

where x and y are subband coordinates and n is the number of resolution units. In the MaZda software, the transform was performed using a cascade of filters followed by factor-2 subsampling, and the wavelet features were calculated for multiple scales, with a maximum of 7 scales.
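To make the feature definitions above concrete, the following sketch computes the three RLM features from a run-length matrix such as the one built above, and the two mixed-subband energies using PyWavelets (pywt). The use of pywt and of the 'haar' wavelet are illustrative assumptions; they stand in for, and are not identical to, the MaZda wavelet implementation.

```python
import numpy as np
import pywt  # PyWavelets, used here only as a stand-in for the MaZda wavelet code

def rlm_features(p):
    """LngREmph, ShrtREmph and RLNonUni from a run-length matrix p(i, j)."""
    j = np.arange(1, p.shape[1] + 1, dtype=np.float64)  # run lengths 1..Nr
    C = p.sum()                                         # total number of runs
    lng_r_emph = np.sum(p * j ** 2) / C                 # emphasizes long runs
    shrt_r_emph = np.sum(p / j ** 2) / C                # emphasizes short runs
    rl_non_uni = np.sum(p.sum(axis=0) ** 2) / C         # heterogeneity of run lengths
    return lng_r_emph, shrt_r_emph, rl_non_uni

def wavelet_energies(roi, wavelet="haar", level=1):
    """Mean squared detail coefficients of the two mixed (high/low-pass) subbands."""
    coeffs = pywt.wavedec2(np.asarray(roi, dtype=np.float64), wavelet, level=level)
    cH, cV, cD = coeffs[1]           # detail subbands at the coarsest decomposition level
    en_1 = np.sum(cH ** 2) / cH.size
    en_2 = np.sum(cV ** 2) / cV.size
    return en_1, en_2                # correspond to EnHL and EnLH up to filter-order convention
```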
The above-described RLM features (Short Run Emphasis, Long Run Emphasis, and Run Length Nonuniformity) and the two wavelet coefficient energies were used to train and test the ML models. In this study we focused on creating two decision-tree-based supervised machine learning models. The first model was built on the random forest classifier10, while the second employed the gradient boosting classifier algorithm11. The random forest, a powerful and versatile ensemble method, operates by assembling a multitude ("forest") of decision trees from data subsets and then applying majority voting for prediction. Sample bootstrapping and other features make it relatively resilient to overfitting, and in some cases it can achieve high classification accuracy9,10,36. Potential limitations of this model include issues related to data imbalance, complexity, resource utilization, and interpretability10. The second model was based on the gradient boosting classifier, also a versatile and potentially powerful ensemble technique in which a multitude of decision trees systematically corrects the errors of preceding trees11,12,37. As with the random forest, a potential strength of this type of supervised ML model is a reduced probability of overfitting, which in this case is partially achieved by stochastic gradient boosting and tree constraints. The potential weaknesses are also similar to those of the RF model and include relative complexity and limited interpretability.

In our study, both models were developed using Python 3 and the scikit-learn library38 in the Google Colaboratory environment, a hosted Jupyter notebook service, with a CPU hardware accelerator in Google Cloud's hosted runtime. The code for the models was partially developed earlier as part of the SensoFracTW project of the Science Fund of the Republic of Serbia, based on a Saccharomyces cerevisiae cell model for the evaluation of sublethal cell damage. The models were validated using a fivefold cross-validation approach implemented through "GridSearchCV" from the "sklearn.model_selection" module. This involved partitioning the dataset into five equal parts, with four parts used for training the model and the remaining part for testing. The training and testing were repeated five times, and the resulting performance metrics were averaged over the five iterations.

A comprehensive grid search was conducted to optimize hyperparameters. In the case of the random forest, these were the number of trees (n_estimators), the number of features considered for splitting (max_features), the maximum depth of the trees (max_depth), the minimum number of samples required to split a node (min_samples_split), the minimum number of samples required at a leaf node (min_samples_leaf), and the use of bootstrap samples (bootstrap). For the gradient boosting classifier, we optimized hyperparameters including the number of boosting stages (n_estimators), the learning rate, the maximum depth of the trees (max_depth), the minimum number of samples required to split a node (min_samples_split), the minimum number of samples required at a leaf node (min_samples_leaf), the fraction of samples used for fitting the individual base learners (subsample), and the number of features considered for splitting (max_features). For the random forest model, the best hyperparameters were n_estimators = 100, max_features = 'auto', max_depth = 20, min_samples_split = 2, min_samples_leaf = 2, and bootstrap = True. For the gradient boosting classifier, the best-performing hyperparameters were n_estimators = 100, learning_rate = 0.1, max_depth = 5, min_samples_split = 10, min_samples_leaf = 1, subsample = 0.8, and max_features = None.
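A condensed sketch of this cross-validated grid search is shown below. The feature matrix X (the three RLM features and two wavelet energies per ROI) and the label vector y (control vs. IONP-exposed) are hypothetical placeholders, and the parameter grids are abbreviated for illustration rather than reproductions of the full grids searched; note that the 'auto' option for max_features has been removed in recent scikit-learn releases, so 'sqrt' is used in its place.

```python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# X: (n_rois, 5) feature matrix, y: binary labels (0 = control, 1 = IONP-exposed);
# both are placeholders for the feature table exported from the texture analysis.

rf_grid = {
    "n_estimators": [100, 200, 500],
    "max_features": ["sqrt", "log2"],   # 'auto' is no longer accepted by recent scikit-learn
    "max_depth": [10, 20, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
}
gb_grid = {
    "n_estimators": [100, 200],
    "learning_rate": [0.05, 0.1, 0.2],
    "max_depth": [3, 5, 7],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 2],
    "subsample": [0.8, 1.0],
    "max_features": ["sqrt", None],
}

# Fivefold cross-validated grid search for each classifier
rf_search = GridSearchCV(RandomForestClassifier(random_state=0), rf_grid, cv=5, n_jobs=-1)
gb_search = GridSearchCV(GradientBoostingClassifier(random_state=0), gb_grid, cv=5, n_jobs=-1)
rf_search.fit(X, y)
gb_search.fit(X, y)

print("RF best parameters:", rf_search.best_params_)
print("GB best parameters:", gb_search.best_params_)
```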
For both models, we determined the values of the Matthews correlation coefficient and the F1 score using the 'metrics' module of the scikit-learn library. The Matthews correlation coefficient (MCC) in this context can be interpreted as a measure of the quality of binary classifications. Since it is essentially a correlation coefficient between the observed and predicted binary classifications, its values range from −1 to +1, with +1 indicating perfect prediction, 0 indicating that the model is no better than random prediction, and −1 indicating total disagreement between prediction and observation. The advantages of MCC as a performance indicator are its balanced measurement, which is useful when the class distribution of the data is imbalanced, and its symmetry in performance evaluation, since it treats false positives and false negatives equally. The F1 score can be regarded as a measure of a test's accuracy in binary classification problems; it is the harmonic mean of precision and recall. In the context of our model performance, precision represented the number of true positive results divided by the number of all positive results, while recall represented the number of true positive results divided by the number of positives that should have been identified. In addition to MCC and F1, we also determined the classification accuracy of both models, again using the scikit-learn 'metrics' module. The same module was also used for the receiver operating characteristic (ROC) analysis and the calculation of the area under the ROC curve.
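As a minimal sketch of how these performance indicators can be obtained with scikit-learn, the fragment below evaluates the tuned random forest with fivefold cross-validation and averages the per-fold scores; rf_search, X, and y refer to the hypothetical objects from the sketch above, and the same calls apply to the gradient boosting model.

```python
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate

# Per-fold MCC, F1, accuracy and ROC AUC, averaged over the five folds
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "f1": "f1",
    "accuracy": "accuracy",
    "roc_auc": "roc_auc",
}
cv_results = cross_validate(rf_search.best_estimator_, X, y, cv=5, scoring=scoring)
for name in scoring:
    scores = cv_results[f"test_{name}"]
    print(f"{name}: {scores.mean():.3f} (SD {scores.std():.3f})")
```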
