Identification of semen traces at a crime scene through Raman spectroscopy and machine learning

1.1 Data description

Semen samples used in this study were purchased from Lee Biosolutions, Inc. Samples were collected from anonymous donors in accordance with the approved protocol and a signed informed consent form. A 10 µL drop of semen was placed on a substrate and dried completely overnight under room conditions.

A Renishaw inVia confocal Raman spectrometer equipped with a research-grade Leica microscope, a 20× long-range objective lens (numerical aperture of 0.35), and WiRE 2.0 software was used. A 785 nm laser was used for excitation. The laser power on the dried samples was about 115 mW, and the spot size of the excitation beam in standard confocal mode was about 5 µm. The accumulation time was 10 s for each scan. The spectral resolution was about 3.5 cm⁻¹, and peak accuracy was assured by calibration with a silicon standard. For automatic mapping, the lower plate of a Nanonics AFM MultiView 1000 system was set up under the microscope, and measurements were taken using Quartz II and QuartzSpec software. The Raman spectral dataset used is presented in Table 1. The acquired data were processed with MATLAB software.

Table 1 The Raman spectra dataset used.

The Raman spectra of semen samples were recorded at several spatial points on every substrate sample. Raman mapping was performed over a 0.5 mm × 0.5 mm area, with at least 35 points on a stain and at least 30 points on a pure substrate. In total, 24 samples of semen stains on Al, 2 samples of semen stains on blue polyester, and 1 sample of a semen stain on glass were analyzed. Raman spectra with strong cosmic ray noise were excluded automatically. The thickness of all substrates was about 1 mm.

An outlier is a spectrum that falls outside the interval of spectral curves statistically estimated over the whole dataset. Outliers were removed using the random forests method30. The number of Raman spectra remaining after this procedure is shown in Table 1.
Al foil was chosen because it shows minimal interference with Raman spectral measurements, which makes it a good substrate for developing the spectral signature of a dry body fluid. Glass was chosen as a substrate with very strong fluorescence interference. Polyester was chosen because its properties are usually heterogeneous. The relatively large number of Raman spectra of semen on Al foil, compared with the other samples, allowed the most accurate spectroscopic signature of pure semen to be obtained (see Fig. 1).

Fig. 1 Typical spatial variations of Raman spectra of a semen stain on an Al foil substrate.

The background, mostly associated with fluorescence, was subtracted by shape-preserving piecewise cubic interpolation of a Raman spectrum at neighboring grid points in a gliding spectral window with a width of 200 spectral points (182.2 cm⁻¹); the quantile value was set to 10%31,32.

Noise reduction was implemented using a Savitzky-Golay filter with a polynomial order of 1 and a gliding spectral window width of 45 spectral points (41 cm⁻¹). After that, the Raman spectra were averaged, and variations were evaluated. The Raman spectra of the polyester and glass substrates and the Raman spectra of seminal fluid on an aluminum substrate are shown in Fig. 2. Standard deviations are also shown (Fig. 3).

Fig. 2 The spatially averaged Raman spectra of polyester (a) and glass (b) substrates and seminal fluid on an aluminum substrate (c) before baseline correction. Plots (d)–(f) show the same spectra after baseline correction.

Fig. 3 The spatially averaged Raman spectra (with standard deviation) of polyester (a) and glass (b) substrates and seminal fluid on an aluminum substrate (c) after baseline correction.

RSC and MCRAD were then applied to the experimental data; the application of these methods is described in subsection 3.2.
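
The baseline subtraction and smoothing steps described in this section can be illustrated with a minimal pure-Python sketch. It is not the authors' MATLAB code: the function names and the synthetic spectrum are hypothetical, the baseline anchors are joined by linear interpolation as a simplification of the shape-preserving cubic interpolation named in the text, and the smoother is a local first-order least-squares fit, which coincides with a first-order Savitzky-Golay filter for interior points but handles the edges with a truncated window.

```python
import math


def quantile_baseline(y, window=200, q=0.10):
    """Estimate a slowly varying background from windowed low quantiles.

    One anchor is placed at the centre of each consecutive window, at the
    q-quantile of the intensities inside it. Anchors are joined by LINEAR
    interpolation (an assumption made here for brevity; the text names a
    shape-preserving piecewise cubic interpolation instead).
    """
    n = len(y)
    xs, vals = [], []
    for start in range(0, n, window):
        chunk = sorted(y[start:start + window])
        xs.append(start + (len(chunk) - 1) / 2.0)
        vals.append(chunk[int(q * (len(chunk) - 1))])
    baseline = []
    j = 1
    for i in range(n):
        if i <= xs[0]:
            baseline.append(vals[0])       # hold the first anchor at the left edge
        elif i >= xs[-1]:
            baseline.append(vals[-1])      # hold the last anchor at the right edge
        else:
            while xs[j] < i:
                j += 1
            t = (i - xs[j - 1]) / (xs[j] - xs[j - 1])
            baseline.append(vals[j - 1] + t * (vals[j] - vals[j - 1]))
    return baseline


def local_linear_smooth(y, window=45):
    """Fit a straight line in a sliding window and evaluate it at the centre.

    For interior points this equals a first-order Savitzky-Golay filter;
    near the edges the window is simply truncated.
    """
    half = window // 2
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        x_mean = (lo + hi - 1) / 2.0
        y_mean = sum(y[lo:hi]) / (hi - lo)
        sxx = sum((x - x_mean) ** 2 for x in range(lo, hi))
        sxy = sum((x - x_mean) * (y[x] - y_mean) for x in range(lo, hi))
        slope = sxy / sxx if sxx > 0 else 0.0
        out.append(y_mean + slope * (i - x_mean))
    return out


# Hypothetical test spectrum: sloping background plus one narrow band.
spectrum = [10.0 + 0.02 * i + 50.0 * math.exp(-((i - 300) ** 2) / 50.0)
            for i in range(600)]
background = quantile_baseline(spectrum, window=200, q=0.10)
corrected = [s - b for s, b in zip(spectrum, background)]
smoothed = local_linear_smooth(corrected, window=45)
```

Because a low quantile of a sloping background sits slightly below the true background, the corrected spectrum in this sketch keeps a small positive offset; the window width and quantile control that trade-off.
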
Theory/calculation

Based on Raman spectroscopy, a special chemometric approach called MCRAD was developed to isolate and identify a biofluid stain on a strongly interfering substrate; it requires only knowledge of the reference Raman spectrum \(S_{ref}\) of the analyzed biofluid.24,27 Let us present an experimental Raman spectrum \(S_{org}\) of a biofluid stain on a substrate in the form:

$$S_{org}\left(k\right)=S_{blank}\left(k\right)+C\cdot S_{ref}\left(k\right),$$
(1)
where \(S_{blank}\) is the unknown Raman spectrum of the substrate, \(C\) is the biofluid stain volume fraction (VF), and \(k\) is the Raman shift. According to the standard addition method, let us add an additional portion \(C_{add}\) of the Raman spectrum \(S_{ref}\) to the Raman spectrum \(S_{org}\) \((n-1)\) times:

$$\widehat{S}_{j}=S_{org}+\widehat{C}_{j}S_{ref},$$
(2)
where \(\widehat{C}_{j}=C+j\cdot C_{add}\), \(j=\overline{0,\dots,(n-1)}\). The second derivative of Raman spectra is often used to improve the sensitivity of spectral analysis, because it eliminates the background contribution and enhances spectral features. Then, the following matrix equation can be formulated:

$$\mathbf{A}=\left(\frac{d^{2}\widehat{S}_{1}}{dk^{2}},\dots,\frac{d^{2}\widehat{S}_{j}}{dk^{2}},\dots,\frac{d^{2}\widehat{S}_{n}}{dk^{2}}\right)\cong\mathbf{WH},$$
(3)
where \(\mathbf{W}\) is a matrix consisting of two parts, \(\frac{d^{2}S_{org}}{dk^{2}}\) and the unknown \(\frac{d^{2}\widehat{S}_{ref}}{dk^{2}}\); below we use \(W_{blank}=\frac{d^{2}S_{org}}{dk^{2}}\). The \(\mathbf{H}\) matrix includes the \(n\)-dimensional unit vector and the vector of the biofluid stain VFs \(H_{j}\), which depend linearly on \(\widehat{C}_{j}\). \(W_{blank}\) and \(H_{j}\) can be evaluated through an iterative procedure, which starts with an initial value of \(\mathbf{W}\) and proceeds as follows:

1. The search for the \(H_{j}\) value is carried out with known \(\mathbf{W}\) by minimizing the \(L_{2}\) norm:

$$\|\mathbf{A}-\mathbf{W}\mathbf{H}\|_{2}.$$

2. The search for \(W_{blank}\) is carried out with \(\mathbf{H}\) calculated at the previous step by minimizing the following functional, which includes an \(L_{1}\) term:

$$\|\mathbf{A}-\mathbf{W}\mathbf{H}\|_{2}-c_{L_{1}}\left(\left|W_{blank}\right|+\left|\frac{d^{2}\widehat{S}_{ref}}{dk^{2}}\right|\right),$$
(4)

where \(c_{L_{1}}\) is a small parameter.

3. The first and second steps are repeated until the iterations converge to within a specified accuracy. As a result, \(W_{blank}\) becomes equal to zero and \(H_{1}=C\). Knowing \(C\), we can find the Raman spectrum \(S_{blank}\).
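
Step 1 of the procedure above can be sketched on synthetic data. The sketch below is illustrative only and covers a single least-squares pass, not the full iteration or the \(L_{1}\) step. It builds the addition series of Eq. (2), takes finite-difference second derivatives to form the columns of \(\mathbf{A}\) in Eq. (3), and fits each column against the two columns of \(\mathbf{W}\), which are assumed known here. All spectra, parameter values, and helper names are hypothetical.

```python
import math


def gaussian(k, center, width):
    return math.exp(-((k - center) ** 2) / (2.0 * width ** 2))


def second_derivative(y):
    """Central finite-difference second derivative on a unit-spaced grid."""
    return [y[i + 1] - 2.0 * y[i] + y[i - 1] for i in range(1, len(y) - 1)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_h(column, w1, w2):
    """Least-squares fit column ~ h1*w1 + h2*w2 via the 2x2 normal equations."""
    g11, g12, g22 = dot(w1, w1), dot(w1, w2), dot(w2, w2)
    b1, b2 = dot(w1, column), dot(w2, column)
    det = g11 * g22 - g12 * g12
    return ((b1 * g22 - b2 * g12) / det, (g11 * b2 - g12 * b1) / det)


# Synthetic stand-ins (hypothetical band shapes, not measured spectra).
grid = range(500)
s_blank = [3.0 * gaussian(k, 100, 8) + 2.0 * gaussian(k, 400, 8) for k in grid]
s_ref = [gaussian(k, 200, 5) + 0.6 * gaussian(k, 300, 5) for k in grid]
c_true, c_add, n = 0.3, 0.1, 5

# Eq. (1): stain spectrum on the substrate.
s_org = [b + c_true * r for b, r in zip(s_blank, s_ref)]

# Eq. (2): addition series with coefficients C + j * C_add, j = 0..n-1.
c_hat = [c_true + j * c_add for j in range(n)]
additions = [[o + c * r for o, r in zip(s_org, s_ref)] for c in c_hat]

# Eq. (3): columns of A, and the two columns of W (assumed known in this sketch).
a_cols = [second_derivative(s) for s in additions]
w_org = second_derivative(s_org)
w_ref = second_derivative(s_ref)

# Step 1: one least-squares solve per column gives the columns of H.
h = [solve_h(col, w_org, w_ref) for col in a_cols]
for j, (h_unit, h_vf) in enumerate(h):
    print(j, round(h_unit, 6), round(h_vf, 6))
```

In this idealized, noise-free setting the first entry of each column of \(\mathbf{H}\) comes out as 1 and the second reproduces the coefficient used in Eq. (2), which is the structure of \(\mathbf{H}\) described in the text.
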

Another approach to extracting the concentration of a specific component from the spectrum of a complex sample was developed by us29. This approach exploits the idea of reducing spectrum complexity (RSC): the target component spectrum \(S_{ref}\), multiplied by its VF \(C\), is removed entirely from the complex sample spectrum \(S_{org}\) 25. This idea can be implemented for Eq. (1) through the minimization of the following functional:

$$\delta f\left(\tilde{C}\right)=\int\left|\frac{d\left(S_{org}-\tilde{C}S_{ref}\right)}{dk}\right|dk,$$
(5)
where \(\tilde{C}\) is the estimate of \(C\).

Both methods address the problem of an interfering substrate, but they solve it in different ways. A direct quantitative comparison of the two is therefore useful for practical applications.
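
The minimization in Eq. (5) can be illustrated with a small sketch under simplifying assumptions: the integral is replaced by a sum of absolute first differences on a unit-spaced grid, the minimum is found by a plain grid search over trial VF values in [0, 1] (the optimizer is not specified in the text), and the spectra are synthetic with substrate and reference bands that do not overlap. Function names and parameter values are hypothetical.

```python
import math


def gaussian(k, center, width):
    return math.exp(-((k - center) ** 2) / (2.0 * width ** 2))


def rsc_functional(s_org, s_ref, c_trial):
    """Discrete form of Eq. (5): sum of |first differences| of the residual."""
    residual = [o - c_trial * r for o, r in zip(s_org, s_ref)]
    return sum(abs(residual[i + 1] - residual[i])
               for i in range(len(residual) - 1))


def estimate_vf(s_org, s_ref, steps=1000):
    """Grid search over trial VFs in [0, 1]; returns the minimizer."""
    trials = [i / steps for i in range(steps + 1)]
    return min(trials, key=lambda c: rsc_functional(s_org, s_ref, c))


# Synthetic stand-ins (hypothetical band shapes, not measured spectra).
grid = range(500)
s_blank = [3.0 * gaussian(k, 100, 8) + 2.0 * gaussian(k, 400, 8) for k in grid]
s_ref = [gaussian(k, 200, 5) + 0.6 * gaussian(k, 300, 5) for k in grid]
c_true = 0.3
s_org = [b + c_true * r for b, r in zip(s_blank, s_ref)]  # Eq. (1)

c_est = estimate_vf(s_org, s_ref)
print("estimated VF:", c_est)
```

The functional is a sum of absolute values of terms that are linear in the trial VF, so it is convex; a bracketing one-dimensional search could replace the grid if a finer resolution were needed.
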
