ChemReco: automated recognition of hand-drawn carbon–hydrogen–oxygen structures using deep learning

Synthetic image experiment

In this study, we used ablation experiments to compare the synthetic image methods described in the "Synthetic chemical molecule structural data set" section and to determine the best way of generating synthetic images. Figures 5, 6, 7 and 8 compare five synthesis methods (RDKit, RDKit-aug, RDKit-aug-bkg, RDKit-aug-deg and RDKit-aug-deg-bkg) in terms of four quantities monitored on the validation set during training: the loss, the Levenshtein distance, the Tanimoto coefficient and the exact match rate.

Figure 5 Comparison of loss changes.
Figure 6 Comparison of Levenshtein distance changes.
Figure 7 Comparison of Tanimoto coefficient changes.
Figure 8 Comparison of exact match changes.

These results show that the model performs well on synthetic images for every synthesis method. To validate more precisely how each of the four steps of the synthetic image method contributes to the recognition of hand-drawn chemical molecular structures, Fig. 9 shows the results on the hand-drawn chemical molecular structures in the test set.

Figure 9 Accurate matching rate of different synthetic image methods on background and non-background data sets.

Applying image augmentation and image degradation improves recognition accuracy on hand-drawn chemical molecular structures, both for images with backgrounds and for those without. This suggests that the model becomes better at extracting the relevant information and focusing on the chemical molecular structure, even in the presence of "interference" such as background elements.

Furthermore, on images with backgrounds, RDKit-aug-bkg (augmentation plus added background) achieves markedly higher accuracy than RDKit-aug.
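The validation metrics named above (Levenshtein distance, Tanimoto coefficient and exact match) can be sketched as follows. This is a minimal illustration, not the article's evaluation code. It rests on three assumptions:

- Predictions and ground truths are SMILES strings.
- Exact match is a plain string comparison. In practice both strings would typically be canonicalized first, for example with RDKit.
- The fingerprints used for the Tanimoto coefficient are given as sets of "on" bit indices.

The function names `levenshtein`, `exact_match`, `tanimoto` and `validation_metrics` are illustrative.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def exact_match(pred: str, target: str) -> bool:
    """True only if the prediction equals the ground truth exactly.
    Assumption: both strings are already canonical SMILES."""
    return pred == target


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of
    'on' bit indices: |A & B| / |A | B|."""
    union = fp_a | fp_b
    if not union:
        return 1.0  # assumed convention: two empty fingerprints are identical
    return len(fp_a & fp_b) / len(union)


def validation_metrics(preds: list[str], targets: list[str]) -> tuple[float, float]:
    """Mean Levenshtein distance and exact-match rate over a validation set."""
    n = len(targets)
    mean_lev = sum(levenshtein(p, t) for p, t in zip(preds, targets)) / n
    em_rate = sum(exact_match(p, t) for p, t in zip(preds, targets)) / n
    return mean_lev, em_rate
```

For instance, `levenshtein("CCO", "CC=O")` is 1 (one inserted `=`), and `tanimoto({1, 2, 3}, {2, 3, 4})` is 0.5. Lower Levenshtein distances and higher Tanimoto and exact-match values indicate better recognition.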
Providing molecular images with backgrounds during training (as in RDKit-aug-bkg) lets the model acquire additional knowledge about backgrounds. Note, however, that its accuracy on background-free images decreases relative to RDKit-aug, which indicates that the added background images may interfere with the recognition of clean images.

The model trained with augmentation, degradation and background addition (RDKit-aug-deg-bkg) achieves the best results on the background test set. However, its accuracy on the background-free test set is lower than that of RDKit-aug-deg.

In summary, RDKit-aug-deg attains the highest accuracy on background-free images, approximately 46%. RDKit-aug-deg-bkg attains the highest accuracy on images with backgrounds, approximately 53%. Although these accuracies are not especially high, the model never saw a real hand-drawn chemical molecular structure during training. Relying solely on knowledge learned from synthetic images, it still correctly recognizes close to half of the hand-drawn structures.

This outcome indicates a substantial similarity between synthetic images and real hand-drawn chemical molecular structure images. Consequently, in the subsequent experiments each RDKit image is randomly processed with either the RDKit-aug-deg or the RDKit-aug-deg-bkg synthesis method, with the aim of improving accuracy on images both with and without backgrounds.

Synthetic image number experiment

To assess how the model's performance depends on dataset size, this article conducted comparative experiments with different numbers of synthetic images. These experiments examine three things:

- the model's sensitivity to data quantity;
- its generalization under different dataset sizes;
- its ability to benefit from larger amounts of data.

Figure 10 presents the exact matching rates on the test set, with and without background images, for 100,000, 200,000, 500,000 and 1 million synthetic images. Table 1 lists the average exact matching rates on the test set.

Figure 10 Accurate matching rate of different numbers of synthetic images on background and non-background data sets.
Table 1 Average exact matching rate on the test set for different numbers of synthetic images.

As the amount of data increases, the exact matching rate improves for images both with and without backgrounds. Using 1 million synthetic images for the training and validation sets gives the highest accuracy on both the background-free and the background datasets.

If the validation set is instead replaced with real hand-drawn chemical molecular structure images, the results shown in Fig. 11 and Table 2 are obtained.

Figure 11 Accurate matching rate of different numbers of synthetic images on background and non-background data sets when the validation set consists of real hand-drawn chemical molecular structure images.
Table 2 Average accurate matching rate on the test set for different numbers of synthetic images when the validation set consists of real hand-drawn chemical molecular structure images.

Replacing the validation set with real hand-drawn images does not yield significant differences from the results obtained with a synthetic validation set. This suggests that, as long as the model learns only from synthetic images, merely replacing the validation set with real hand-drawn chemical molecular structures is insufficient. To address this limitation, some real hand-drawn chemical molecular structure images can be added to the training set so that they participate directly in training, strengthening the model's ability to learn from real-world examples.
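The training-set construction described so far can be sketched as below. Each RDKit image is randomly routed through RDKit-aug-deg or RDKit-aug-deg-bkg, and a share of real hand-drawn images is mixed in. This is a hypothetical outline, not the article's code:

- The image identifiers, the step names and `build_training_set` are placeholders.
- The actual augmentation, degradation and background operations are not implemented.
- Leaving real hand-drawn images unprocessed is an assumption.
- The 50/50 choice between the two pipelines is assumed, because the text only says the choice is random.

```python
import random

# Hypothetical step names standing in for the augmentation, degradation and
# background operations of the two synthesis methods; nothing is rendered here.
PIPELINES = {
    "RDKit-aug-deg": ("augment", "degrade"),
    "RDKit-aug-deg-bkg": ("augment", "degrade", "background"),
}


def build_training_set(synthetic_ids, real_ids, total, real_fraction, seed=0):
    """Return a shuffled list of (source, item_id, steps) tuples.

    synthetic_ids -- identifiers of RDKit-rendered images (hypothetical)
    real_ids      -- identifiers of real hand-drawn images (hypothetical)
    real_fraction -- share of real images, e.g. 0.10 for a 90:10 mix
    Items are drawn without replacement, so random.Random.sample raises
    ValueError if either pool is smaller than the number requested.
    """
    rng = random.Random(seed)
    n_real = round(total * real_fraction)
    n_syn = total - n_real
    samples = []
    for item in rng.sample(synthetic_ids, n_syn):
        # Each synthetic image goes through one of the two pipelines,
        # chosen uniformly at random (an assumed 50/50 split).
        name = rng.choice(list(PIPELINES))
        samples.append(("synthetic", item, PIPELINES[name]))
    for item in rng.sample(real_ids, n_real):
        samples.append(("real", item, ()))  # assumed: real images used as drawn
    rng.shuffle(samples)
    return samples
```

Setting `real_fraction` to 0.0, 0.1, 0.5, 0.9 or 1.0 yields the 100:0, 90:10, 50:50, 10:90 and 0:100 synthetic-to-real mixes examined in the mixing-ratio experiment.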
Adding real hand-drawn images to the training set has the potential to improve recognition accuracy to some extent.

Synthetic-to-hand-drawn image mixing ratio experiment

Accordingly, this article conducted comparative experiments with various proportions to find the best balance between synthetic and real hand-drawn images. Models were trained with synthetic and real hand-drawn chemical molecular structure images in ratios of 100:0, 90:10, 50:50, 10:90 and 0:100. Figure 12 shows their exact matching rates on the background and background-free test sets. Table 3 summarizes the average exact matching rates for the different ratios on the test set.

Figure 12 Accurate matching rate on background and no-background datasets for different proportions of synthetic to real hand-drawn images.
Table 3 Average accurate matching rate on the test set for different proportions of synthetic to real hand-drawn images.

When the training set consists entirely of real hand-drawn chemical molecular structures (a synthetic:real ratio of 0:100), the accuracy is very low, because the model overfits these images and generalizes poorly to the test set. Conversely, at a ratio of 100:0 the training set is composed entirely of synthetic images and the model learns nothing from real hand-drawn structures, so the accuracy is also modest. At 90:10, with 90% synthetic and 10% real hand-drawn images, the accuracy reaches 93.81%.
Based on these results, the chosen training composition is a 90:10 mix of synthetic images and real hand-drawn chemical molecular structure images.

Encoder-decoder comparison experiment

In this experiment, the training set was drawn from 1 million images with a 90:10 ratio of synthetic to real hand-drawn chemical molecular structure images, since this ratio performed best in the previous experiment. Figure 13 shows the exact matching rate on the test sets with and without backgrounds for different encoder-decoder combinations. Table 4 gives the corresponding average exact matching rates.

Figure 13 Accurate matching rate of different encoder-decoder combinations on background and non-background data sets.
Table 4 Average exact matching rate on the test set for different encoder-decoder combinations.

The EfficientNet + Transformer combination achieves the best recognition performance, with a final average accuracy of 96.90%.

Analysis of hand-drawn chemical molecule structural recognition results

The experiments above identify the best configuration for recognizing hand-drawn chemical molecular structures:

- a 90:10 ratio of synthetic to real hand-drawn images;
- a training set of 1 million images in total;
- EfficientNet + Transformer as the encoder-decoder.

This configuration outperforms the other combinations, reaching an exact matching rate of 96.90% on the test set.

Comparison with related studies

There are currently few studies on deep-learning-based recognition of hand-drawn chemical molecular structure images. In a related study [24], a CNN + LSTM model was used to convert images of hand-drawn hydrocarbon structures into SMILES strings.
To improve accuracy, the author of that study implemented a voting mechanism, achieving an exact matching rate of 76% on the test set provided with that work.

For comparison, the best-performing hand-drawn chemical molecular structure recognition model from this study was also evaluated on that test set. It achieved an exact matching rate of 93%, well above the accuracy reported in the cited paper. This result supports the advantages of the hand-drawn chemical molecular structure recognition model proposed in this article.
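The voting mechanism used in the related study is not specified in detail here. A common form is a plurality vote over the SMILES strings predicted by several models, or by several passes of one model. The sketch below illustrates that generic technique as an assumption, not the cited author's implementation:

- `majority_vote` is a hypothetical name.
- Ties are broken in favor of the earliest prediction.
- Predictions would ideally be canonicalized before counting, so that equivalent SMILES count as the same vote.

```python
from __future__ import annotations

from collections import Counter


def majority_vote(predictions: list[str]) -> str:
    """Plurality vote over SMILES strings predicted by several models.

    Ties are broken in favor of the prediction that appears first.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for smiles in predictions:  # first prediction reaching the top count wins
        if counts[smiles] == best:
            return smiles
```

Here `majority_vote(["CCO", "OCC", "CCO"])` returns `"CCO"`. Note that `"OCC"` is the same molecule (ethanol) written differently, which is why canonicalizing before voting matters.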
