Visual interpretability of image-based classification models by generative latent space disentanglement applied to in vitro fertilization

