Concordance in the estimation of tumor percentage in non-small cell lung cancer using digital pathology

To our knowledge, this is the first report of a digital pathology ring trial for tumor content assessment using QuPath, a free open-source software for digital image analysis10. We trained pathologists from different Spanish centers to obtain the percentage of TC in NSCLC HE WSI. We found that the agreement between pathologists was poor. This was mostly due to the subjectivity of some tasks, such as the annotation phase and the classification of TC and tumor-associated stromal cells.

In the first ring trial, two outliers were due to human errors (5.6% of all analyses, in WSI 2 and WSI 4), while only one was found in the second ring trial (1.25% of all analyses, in WSI 9 (Fig. 4-L)). Thus, the training between the two ring trials greatly reduced human errors. This may explain the better agreement in the second ring trial, with the ICC rising from 0.09 in the first ring trial to 0.24 in the second. In this regard, a previous study18 on the reproducibility of TC content by visual examination of colon and lung carcinoma reported an improvement in successive trials, thanks to a better definition of tumor cellularity.

To address the subjectivity issue, we plan to apply artificial intelligence tools to perform the annotations in the future. The distinction of tumor versus normal tissue in lung cancer WSI using machine learning and deep learning has been previously reported with good results19,20. Thus, we will train models with annotations made by different pathologists in WSI from different scanners, and we will also explore commercial solutions.

A distinctive aspect of our study is the evaluation of HE WSI by different pathologists using digital analysis against the visual assessment by one pathologist. Most published studies report good correlation between manual and digital assessment of different tasks, usually comparing visual assessment by various pathologists against digital analysis by one pathologist.
In fact, in this study, we observed a strong correlation (R = 0.7) between visual assessment and QuPath in the 41 INGENIO cases. For example, Naso et al.11 reported a concordance correlation coefficient of 92.5% in the assessment of PDL1 in NSCLC between the visual assessment by three pathologists and one pathologist using QuPath. As mentioned in the introduction, Kazdal et al.9 analyzed the agreement in the estimation of TC content of 120 NSCLC between 19 visual raters and two software platforms, QuPath and Halo. Using HE, they obtained an ICC of 0.87 between both platforms. The ICC decreased to 0.48 when comparing the results obtained by the 19 visual raters. Moreover, when comparing the average TC content of the software against that of the visual raters, the ICC was 0.78. Ruiter et al.12 studied the number of CD57-positive cells in head and neck squamous cell carcinoma, comparing two visual raters against QuPath. While the concordance between the two visual raters was excellent (ICC 0.92), the concordance between each of them and QuPath was moderate to good (ICC 0.74 and 0.84). These slightly worse results were due to background staining and artifacts, resulting in outliers when analyzing with QuPath. Lastly, Cieslak et al.13 analyzed CD30 and CD3 expression in mycosis fungoides samples, comparing three visual raters against QuPath, and found a strong correlation between both (R = 0.93). However, they do not provide ICC values, and it seems that QuPath assigns higher CD30/CD3 scores than the raters. In any case, it would be interesting to explore the concordance between different QuPath users in these tasks.

In this regard, Loughrey et al.21 studied the concordance of three reviewers (a pathologist, a biochemist and a computer scientist) using QuPath in the evaluation of p53 and CD3 in tissue microarrays.
In their case, discordances were mostly due to applying different QuPath parameters and thresholds, probably because the cores of tissue microarrays do not require such delicate annotations as WSI.

A dilemma we encountered while designing the digital analysis methodology was that of perfection versus accessibility. As INGENIO is a multicenter study, we opted for the latter. For instance, some pathologists are familiar with more precise cell segmentation methods, like Stardist22. However, Stardist requires Groovy scripts to work, making it more difficult to use than QuPath’s built-in cell detection tool. The built-in tool is therefore more accessible to pathologists, who are not usually trained in programming. In addition, Stardist requires more computing power and running time, as it is based on deep learning, and the computers used by pathologists are usually not suited for high-performance computing. In this context, a recent article23 describes the use of a QuPath script (QuANTUM) to assess tumor cellular fraction in NSCLC HE WSI. The authors follow a digital analysis process similar to the one used in this study, with the implementation of Stardist for cell detection. It might have the disadvantages of scripting and computing power described above, but the use of on-screen instructions makes it more user-friendly. Moreover, the same article23 studies how TC percentage affects copy number variation (CNV) calling. The authors found a 24% difference in detected CNVs when using the TC percentage from QuANTUM compared to visual assessment, underscoring the need for a more reliable TC percentage evaluation.

Regarding molecular alterations, in our study, three cases with ≤ 20% TC by visual assessment presented NGS alterations (WSI 2, 20% TC; WSI 9, 10% TC; and WSI 10, 20% TC).
When studying them with QuPath, the mean TC percentages of WSI 9 and WSI 10 were 27% (excluding the center with the human error) and 39%, respectively, while for WSI 2 it was 19.6%. These are all cases near the visual detection threshold recommended for molecular techniques2, and these low TC percentage cases were frequently assigned higher values by QuPath than by visual assessment. In these cases, setting the cutoff at a digital estimate greater than or equal to 20% would be more appropriate than using the visual assessment, since they would then be accepted as suitable for routine NGS diagnosis. This is in contrast to the findings of L’Imperio et al.23, who report higher TC percentage by visual assessment. Interestingly, when studying the 41 cases included in the INGENIO project, we found that those with the lowest TC (both by visual assessment and by QuPath) did present molecular alterations. Cases with a TC percentage higher than 20% by digital analysis and no alterations by NGS can be considered “flat profile” cases, that is, cases without mutations. However, it is recognized that low TC might yield false negative NGS results. For example, Patel et al.24 reported a primary colorectal cancer with driver mutations in ATM and TP53, neither of which was detected in the liver metastases with 10% TC. Another study25 found that in samples with at least 3% TC, the coverage is acceptable and genomic alterations with therapeutic potential are identified. The authors describe the same scenario in samples with a surface area of at least 10 mm². In this regard, it has been suggested that tumor area might have a greater impact on NGS results than TC percentage26. That study26 reports that small biopsies of less than 10 mm² usually had low TC percentage and a high NGS failure rate, making area a confounding factor. In this context, digital pathology can be an aid, as it offers a straightforward means of measuring areas, and its results can be easily combined with TC percentage.
In this way, we can make a better selection of cases in which to perform NGS.

The main limitation of this study is the small number of cases. We chose to send only four WSI in the first ring trial due to the learning curve, as most pathologists had no previous experience performing digital analysis. The second ring trial served mainly to gain experience and speed, both of which will be needed as each center will analyze its own 150 cases in the INGENIO project.

In view of these results, we can conclude that when digital pathology relies on manual methods, it still has some degree of subjectivity. However, with the advent of artificial intelligence solutions, we will be able to address this limitation and support pathologists in their tasks.
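
As an illustrative aside, the agreement figures quoted in this discussion are intraclass correlation coefficients (ICC). The sketch below shows one common form, ICC(2,1) (two-way random effects, absolute agreement, single rater), computed from a table of TC percentages with one row per WSI and one column per pathologist. Which ICC form was used for the ring trials is not stated in this section, so the choice of ICC(2,1), the function name `icc_2_1`, and the toy values are assumptions made for illustration only; they are not the study's method or data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per slide (WSI); each row holds the
    TC percentage reported by each of the k raters, in the same order.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two slides")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("every slide needs the same number (>= 2) of ratings")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    slide_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_slides = k * sum((m - grand_mean) ** 2 for m in slide_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_slides - ss_raters

    ms_slides = ss_slides / (n - 1)              # between-slide mean square
    ms_raters = ss_raters / (k - 1)              # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))    # residual mean square

    denominator = (ms_slides + (k - 1) * ms_error
                   + k * (ms_raters - ms_error) / n)
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_slides - ms_error) / denominator


if __name__ == "__main__":
    # Hypothetical TC percentages: 4 slides rated by 3 pathologists.
    toy_ratings = [
        [20, 35, 15],
        [40, 55, 30],
        [10, 25, 20],
        [60, 45, 70],
    ]
    print(f"ICC(2,1) = {icc_2_1(toy_ratings):.2f}")
```

Because ICC(2,1) measures absolute agreement, a rater who is systematically higher or lower than the others lowers the coefficient even when the ranking of slides is preserved, which is one reason a correlation coefficient such as R can be high while the ICC is low.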
