Histopathology imaging and clinical data including remission status in pediatric inflammatory bowel disease

Slide imaging and structured data associated with 18 pediatric IBD patients treated at CHOC from 2014 to 2022 are included in this repository. Each patient was assigned an anonymous identification number. Image deidentification included removing slide labels and macros from each ScanScope Virtual Slide (SVS) image file. Macro images provide a low-resolution snapshot of all sections on the slide. Labels and macros were removed since many were found to contain protected health information (PHI). Patient demographics and IBD diagnoses are listed in Table 2.

Table 2 Patient summary statistics.

Experimental design

An associated Excel file for this imaging data set contains three sheets: one for patient-level details, a second for image-level details, and a third listing column names and descriptions.

Patient-level details

The patient sheet contains patient identification number, gender, age in years at biopsy, IBD diagnosis, number of weeks between biopsy encounter and 12-week encounter, 12-week remission status, 12-week steroid name (for patients on steroids at that time), 12-week nutrition status, 12-week mental health risk, number of weeks between biopsy encounter and 52-week encounter, 52-week remission status, 52-week steroid name (for patients on steroids at that time), 52-week nutrition status, number of slides, and number of scans. A 52-week mental health risk variable is not included since all patients were missing values for this variable.

Image-level details

Biopsy pathology results indicating the treating pathologist’s designation (abnormal vs. normal) at the site level are included in the image-level Excel sheet, which lists all section scan images included in this dataset. A normal classification was assigned to a site if there were no significant histopathologic abnormalities. Generally, any “-itis” was classified as abnormal. Exceptions were mild inactive gastritis and reactive esophagitis, which were classified as normal.
These findings are not usually IBD related, and there is a high prevalence of these findings in patients at CHOC, even patients without IBD. Images were classified as abnormal gastritis if they included any active/acute gastritis (even focal) or any chronic gastritis with at least subepithelial band-like lymphoplasmacytic infiltration or cluster(s) of >10 plasma cells within the lamina propria. Further parsing (normal vs. abnormal) of the data at the section level is provided for tissue sections from abnormal slides. Not all tissue sections on an abnormal slide were classified as abnormal.

Abnormal tissue section images were additionally classified as containing active inflammation (mild, moderate, severe), granuloma, and/or chronic changes/architectural distortion. Mild active inflammation was assigned if the section contained neutrophils within the lamina propria and/or cryptitis. Moderate active inflammation was assigned if the section contained crypt abscesses. Severe active inflammation was assigned if the section contained inflamed granulation tissue, erosion, or ulceration. Chronic changes/architectural distortion was assigned if the section contained branched glands, crypt drop-out, lymphoplasmacytosis within the lamina propria, and/or eosinophilia. Granuloma was assigned if the section contained at least one granuloma.

Active inflammation is present in 318 sections, granulomas are present in 18 sections, and chronic changes are present in 272 sections. Active inflammation, granuloma, and chronic changes are not mutually exclusive phenotypes. All tissue sections on a slide come from the same sample. Usually, the 6 sections are taken at 3 different levels about 200 microns apart: 3 sets of 2 consecutive sections. Although all sections on a slide are from the same tissue sample, not all sections on an abnormal slide are classified as abnormal, and not all sections from a single slide are phenotypically identical.
For instance, it is possible for some sections on a slide to contain both active inflammation and chronic changes, while the other sections on the same slide only contain chronic changes. Although the tissue appears very similar, using several slices from the same tissue sample is standard pathology practice because differences exist within the sample. Subtle features in the tissue sections make the section level classification tasks interesting and clinically useful. Table 3 lists the number of slides and section scan images classified as normal vs. abnormal, as well as the number of slides and scans associated with remission within normal and abnormal classifications and abnormal phenotypes.

Table 3 Slide and section scan classification.

Data access

Tissue scan images and the associated Excel file are available on the Cell Image Library (CIL) at http://cellimagelibrary.org/pages/Project_2048311. The dataset is licensed under Creative Commons Attribution License CC BY and can be downloaded using the PHP script provided at https://github.com/CRBS/CIL_RS/wiki/Download_CHOC_dataset. The entire dataset is 241.5 GB. Each scan image will be downloaded to a zip file. Once the images are extracted from the zip files, they can be linked to patient and scan attributes listed in the Excel file per a standardized file-naming convention. All scan images have the .tif file extension and are named according to the following: PatientID_SpecimenLetter_BiopsySiteLetters_SectionLetter.tif. Note that patient IDs are not consecutive; there are 18 patients, but patient ID numbers range from 01 to 23. The specimen letter is included to prevent duplicate names in cases where multiple pieces of tissue were taken from the same site. Table 4 lists the biopsy site name and associated letters utilized to name each tissue-section image file. Possible biopsy sites that did not occur in this dataset were duodenal bulb and mid-esophagus.
The section letter indicates the section’s location within the tissue sample. The 6 sections were taken at 3 different levels about 200 microns apart, giving 3 pairs of consecutive sections: A and B, C and D, and E and F. For slides with only 3 sections (A, B, and C), each section was taken about 200 microns apart.
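
The file-naming convention and section lettering described above can be sketched in Python. This is a minimal illustration, not part of the dataset's tooling. It assumes the patient ID is all digits and the other fields are letters; the text only specifies the field order and the .tif extension. The names `parse_scan_name` and `section_level` and the site code `XX` are hypothetical placeholders (the real site letters are those listed in Table 4).

```python
import re
from typing import NamedTuple

# Assumed field formats: digits for the patient ID, letters elsewhere.
# Only the field order and the .tif extension come from the text.
SCAN_NAME = re.compile(
    r"^(?P<patient>\d+)_(?P<specimen>[A-Za-z]+)"
    r"_(?P<site>[A-Za-z]+)_(?P<section>[A-Za-z])\.tif$"
)


class ScanName(NamedTuple):
    patient_id: str  # kept as text so a leading zero ("01") is preserved
    specimen: str
    site: str
    section: str


def parse_scan_name(filename: str) -> ScanName:
    """Split PatientID_SpecimenLetter_BiopsySiteLetters_SectionLetter.tif."""
    m = SCAN_NAME.match(filename)
    if m is None:
        raise ValueError(f"unexpected scan file name: {filename!r}")
    return ScanName(m["patient"], m["specimen"], m["site"], m["section"].upper())


def section_level(section: str, sections_on_slide: int) -> int:
    """Return the 1-based cutting level of a section.

    Six-section slides pair consecutive sections (A/B, C/D, E/F);
    three-section slides have one section per level (A, B, C).
    """
    index = ord(section.upper()) - ord("A")
    if sections_on_slide == 6 and 0 <= index < 6:
        return index // 2 + 1
    if sections_on_slide == 3 and 0 <= index < 3:
        return index + 1
    raise ValueError(
        f"section {section!r} is not valid for a {sections_on_slide}-section slide"
    )


# Hypothetical file name; "XX" stands in for a real Table 4 site code.
scan = parse_scan_name("01_A_XX_B.tif")
level = section_level(scan.section, sections_on_slide=6)
```

The parsed fields can then serve as keys for matching each image against the corresponding rows of the Excel file. Keeping the patient ID as a string is a design choice for this sketch; if the Excel sheet stores IDs as integers, convert before joining.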
