Detecting HLA loss of heterozygosity within a standard diagnostic sequencing workflow for prognostic and therapeutic opportunities

Tempus xT CDx sequencing

Tempus AI, Inc. (“Tempus”) provides genomic testing for patients and physicians to guide treatment options and to identify patients eligible for clinical trials. The Tempus xT CDx assay is an FDA-approved (P210011) next-generation sequencing (NGS) in vitro diagnostic device that targets 648 actionable oncogenes [24]. The laboratory procedures are carried out in Tempus’ College of American Pathologists (CAP)-accredited and Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory.

The device uses solid tumor DNA extracted from formalin-fixed, paraffin-embedded (FFPE) tumors and matched normal specimens (saliva or blood) obtained from previously diagnosed patients with solid malignant neoplasms. Total nucleic acid is extracted from FFPE tumor slides and from matched normal blood or saliva. Slides are evaluated by pathologists and microdissected to meet a baseline requirement of 20% tumor cellularity. Nucleic acid is extracted (minimum extraction yield of 50 ng) using the Chemagic360 instrument, stored at -80°C, and quantified. Specimens are sonicated to achieve 200 base pair-sized fragments using a Covaris LE220, followed by library construction using the KAPA Hyper Prep Kit (List KK8504). If applicable, a bead-based size selection step is instituted to enrich target sequences at the library construction step. After elution from the magnetic bead cleanup, the control and study specimen libraries are evaluated. A minimum library yield of 175 ng is required as input into the hybridization step. Isolation of captured target sequences is performed using the xGen Hybridization and Wash Kit (List 1080577 or 1080584). The captured target sequences are amplified using the KAPA Library Amplification Kit (List KK2621 or KK2620). Specimens are then processed through a post-capture cleanup using the Axygen AxyPrep Mag PCR Clean-Up Beads (List MAGPCR-CL-250).
A minimum molarity of 2 nM and a peak size of 200–800 base pairs of the post-capture library are required as input into sequencing. Specimens are sequenced on the Illumina® NovaSeq 6000 sequencing platform to >500x median coverage of tumor samples, with >95% of exons at >150x coverage and ≥98% of exons at ≥100x coverage. The quality of the sequencing data is evaluated post-sequencing to ensure minimum coverage and to exclude poor-quality samples as well as samples with potential contamination or swaps. Sequence data are then processed using a customized analysis pipeline designed for the detection of substitutions (single-nucleotide variants [SNVs] and multi-nucleotide variants [MNVs]) and insertion and deletion alterations (INDELs).

The HLA LOH assay and bioinformatics pipeline

An overview of the HLA LOH assay workflow is presented in Fig. 1a. This assay is an extension of the Tempus xT CDx functionality to detect LOH events in clinically actionable and biologically relevant HLA alleles for clinical trial enrollment and investigational use. The assay is capable of detecting LOH in HLA-A, HLA-B, and HLA-C; however, the validation study and downstream analyses presented here are exclusively in HLA-A. Following Tempus xT CDx sequencing, reads mapping to the larger HLA region and all unmapped reads are selected. A custom HLA reference is generated using the sample’s germline genotype (provided as input), obtained from a matching blood or saliva sample. The sample’s germline genotype is determined using the open-source tool OptiType [36], which achieves an accuracy of ≥95% both in benchmark studies and in our own experiments. Candidate reads are aligned to the custom reference. The alignment of the germline (normal) sample is used to verify the HLA genotype provided as input. If any variants with >40 supporting reads and a variant allele fraction greater than 75% are detected, the input genotype is considered to be incorrect and the test will return an error.
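The genotype verification rule above can be sketched as a single predicate. This is a minimal illustration, not the production implementation: the `Variant` record and the function name are hypothetical, and only the two thresholds (more than 40 supporting reads, variant allele fraction above 75%) are taken from the text.

```python
from dataclasses import dataclass
from typing import Iterable

# Thresholds stated in the text; both comparisons are strict.
MIN_SUPPORTING_READS = 40
MIN_VAF = 0.75


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for a variant seen in the germline alignment."""
    position: int
    supporting_reads: int
    vaf: float  # variant allele fraction, in [0, 1]


def genotype_is_contradicted(variants: Iterable[Variant]) -> bool:
    """Return True if any variant contradicts the input HLA genotype.

    A well-supported, high-VAF variant against the custom reference
    suggests the reference (and so the input genotype) is wrong.
    """
    return any(
        v.supporting_reads > MIN_SUPPORTING_READS and v.vaf > MIN_VAF
        for v in variants
    )
```

In this sketch a `True` result corresponds to the case in which the test returns an error instead of a classification.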
In this validation study, however, there were no instances of incorrect input genotypes and thus no errors were returned. Once the genotype is confirmed, strict read filtering is applied to both the tumor and germline alignments: only correctly paired reads that map exclusively to one allele with an edit distance of 0 are retained. BEDTools [37] is then used to determine the read depth at every position for each allele. The panel of normals is applied by replacing the germline read depth of the sample with the median read depth of samples with the same genotype in our panel. Features are derived from the read depth at each genomic position: for each allele, the log ratio of the read depth in the tumor and normal samples and the B-allele fraction (BAF) are calculated.

Only positions with >40X read depth in both alleles in both tumor and normal samples are considered “high-coverage”; these positions are used to generate the sample-level features. Samples with fewer than 300 high-coverage positions do not receive a classification due to low coverage. The sample-level features are the median BAF ratio (ratio of the BAF in the tumor to the BAF in the normal), the median of the difference between the log ratios of the read depth (logR diff), and the difference between logR diff and the expected value of that feature at the given tumor purity (ratio expected difference).

Once features are generated, two logistic regression models are applied. The subclonal LOH detection model takes the BAF of the non-targeted allele, the difference between the log ratios, and tumor purity as input. Samples with a probability of allelic imbalance below 55% are classified as stable. The remaining samples are then passed to the clonal model. The clonal LOH detection model takes the BAF of the targeted allele, ratio expected difference, and tumor purity as input.
Samples with a probability of clonal allelic imbalance below 50% are classified as subclonal allelic imbalance; samples with a probability of clonal allelic imbalance above 50% are classified as clonal allelic imbalance. Both models were trained on manually labeled data. Finally, the algorithm determines whether a clonal allelic imbalance is an LOH or an amplification.

Once segmentation of the genome has been performed and copy number values have been assigned to every segment using our proprietary genome-wide copy number variation (CNV) calling algorithm, the segment overlapping with the HLA locus is selected. If this segment is determined to be a gain (major > 1) and does not present signs of LOH (minor > 0), the sample is classified as having an amplification of the HLA locus. Otherwise, the subclonal or clonal loss call is maintained. If no segment fully overlaps with the HLA locus, we use the gain/loss status of the segment either to the left or the right of the HLA locus to determine whether this locus shows signs of allelic imbalance (major ≠ minor). If both segments to the right and left show no signs of allelic imbalance, the loss call is maintained.

Real-world cohort selection and feature generation to assess prognostic relevance of HLA LOH

De-identified records from a cohort of patients who received Tempus tissue-based NGS testing (n = 256) were selected from the Tempus Database (Tempus AI, Inc., Chicago, IL; Supplementary Table 1). This study was conducted on de-identified health information subject to an institutional review board-exempt determination (Advarra Pro00072742) and did not involve human subjects research.
Patient records collected between 2016 and 2023 were selected for inclusion in the study based on the following criteria: (1) a diagnosis of stage IV non-small cell lung cancer (NSCLC) with either squamous cell carcinoma or adenocarcinoma histology; (2) receipt of an FDA-approved immune checkpoint blockade (ICB)-containing regimen in the first, second, or third line of therapy; (3) a regimen start date on or after sample collection; (4) computationally assessed tumor purity ≥40%; (5) no somatic pathogenic mutations identified in EGFR or ALK in the tumor sample; (6) a call of ‘clonal loss’ or ‘stable’ at the HLA-A locus. Patients with partial loss calls were excluded. The subset of these records with a consistent call of ‘clonal loss’ or ‘stable’ at all three classical class I HLA loci (HLA-A, -B, and -C) was also evaluated separately. Tumor mutational burden (TMB) was calculated for each sample as previously described [38].

RNAseq data for this cohort were generated using the Tempus xR whole-transcriptome RNA sequencing platform [39]. Gene-level transcripts per million reads (TPM) values were used to compute cytotoxicity scores as previously described [40,41].

Real-world overall survival analysis

Survival analysis was performed in Python (v3.7.11) using the ‘lifelines’ package (v0.27.7) [42]. Real-world overall survival (rwOS) was assessed for up to 3 years following first-line ICB initiation. Cox proportional hazards (CoxPH) or Kaplan-Meier (KM) models were fit to rwOS within each cohort with HLA status (LOH or stable) as the independent variable using the risk set adjustment method [43]. To account for differences in baseline hazards between treatment regimens and lines of therapy, stratified CoxPH models were used [40]. Regimens were grouped into classes of ‘ICB monotherapy’ or ‘ICB + chemotherapy’. No differences in LOH prevalence were detected between patients who received ICB in the first line vs.
later lines of therapy (Fisher’s exact p = 0.13, Supplementary Table 2).

Study patients

BASECAMP-1 (NCT04981119) is an observational study to determine germline HLA genotypes and screen for tumor-associated LOH. Subjects screened at 8 BASECAMP-1 sites using the Tempus AWARE program were clinically tested using Tempus xT and met the following clinical parameters: germline sample available for LOH analysis, age ≥ 18 years, and unresectable/metastatic colorectal, lung, or ovarian cancer or any stage pancreatic cancer or mesothelioma. BASECAMP-1-participating patients with LOH of HLA-A*02 can be considered for the Phase 1 clinical trial EVEREST-1 (NCT05736731). BASECAMP-1 and EVEREST-1 are approved by individual institutional review boards at each study site. Patients provided written informed consent for both studies and for AWARE participation as a part of testing.

Contrived sample method and QC for cell line simulated samples in an accuracy study

Cell lines were purchased from the Fred Hutchinson International Histocompatibility Working Group Cell and Gene Bank as purified genomic DNA and are detailed in Supplementary Table 5. The HLA-A*02:01 cell lines were represented by IHW09287, IHW09046, IHW09031, IHW09039, IHW09004, IHW09056, IHW09062, IHW09059, IHW09084, IHW09036, IHW09058, IHW09070, IHW09068, and IHW09052. Each of the HLA-A*02:01 cell lines (cell line 1) was mixed with two different non-HLA-A*02:01 cell lines (cell line 2) separately to make two cell line pairs per HLA-A*02:01 cell line. Each cell line was mixed at a 1:1 mass ratio to simulate an expected HLA stable specimen (i.e., negative control). These cell lines were then mixed at five different mass ratios to simulate LOH signal consistent with clinical specimens across tumor percentages ranging from 20% to 90%.
These specimens were mixed such that the relative percentages of DNA input for the HLA-A*02:01 cell line and the non-HLA-A*02:01 cell line, respectively, were 44.4% and 55.6% to simulate LOH of a tumor specimen with 20% purity; 37.5% and 62.5% to simulate LOH of a tumor with 40% purity; 28.6% and 71.4% to simulate LOH of a tumor with 60% purity; 16.7% and 83.3% to simulate LOH of a tumor with 80% purity; and 9.1% and 90.9% to simulate LOH of a tumor with 90% purity.

To perform cell line contrived quality control (QC) calculations, germline variants (determined by the Tempus xT CDx device) shared among samples contrived under different mixture ratios were selected. Variants with a VAF standard deviation of <5% were removed because they were homozygous or heterozygous in both cell lines and therefore not informative in determining the intended proportion of each cell line.

For each remaining variant, a line of best fit was calculated using the target proportion of cell line 1 (p) as the independent variable and the observed VAF as the dependent variable. The slope was used to determine the variant’s zygosity status in both cell lines. Variants with a slope within 0.15 of the expected slope were labeled as one of four categories: heterozygous in cell line 1 and homozygous in cell line 2 (VAF = 1 - 0.5p, expected slope = -0.5), homozygous in cell line 1 and heterozygous in cell line 2 (VAF = 0.5 + 0.5p, expected slope = 0.5), only homozygous in cell line 1 (VAF = p, expected slope = 1), and only homozygous in cell line 2 (VAF = 1 - p, expected slope = -1). Variants with other zygosity combinations could not be annotated as germline in the Tempus xO variant calling pipeline and thus were not considered.

After determining the variant’s zygosity, the observed proportion was calculated using the observed VAF and the corresponding equation based on the variant’s category.
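The slope-based labeling and the per-variant inversion described above can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the function and category names are hypothetical, the fit is an ordinary least-squares line, and "within 0.15" is assumed to be inclusive.

```python
from typing import Optional, Sequence

# Expected slope of VAF against p (target proportion of cell line 1),
# taken from the four category equations in the text.
EXPECTED_SLOPES = {
    "het_in_1_hom_in_2": -0.5,  # VAF = 1 - 0.5p
    "hom_in_1_het_in_2": 0.5,   # VAF = 0.5 + 0.5p
    "hom_only_in_1": 1.0,       # VAF = p
    "hom_only_in_2": -1.0,      # VAF = 1 - p
}
SLOPE_TOLERANCE = 0.15  # assumed inclusive


def fit_slope(p: Sequence[float], vaf: Sequence[float]) -> float:
    """Ordinary least-squares slope of vaf regressed on p."""
    n = len(p)
    mean_p, mean_v = sum(p) / n, sum(vaf) / n
    sxx = sum((x - mean_p) ** 2 for x in p)
    sxy = sum((x - mean_p) * (y - mean_v) for x, y in zip(p, vaf))
    return sxy / sxx


def classify_variant(p: Sequence[float], vaf: Sequence[float]) -> Optional[str]:
    """Label a variant by its fitted slope, or None if no category matches."""
    slope = fit_slope(p, vaf)
    for category, expected in EXPECTED_SLOPES.items():
        if abs(slope - expected) <= SLOPE_TOLERANCE:
            return category
    return None


def observed_proportion(category: str, vaf: float) -> float:
    """Invert the category's VAF equation to recover p for one sample."""
    if category == "het_in_1_hom_in_2":
        return 2.0 * (1.0 - vaf)
    if category == "hom_in_1_het_in_2":
        return 2.0 * vaf - 1.0
    if category == "hom_only_in_1":
        return vaf
    if category == "hom_only_in_2":
        return 1.0 - vaf
    raise ValueError(f"unknown category: {category}")
```

Because the four tolerance windows around -1, -0.5, 0.5, and 1 do not overlap, each variant receives at most one label in this sketch.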
The observed proportion of cell line 1 in the admixture was calculated as the mean of observed proportions from all identified variants. Samples passing this contrived sample QC had an observed proportion within 5% of the target proportion.

Twenty-eight samples were excluded using this method. Of those, 11 were true positive samples, 9 were true negative samples, 5 were false negative samples, 1 was a false positive sample, and 2 were excluded from the study for other quality reasons.
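As a worked check on the contrived-sample design, the sketch below derives the target HLA-A*02:01 mass fraction for a simulated purity and applies the sample-level QC. Both parts rest on assumptions not stated in the text: the target fraction assumes that tumor cells have lost the HLA-A*02:01 copy entirely (one remaining copy) while normal cells carry one copy of each allele, and "within 5%" is read as an absolute difference of 0.05. Function names are hypothetical.

```python
from typing import Sequence


def target_a02_fraction(purity: float) -> float:
    """Mass fraction of the HLA-A*02:01 cell line that mimics LOH at a purity.

    Assumed model: normal cells (fraction 1 - purity) contribute one
    A*02:01 copy and one other-allele copy; tumor cells (fraction purity)
    contribute only the other allele. The A*02:01 share of all HLA-A
    copies is then (1 - purity) / (2 - purity).
    """
    if not 0.0 <= purity < 1.0:
        raise ValueError("purity must be in [0, 1)")
    return (1.0 - purity) / (2.0 - purity)


def passes_contrived_qc(
    per_variant_proportions: Sequence[float],
    target: float,
    tolerance: float = 0.05,  # assumed absolute, not relative
) -> bool:
    """Compare the mean per-variant proportion with the target proportion."""
    if not per_variant_proportions:
        return False
    observed = sum(per_variant_proportions) / len(per_variant_proportions)
    return abs(observed - target) <= tolerance
```

Under this assumed model, purities of 20%, 40%, 60%, 80%, and 90% give HLA-A*02:01 fractions of 44.4%, 37.5%, 28.6%, 16.7%, and 9.1%, which agree with the mixing percentages listed for the contrived specimens.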
