Description and first insights on a large genomic biobank of lung transplantation

Description of the COLT clinical database

GenCOLT was built as a sub-part of the Cohort in Lung Transplantation (COLT; NCT00980967), whose main focus was the discovery of CLAD risk factors [16, 17]. The ethics committee (Comité de Protection des Personnes, 2009-A00036-51) approved the study and all participants provided written informed consent (CNIL, French data protection authority, #911142). COLT was a prospective study that included patients from twelve clinical centers between September 2009 and December 2018: CHU Nantes, Hospices Civils de Lyon, Assistance Publique-Hôpitaux de Marseille, Hôpital Foch (Paris), Hôpital Européen Georges Pompidou (Paris), CHU Grenoble, CHRU Strasbourg, CHU Bordeaux, Hôpital Bichat (Paris), Hôpital Marie Lannelongue (Paris area) and CHU Toulouse in France, as well as the Erasme Hospital (Brussels, Belgium). COLT comprises a total of 1413 transplanted patients from whom clinical data and biological samples were collected. The clinical records and follow-up data from each participating center are centralized in a secured online database coordinated by the Nantes University Hospital. According to protocol, follow-up visits were conducted at 1 month and 6 months after transplantation, and then every 6 months for a period of up to 5 years for biological sample (blood) collection, and up to 10 years for clinical data. Clinical follow-up data include functional data such as allograft dysfunction, infection, and pulmonary function tests. In addition, every relevant clinical event (e.g. infection, acute cellular rejection, hospital visit) occurring between two protocol visits, and every pulmonary function test performed in that interval, was recorded in the database and considered by the adjudication committee, which includes respiratory physicians from at least five different centers.

Clinical phenotypes

All COLT patients underwent individual phenotyping by an adjudication committee that gathered at least five investigator physicians from the different participating centers. Pulmonary function tests (the last FEV1 values before the rejection prognosis were assessed to relate a decline of at least 20% in FEV1 to chronic graft dysfunction), relevant chest computed tomography and medical history, especially potential confounding factors, were reviewed for a collegial decision on phenotype, initially based on the 2014 proposed classification and then on the 2019 ISHLT consensus report on CLAD [17]. Recipients were classified as follows: bronchiolitis obliterans syndrome (BOS), restrictive allograft syndrome (RAS), azithromycin responsive allograft dysfunction (ARAD), non-CLAD (alive 2 years after transplant without CLAD) and other (death within 3 months after transplantation, death without CLAD, insufficient data to conclude or confounding factors).

Description of the GenCOLT genetic data

DNA collection and GWAS genotyping

GenCOLT is a DNA biobank that was established as an extension of COLT. At this stage, GenCOLT has gathered DNA samples from 392 pairs of LT donors and recipients (n = 784 individuals) across 12 centers. We included all adult patients (aged ≥18 years) who survived more than 3 months post-transplantation and for whom consent for genetic investigations and a DNA sample were available for both the donor and the recipient. The collection of samples from deceased donors for scientific research has been authorized by the Agence de Biomédecine (PFS09-003). A protocol has been put in place, which includes verifying whether the donor had objected to the use of their organs or body parts for research, and informing relatives.
The GenCOLT cohort is housed and managed by the Centre des Ressources Biologiques at the Nantes University Hospital.

Each DNA sample was assessed for volume, concentration, purity (260/230 and 260/280 absorbance ratios) and degradation by analyzing migration on an agarose gel. Following the manufacturer's recommendations, the DNA samples were normalized to a volume of 20 µL and a concentration of 10 ng/µL. They were randomized based on sex and donor/recipient status in 96-well plates to minimize batch effects. Subsequently, the DNA samples underwent genotyping using the Axiom PMRA (Precision Medicine Research Array) chips (ThermoFisher, Waltham MA, USA), which cover 902,560 genetic variants (or SNPs, single nucleotide polymorphisms) including those found in the HLA and KIR polymorphic genomic regions, as well as other relevant genes for research in cancer and immunology. We followed the Axiom 2.0 ThermoFisher standardized protocols and guidelines during the genotyping process.

Data processing and imputations

To ensure the reliability and quality of the genomic data (Fig. 1A), we implemented several essential quality control (QC) steps. First, we performed the technological QC using AxAs (Axiom Analysis Suite) to retain only high-quality individuals, plates, and genetic variants. For individuals, a DishQC >0.82 ensured the proper separation of AT and GC fluorescence signals from noise in nonpolymorphic test probes, and the call rate (the proportion of assigned genotypes per sample) was required to be >97%. Similarly, we applied call-rate thresholds of >98.5% and >95% for plates and genetic variants, respectively. In addition, we evaluated for each SNP the genotype cluster quality assignment (Fisher’s linear discriminant >3.6) and the clear distinction between homozygous and heterozygous clusters (heterozygosity >95%). A total of 852,344 SNPs and 387 pairs (n = 775 individuals) passed this technological QC process.

Fig. 1: Building the GenCOLT biobank and robust GWAS SNP data.
A Establishment of the GenCOLT DNA cohort. Initially, a total of 784 individuals were selected from the COLT biobank for DNA extraction and GWAS genotyping. After the initial screening, five individuals were excluded due to a missing or deteriorated DNA sample. Subsequently, during the Axiom quality control procedure, four individuals were excluded due to failed experiments. Finally, a total of 387 pairs (n = 775 individuals) with a DNA sample and accurate genomic and clinical data were included in the GenCOLT biobank.
B Steps for GWAS genotyping data cleaning. The Axiom PMRA chip used for GWAS genotyping covers 902,560 SNPs. According to the manufacturer guidelines, 852,344 SNPs passed the primary technological quality controls. Before SNP imputation, we excluded SNPs with a high level of missingness (>2%), with low frequency (<1%) or deviating from HWE (p < 10⁻⁶). Overall, GenCOLT contains 7.3 million high-quality SNP genotypes (DR or r² > 0.8) for 387 LT pairs.
N.B. R, recipient; D, donor; GWAS, genome-wide association study; QC, quality control; SNP, single nucleotide polymorphism; MAF, minor allele frequency; HWE, Hardy-Weinberg equilibrium.

We performed additional QC with PLINK [18] to prevent genotyping errors. We checked for individuals with missingness >2% and evaluated the relatedness between samples; however, no individuals were excluded at this stage. For SNPs, we excluded those with missingness >2% and those deviating from the Hardy-Weinberg equilibrium (p < 10⁻⁶, Fig. 1B). To address missing data, we imputed SNPs using the TOPMed [19] tool and the TOPMed reference panel. For HLA imputation, we employed HIBAG [20] along with a global reference panel (multiethnic samples from the 1000 Genomes Project) [21, 22].
In both cases, we only retained SNPs and HLA alleles with high imputation quality (r² > 0.8).

Finally, we carefully compared the imputed sex and HLA one-field alleles with the sex and HLA allele information from clinical records to detect potential sample-management errors during genotyping. As a result, GenCOLT holds robust, high-quality imputed GWAS data for 7,337,433 SNPs in 387 pairs (n = 775 individuals).

Genetic ancestry

To further describe GenCOLT, we assessed the genetic ancestry of donors and recipients using ADMIXTURE [23] and principal component analyses (PCA) on GWAS SNPs with a minor allele frequency (MAF) ≥ 1%. By comparing GenCOLT with the diverse 1000 Genomes Project [24] populations (n = 2,504 individuals from African (AFR), American (AMR), East Asian (EAS), European (EUR), and South Asian (SAS) reference populations), we aimed to detect potential population stratification and to capture ancestry-related variability in GenCOLT. ADMIXTURE estimated genetic ancestry percentages per individual, with a detailed breakdown across the five major ancestral groups (African, American, East Asian, European and South Asian). Each donor and recipient was assigned to an ancestry group when the corresponding ancestry percentage was ≥80%. Individuals whose ancestry did not reach at least 80% contribution from any of the five major ancestral populations were classified as admixed.

Statistical analysis

Descriptive statistics were expressed as mean ± standard deviation and min-max range for continuous variables, and as percentages for categorical variables. Differences among groups were tested using one-way ANOVA for continuous variables and chi-square tests for categorical variables. Kaplan-Meier analysis was performed to estimate the 5-year and 10-year survival after LT, and differences in survival rate were compared using a log-rank test. All analyses were performed in R 4.2.1. Two-sided p-values were considered significant for p < 0.001 to account for multiple testing, and nominally significant for p < 0.05.
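
To illustrate the Kaplan-Meier estimate mentioned above, the following is a minimal sketch in Python of the product-limit estimator read at 5-year and 10-year horizons. It is not the code used in the study, which was run in R 4.2.1. The function names (kaplan_meier, survival_at) and the follow-up data are hypothetical and serve only to show the calculation.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time per patient (e.g. years after transplantation)
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each observed death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)   # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Patients censored at time t still count as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def survival_at(curve, horizon):
    """Read the step function: last estimate at or before the horizon."""
    s = 1.0
    for t, value in curve:
        if t > horizon:
            break
        s = value
    return s


# Hypothetical toy follow-up data (years), NOT taken from COLT or GenCOLT.
times = [1.0, 2.0, 2.0, 4.0, 6.0, 8.0, 9.0, 10.0]
events = [1, 0, 1, 1, 0, 1, 0, 0]

curve = kaplan_meier(times, events)
print("5-year survival:", round(survival_at(curve, 5), 3))
print("10-year survival:", round(survival_at(curve, 10), 3))
```

With this toy input the estimates are about 0.6 at 5 years and 0.4 at 10 years. Comparing groups with a log-rank test, as described in the text, would need an additional step that is not shown here.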
