A novel approach to the analysis of Overall Survival (OS) as response with Progression-Free Interval (PFI) as condition based on the RNA-seq expression data in The Cancer Genome Atlas (TCGA) | BMC Bioinformatics

We first provide an overview of regression models used in survival analysis, beginning with the Cox proportional hazards model and traditional regression models, and then extending to shared gamma frailty models.Background: Cox proportional hazards modelThe Cox proportional hazards model [3] is a widely used method in survival analysis that relates the time of an event to covariates without making specific assumptions about the underlying hazard function. The model can be described by the following hazard function:$$\begin{aligned} \lambda (t|x) = \lambda _0(t) \exp (\beta x), \end{aligned}$$where \(\lambda (t|x)\) is the hazard function, \(\lambda _0(t)\) is the baseline hazard, \(\beta\) represents the coefficients, and x represents the covariates.Background: time-varying Cox modelThe Cox model can be extended to handle time-varying/time-dependent effects [20], where covariates change over time or coefficients change over time. The hazard function for a Cox model with time-varying covariates or time-varying coefficients can be written as:$$\begin{aligned} \lambda (t|x(t))= & {} \lambda _0(t) \exp (\beta x(t)), \\ \lambda (t|x)= & {} \lambda _0(t) \exp (\beta (t) x), \end{aligned}$$where x(t) denotes the covariates that change over time, \(\beta (t)\) denotes the coefficients that change over time. This extension allows for more flexibility in modeling scenarios where the risk factors change as time progresses.Background: baseline hazard functions and regression approachTwo baseline hazard functions are considered in the analysis: the exponential distribution and the Weibull distribution (Table 1). In practice, traditional regression model is often used to analysis ordered survival events. The probability density function of the traditional regression model is [10]$$\begin{aligned} f(t_2|x, t_1)=e^{-{t_2}^\delta \lambda e^{\beta x+\gamma log(t_1)}}e^{\beta x+\gamma log(t_1)}\lambda \delta {t_2}^{\delta -1}. \end{aligned}$$where \(\delta\), \(\lambda\), \(\beta\) and \(\gamma\) represent unknown parameter, x represents covariate, \(t_1\) and \(t_2\) represent two survival times.Table 1 Two baseline hazard functions for shared gamma frailty models and regression approachConditional modelsBackground: shared gamma frailty modelsShared frailty models are widely used models in survival analysis [21, 22]. In these models, we assume that the two time components are \(T_1\) and \(T_2\). In shared frailty models, we assume that \(T_1\) and \(T_2\) are conditionally independent given a shared unobservable random effect, namely the frailty Z. We consider that the frailty Z follows a gamma distribution (\(Z\sim \Gamma (\theta ,\theta ), \quad \theta >0\)). Given the gamma frailty Z with the observed covariate x, the conditional hazard function and the conditional cumulative hazard function of \(T_1\) and \(T_2\) are as follow$$\begin{aligned} \lambda _i(t|Z, x)= & {} Z\lambda _{i0}(t)e^{\beta _ix}\qquad i=1,2; \\ \Lambda _i(t|Z, x)= & {} Z\Lambda _{i0}(t)e^{\beta _ix} \qquad i=1,2. \end{aligned}$$where \(\lambda _{10}(t)\) and \(\lambda _{20}(t)\) are baseline hazard functions, \(\Lambda _{10}(t)\) and \(\Lambda _{20}(t)\) are baseline cumulative hazard functions.The conditional survival functions of \(T_1\) and \(T_2\) are as follow$$\begin{aligned} S_1(t|Z, x)= & {} e^{-Z\Lambda _{10}(t)e^{\beta _1x}}; \\ S_2(t|Z, x)= & {} e^{-Z\Lambda _{20}(t)e^{\beta _2x}}. \end{aligned}$$Because \(T_1\) and \(T_2\) are conditionally independent given the gamma frailty Z, the bivariate survival function of \(T_1\)and \(T_2\) given the gamma frailty Z is$$\begin{aligned} S(t_1,t_2|Z, x)=e^{-Z(\Lambda _{10}(t_1)e^{\beta _1x}+\Lambda _{20}(t_2)e^{\beta _2x})}. \end{aligned}$$Then, the bivariate density function of \(T_1\) and \(T_2\) is (The derivation details are described in the Supplementary Materials.)$$\begin{aligned} \begin{aligned} f(t_1,t_2|x)&=\lambda _{10}(t_1)\lambda _{20}(t_2)e^{(\beta _1+\beta _2)x}\frac{(\theta +1){\theta }^{\theta +1}}{(\theta +\Lambda _{10}(t_1)e^{\beta _1x}+\Lambda _{20}(t_2)e^{\beta _2x})^{\theta +2}}. \end{aligned} \end{aligned}$$Conditional distributionThe conditional density function of \(T_2\) given \(T_1\) is (The derivation details are described in the Supplementary Materials.)$$\begin{aligned} \begin{aligned} P(T_2=t_2|T_1=t_1, X=x)&=\lambda _{20}(t_2)e^{\beta _2x}\frac{(\theta +1)(\theta +\Lambda _{10}(t_1)e^{\beta _1x})^{\theta +1}}{(\theta +\Lambda _{10}(t_1)e^{\beta _1x}+\Lambda _{20}(t_2)e^{\beta _2x})^{\theta +2}}. \end{aligned} \end{aligned}$$In practice, it is often to observe right censored \(T_1\) and \(T_2\). The corresponding conditional probabilities are as follow (The derivation details are described in the Supplementary Materials)$$\begin{aligned}{} & {} \begin{aligned} P(T_2=t_2|T_1 \ge t_1, X=x)&=\lambda _{20}(t_2)\theta e^{\beta _2x}\frac{(\theta +\Lambda _{10}(t_1)e^{\beta _1x})^\theta }{(\theta +\Lambda _{10}(t_1)e^{\beta _1x}+\Lambda _{20}(t_2)e^{\beta _2x})^{\theta +1}}; \end{aligned} \\{} & {} \begin{aligned} P(T_2 \ge t_2|T_1=t_1, X=x)&=\frac{(\theta +\Lambda _{10}(t_1)e^{\beta _1x})^{\theta +1}}{(\theta +\Lambda _{10}(t_1)e^{\beta _1x}+\Lambda _{20}(t_2)e^{\beta _2x})^{\theta +1}}; \end{aligned} \\{} & {} \begin{aligned} P(T_2 \ge t_2|T_1 \ge t_1, X=x)&=\frac{(\theta +\Lambda _{10}(t_1)e^{\beta _1x})^{\theta }}{(\theta +\Lambda _{10}(t_1)e^{\beta _1x}+\Lambda _{20}(t_2)e^{\beta _2x})^{\theta }}. \end{aligned} \end{aligned}$$It can be seen from the above formulas that our models actually do not assume that \(T_1 \le T_2\). However, in practice, it is often to observe \(T_1 \le T_2\). Based on the above relationships, we find that the conditional probability can maintain an increasingly monotone relationship between \(T_1\) and \(T_2\) (regardless of whether \(T_1\) is censored or not), which is commonly observed in practice. For example, in the TCGA datasets, if the tumor recurrence could be delayed (PFI), then the survival probability (OS) could be higher.Model interpretationThe hazard functions of conditional model are as follow$$\begin{aligned}{} & {} \begin{aligned} \lambda (t_2|T_1=t_1, X=x)&=\frac{\partial [-log(\int _{t_2}^{+\infty }P(T_2=t|T_1=t_1)\textrm{dt})]}{\partial t_2}\\ {}&=(\theta +1)\frac{\lambda _{20}(t_2)e^{\beta _2x}}{\theta +\Lambda _{10}(t_1)e^{\beta _1x}+\Lambda _{20}(t_2)e^{\beta _2x}}\\&=(\theta +1)\frac{\lambda _{20}(t_2)}{\theta e^{-\beta _2x}+\Lambda _{10}(t_1)e^{(\beta _1-\beta _2)x}+\Lambda _{20}(t_2)}; \end{aligned} \\{} & {} \begin{aligned} \lambda (t_2|T_1 \ge t_1, X=x)&=\frac{\partial [-log(\int _{t_2}^{+\infty }P(T_2=t|T_1 \ge t_1)\textrm{dt})]}{\partial t_2}\\ {}&=\theta \frac{\lambda _{20}(t_2)e^{\beta _2x}}{\theta +\Lambda _{10}(t_1)e^{\beta _1x}+\Lambda _{20}(t_2)e^{\beta _2x}}\\&=\theta \frac{\lambda _{20}(t_2)}{\theta e^{-\beta _2x}+\Lambda _{10}(t_1)e^{(\beta _1-\beta _2)x}+\Lambda _{20}(t_2)}. \end{aligned} \end{aligned}$$It can be seen from the above formulas that the predictive score of evaluating individual hazard is \(\theta e^{-\beta _2x}+\Lambda _{10}(t_1)e^{(\beta _1-\beta _2)x}\). We can obtain the relationship of the individual hazard and covariate x as follow

1.

\(\beta _2 \ge 0,\beta _1 \le \beta _2.\)
The conditional hazard function increases with respect to x;

2.

\(\beta _2 \ge 0,\beta _1 > \beta _2.\)
The conditional hazard function with respect to x increases at \([0,\frac{1}{\beta _1}\log \frac{\beta _2\theta }{\Lambda _{10}(t_1)(\beta _1-\beta _2)}]\) and decreases at \((\frac{1}{\beta _1}\log \frac{\beta _2\theta }{\Lambda _{10}(t_1)(\beta _1-\beta _2)},+\infty ]\);

3.

\(\beta _2 < 0, \beta _1 \ge \beta _2.\)
The conditional hazard function decreases with respect to x;

4.

\(\beta _2< 0,\beta _1 < \beta _2.\)

The conditional hazard function with respect to x decreases at \([0,\frac{1}{\beta _1}\log \frac{\beta _2\theta }{\Lambda _{10}(t_1)(\beta _1-\beta _2)}]\) and increases at \((\frac{1}{\beta _1}\log \frac{\beta _2\theta }{\Lambda _{10}(t_1)(\beta _1-\beta _2)},+\infty ]\).Clearly, censored \(t_1\) does not affect the relationship above.Theoretical resultsWe have proven the five lemmas required for our models in the Supplementary Materials. In \({\textbf {Lemma\;3}}\) and \({\textbf {Lemma\;4}}\), we generalize two baseline hazard functions to any positive function and generalize the frailty distribution to the power-variance-function (PVF) family [11].Likelihood-based estimationWe assume that the sample data were \((t_{1i},t_{2i},\delta _{1i},\delta _{2i},x_i \quad i=1 \dots n)\), where \(\delta _{1i}=0\) represents censored \(t_{1i}\), \(\delta _{1i}=1\) represents uncensored \(t_{1i}\), \(\delta _{2i}=0\) represents censored \(t_{2i}\) and \(\delta _{2i}=1\) represents uncensored \(t_{2i}\).In the exponential-baseline-based model, the baseline hazard functions are$$\begin{aligned} \lambda _{10}(t_1)=\lambda _1,\lambda _{20}(t_2)=\lambda _2,\Lambda _{10}(t_1)=\lambda _1t_1,\Lambda _{20}(t_2)=\lambda _2t_2, \quad \lambda _1>0,\lambda _2>0. \end{aligned}$$The likelihood function \(L(\theta ,\lambda _1,\lambda _2,\beta _1,\beta _2)\) is$$\begin{aligned} \begin{aligned} \prod _{i=1}^n&(\lambda _{2}e^{\beta _2x_i}\frac{(\theta +1)(\theta +\lambda _1t_{1i}e^{\beta _1x_i})^{\theta +1}}{(\theta +\lambda _1t_{1i}e^{\beta _1x_i}+\lambda _2t_{2i}e^{\beta _2x_i})^{\theta +2}})^{\delta _{1i}\delta _{2i}}(\lambda _{2}e^{\beta _2x_i}\theta \frac{(\theta +\lambda _{1}t_{1i}e^{\beta _1x_i})^\theta }{(\theta +\lambda _{1}t_{1i}e^{\beta _1x_i}+\lambda _{2}t_{i2}e^{\beta _2x_i})^{\theta +1}})^{(1-\delta _{1i})\delta _{2i}}\\&(\frac{(\theta +\lambda _{1}t_{1i}e^{\beta _1x_i})^{\theta +1}}{(\theta +\lambda _{1}t_{1i}e^{\beta _1x_i}+\lambda _{2}t_{2i}e^{\beta _2x_i})^{\theta +1}})^{\delta _{1i}(1-\delta _{2i})}(\frac{(\theta +\lambda _{1}t_{1i}e^{\beta _1x_i})^{\theta }}{(\theta +\lambda _{1}t_{1i}e^{\beta _1x_i}+\lambda _{2}t_{2i}e^{\beta _2x_i})^{\theta }})^{(1-\delta _{1i})(1-\delta _{2i})}. \end{aligned} \end{aligned}$$In the Weibull-baseline-based model, the baseline hazard functions are$$\begin{aligned} \lambda _{10}(t_1)= & {} \lambda _1\rho _1 {t_1}^{\rho _1-1},\lambda _{20}(t_2)=\lambda _2\rho _2 {t_2}^{\rho _2-1},\Lambda _{10}(t_1)=\lambda _1{t_1}^{\rho _1},\Lambda _{20}(t_2)=\lambda _2{t_2}^{\rho _2}, \\{} & {} \lambda _1>0,\lambda _2>0,\rho _1>0,\rho _2>0. \end{aligned}$$The likelihood function \(L(\theta ,\lambda _1,\lambda _2,\beta _1,\beta _2,\rho _1,\rho _2)\) is$$\begin{aligned} \begin{aligned} \prod _{i=1}^n&(\lambda _{2}\rho _2 {t_{2i}}^{\rho _2-1}e^{\beta _2x_i}\frac{(\theta +1)(\theta +\lambda _1{t_{1i}}^{\rho _1}e^{\beta _1x_i})^{\theta +1}}{(\theta +\lambda _1{t_{1i}}^{\rho _1}e^{\beta _1x_i}+\lambda _2{t_{2i}}^{\rho _2}e^{\beta _2x_i})^{\theta +2}})^{\delta _{1i}\delta _{2i}} \\&(\lambda _{2}\rho _2 {t_{2i}}^{\rho _2-1}e^{\beta _2x_i}\theta \frac{(\theta +\lambda _{1}{t_{1i}}^{\rho _1}e^{\beta _1x_i})^\theta }{(\theta +\lambda _{1}{t_{1i}}^{\rho _1}e^{\beta _1x_i}+\lambda _{2}{t_{i2}}^{\rho _2}e^{\beta _2x_i})^{\theta +1}})^{(1-\delta _{1i})\delta _{2i}} \\&(\frac{(\theta +\lambda _{1}{t_{1i}}^{\rho _1}e^{\beta _1x_i})^{\theta +1}}{(\theta +\lambda _{1}{t_{1i}}^{\rho _1}e^{\beta _1x_i}+\lambda _{2}{t_{2i}}^{\rho _2}e^{\beta _2x_i})^{\theta +1}})^{\delta _{1i}(1-\delta _{2i})}\\&(\frac{(\theta +\lambda _{1}{t_{1i}}^{\rho _1}e^{\beta _1x_i})^{\theta }}{(\theta +\lambda _{1}{t_{1i}}^{\rho _1}e^{\beta _1x_i}+\lambda _{2}{t_{2i}}^{\rho _2}e^{\beta _2x_i})^{\theta }})^{(1-\delta _{1i})(1-\delta _{2i})}. \end{aligned} \end{aligned}$$The maximum likelihood estimators asymptotically follow normal distributions with mean being the true value of the parameters and covariance matrix being the inverse of the n-fold Fisher information matrix by the \({\textbf {Lemma\;1}}\) (In the Supplementary Materials). We calculated the maximum likelihood estimates by the \({\textbf {optim}}\) function in R or the NLMIXED Procedure in SAS.Variance and confidence intervalIn Section Likelihood-based Estimation, we have stated that the estimated variances of the maximum likelihood estimates of \(\beta _1\) and \(\beta _2\) can be calculated by the inverse of the n-fold Fisher information matrix and we can calculate the estimated variances of the maximum likelihood estimates of \(\beta _1\) and \(\beta _2\) by the \({\textbf {optim}}\) function in R or the NLMIXED Procedure in SAS. The maximum likelihood estimates of \(\beta _1\) and \(\beta _2\) asymptotically follow normal distributions. Therefore, the \(95\%\) confidence interval of \(\beta _1\) and \(\beta _2\) are as follow: assume the estimated standard deviations of \(\beta _1\) and \(\beta _2\) are respectively \(\hat{s_1},\hat{s_2}\), the estimated n-fold Fisher information matrix is \({\hat{H}}\) and the estimation of \(\beta _1\) and \(\beta _2\) are respectively \(\hat{\beta _1},\hat{\beta _2}\), the \(95\%\) confidence interval of \(\beta _1\) and \(\beta _2\) are respectively$$\begin{aligned} {[}\hat{\beta _1}-1.96*\hat{s_1},\hat{\beta _1}+1.96*\hat{s_1}]=[\hat{\beta _1}-1.96*\sqrt{{\hat{H}}^{-1}(1,1)},\hat{\beta _1}+1.96*\sqrt{{\hat{H}}^{-1}(1,1)}]; \\ {[}\hat{\beta _2}-1.96*\hat{s_2},\hat{\beta _2}+1.96*\hat{s_2}]=[\hat{\beta _2}-1.96*\sqrt{{\hat{H}}^{-1}(2,2)},\hat{\beta _2}+1.96*\sqrt{{\hat{H}}^{-1}(2,2)}]. \end{aligned}$$Hypothesis testingWhen we consider \(T_1\), the relationship between \(T_2\) and the covariate x is not only related to \(\beta _2\), it may also be related to \(\beta _1\). According to \({\textbf {Lemma\;2}}\) (In the Supplementary Materials), \(\beta _1=\beta _2=0\) is equivalent to that \(T_2\) is independent of the covariate x given \(T_1\) in three shared frailty models.Therefore, we can consider the following hypothesis test$$\begin{aligned} H_0:\beta _1=\beta _2=0. \quad vs \quad H_1:\beta _1\ne 0 \quad or \quad \beta _2\ne 0. \end{aligned}$$For the above hypothesis test, we used the likelihood ratio test.The likelihood ratio statistics are as follow$$\begin{aligned} \Lambda= & {} \frac{\sup _{(\theta ,\lambda _1,\lambda _2,\beta _1,\beta _2)\in R^5}L(\theta ,\lambda _1,\lambda _2,\beta _1,\beta _2)}{\sup _{(\theta ,\lambda _1,\lambda _2,\beta _1,\beta _2)\in (R,R,R,0,0)}L(\theta ,\lambda _1,\lambda _2,\beta _1,\beta _2)}; \\ \Lambda= & {} \frac{\sup _{(\theta ,\lambda _1,\lambda _2,\rho _1,\rho _2,\beta _1,\beta _2)\in R^7}L(\theta ,\lambda _1,\lambda _2,\rho _1,\rho _2,\beta _1,\beta _2)}{\sup _{(\theta ,\lambda _1,\lambda _2,\rho _1,\rho _2,\beta _1,\beta _2)\in (R,R,R,R,R,0,0)}L(\theta ,\lambda _1,\lambda _2,\rho _1,\rho _2,\beta _1,\beta _2)}; \end{aligned}$$According to \({\textbf {Lemma\;5}}\) (In the Supplementary Materials), the above likelihood ratio test statistics have asymptotical distributions, i.e. the double log-likelihood ratio test statistics asymptotically follow chi-square distribution with two degrees of freedom (\(\chi ^2(2)\)). Therefore, we test the covariate x by the likelihood ratio test. The above likelihood ratio test statistics \(\Lambda\) can be calculated using the \({\textbf {optim}}\) function in R or the NLMIXED Procedure in SAS.Single-sample gene set enrichment analysis (ssGSEA)We utilized the R package \({\textbf {GSVA}}\) to perform ssGSEA [16] at the individual sample level, using pathways as gene sets. The pathway information was obtained from the Molecular Signatures Database (MSigDB) [23, 24].Kaplan–Meier (K–M) curveWe utilized the R package \({\textbf {survival}}\) to create a Kaplan–Meier (K–M) [25] curves. The K–M curves estimates and visualizes the survival probability over time.Time-dependent receiver operating characteristic (ROC)We use the time-dependent ROC methodology (a modified version of the conventional ROC methodology) to evaluate the predictive accuracy of our models [17,18,19]. The areas under the time-dependent ROC curves (AUC(t)) are often used to compare the predictive accuracy of several prediction models. Since Heagerty [17, 18] proposed the time-dependent ROC methodology and some definitions of cases and controls, there are many methods to estimate the time-dependent ROC curve. We choose a nonparametric estimator of the time-dependent ROC curve using the inverse probability of censoring weighting (IPCW) approach [19]. We plot AUC(t) of IPCW estimators by the R package \({\textbf {timeROC}}\).

Hot Topics

Related Articles