Postprogression therapy and confounding for the estimated treatment effect on overall survival in phase III oncology trials
Abstract
Objective Estimations of the treatment effect on overall survival (OS) may be influenced by post-progression therapies (PPTs). It is unclear how often OS analyses account for PPT effects. The purpose of this cross-sectional analysis was to determine the prevalence of OS analyses accounting for PPT effects in phase III oncology trials.
Methods and analysis We screened two-arm, superiority design, phase III, randomised, oncology trials reporting OS from ClinicalTrials.gov. The primary outcome was the frequency of OS analyses adjusting for PPT confounding. Logistic regressions computed ORs for the association between trial-level covariates and the outcome.
Results A total of 334 phase III trials enrolling 265 310 patients were included, with publications between 2004 and 2020. PPTs were reported in 47% of trials (157 of 334), and an analysis accounting for PPTs was performed in only 12% of trials (N=41). PPT adjustments were often prespecified (N=23, 56%), and appeared to be more likely in cross-over studies (OR 5.04, 95% CI 2.42 to 10.38) and studies with discordant surrogate-OS findings (OR 2.26, 95% CI 1.16 to 4.38). In key subgroup analyses, PPT analyses were infrequent, including 8% of trials among those studying locoregional/first-line therapy and 11% of trials among those powered for OS.
Conclusions Although time on PPTs is an important component of OS, PPTs are rarely considered in OS analyses, which may introduce confounding on estimates of the treatment effect on OS. PPTs and methods to account for their effects on OS estimates should be considered at the time of trial design and reporting.
What is already known on this topic
Following progression on a clinical trial, patients often receive non-protocol-specified treatments at the discretion of their oncologist, which are associated with observational selection biases. Differential efficacy and differential receipt of postprogression therapies between arms may introduce confounding between arms in estimating treatment effects on overall survival.
What this study adds
This large-scale cross-sectional study of phase III randomised controlled trials found that postprogression therapies were reported in a minority of clinical trials. Attempts to adjust for confounding on overall survival estimates from postprogression therapies were rare.
How this study might affect research, practice or policy
There may be confounding in the overall survival estimates of phase III oncology trials. Regulatory agencies and oncologists should consider confounding effects in the treatment estimates of overall survival from postprogression therapies when interpreting current clinical trials. Future trials should consider prespecification of methods to account for postprogression therapy confounding at the time of trial design.
Introduction
Overall survival (OS) is defined in phase III trials as the time from randomisation until death or censoring. Compared with endpoints that are used as surrogates for OS, such as progression-free survival (PFS), OS is often considered more meaningful to patients.1 Surrogate endpoints such as PFS can be subject to a number of biases, including informative censoring and measurement biases; moreover, several groups have also found that surrogate endpoints often have only limited correlations with OS.2–6 For these and other reasons, OS is typically recommended as the endpoint for use in regulatory approvals in phase III trials.7–10
Unlike progression-based surrogate endpoints, however, estimates of the treatment effect on OS may be influenced by treatments received after progression, also known as postprogression therapies (PPTs), since disease progression prompting the initiation of PPT is generally considered an event for surrogate endpoints such as PFS.11–13 In some cases, PPTs have been shown to suggest an OS benefit in randomised trials of clinically inert or harmful drugs (in which patients in the ineffective experimental arm received more effective PPTs, compared with patients in the control arm who crossed over to the ineffective experimental therapy following progression).14 15 Not accounting for PPTs can also produce the opposite effect and lead to underestimation of the OS benefit, for example, if patients in the less effective arm cross over to the more effective therapy after progression.16 Thus, PPTs may have a considerable impact on estimating the treatment effect on OS and subsequent interpretations for clinical practice (figure 1).17–19 Because the scope of this concern in oncology is poorly understood, the present empirical study was undertaken to evaluate the frequency of phase III trials attempting to account for PPT effects in OS analyses.
Figure 1
Postprogression therapy effects on estimates of the treatment effect on overall survival. Created using Biorender.com. RCT, randomised controlled trial.
Methods
A cross-sectional analysis of published, phase III, randomised, controlled, oncology trials was conducted by screening ClinicalTrials.gov in February 2020 without date limitations.20 Key inclusion criteria were (1) interventional anticancer therapy being tested, (2) a two-arm superiority design and (3) the reporting of matured OS findings as a primary or secondary endpoint (online supplemental figure 1). The maturation of OS findings was defined according to each trial. This study complies with the Strengthening the Reporting of Observational Studies in Epidemiology guidelines.21 Data were curated from all available publications (manuscript reporting the primary endpoint as well as any updated analyses, including all supplemental data through February 2024), ClinicalTrials.gov, and the trial protocol (English only), if available, by a single reviewer. For simplicity, surrogate endpoints such as disease-free survival, event-free survival and time to progression were collectively referred to as PFS, which is the most commonly used surrogate survival endpoint.
The primary objective of this study was to evaluate the frequency of trials attempting to statistically account for PPT confounding in an analysis of OS. Any statistical attempts to account for PPTs were considered; several examples include censoring the OS analysis at the time of PPT initiation, inverse probability of censoring weighting, and rank-preserving structural failure time models.22 Whether any PPT adjustments were prespecified versus post hoc was also recorded; in the absence of a published protocol or statistical analysis plan, it was assumed that analyses not clearly defined as prespecified were post hoc.
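As a minimal sketch of the simplest adjustment named above, censoring the OS analysis at PPT initiation, the following Python fragment converts each patient's record into the (time, event) pair that a survival analysis would then use. The `Patient` fields, the function name and the three example patients are hypothetical and are not drawn from any trial in this study.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Patient:
    os_time: float              # months from randomisation to death or last follow-up
    died: bool                  # True if death was observed at os_time
    ppt_start: Optional[float]  # months from randomisation to PPT initiation; None if no PPT


def censor_at_ppt(p: Patient) -> Tuple[float, bool]:
    """Return (time, event) with follow-up censored at the start of PPT."""
    if p.ppt_start is not None and p.ppt_start < p.os_time:
        # Follow-up is cut at PPT initiation, so any later death is not counted.
        return (p.ppt_start, False)
    return (p.os_time, p.died)


# Hypothetical patients, for illustration only.
patients = [
    Patient(os_time=24.0, died=True, ppt_start=10.0),   # died after starting PPT
    Patient(os_time=18.0, died=True, ppt_start=None),   # died without PPT
    Patient(os_time=30.0, died=False, ppt_start=12.0),  # alive, started PPT
]
adjusted = [censor_at_ppt(p) for p in patients]
```

Because patients who start PPT are unlikely to resemble those who do not, this kind of censoring is informative, which is one reason other methods such as inverse probability of censoring weighting exist.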
Because PPT confounding is present when salvage therapies are available, a sensitivity analysis was performed for trials evaluating the neoadjuvant, definitive, adjuvant, maintenance, first-line metastatic solid tumour or first-line haematological malignancy settings. A second sensitivity analysis was performed including only trials powered for OS with a primary or coprimary endpoint of OS.
Continuous variables were summarised by median and IQR, and categorical variables were summarised by frequency. The trends of PPT adjustments over time were evaluated by ordinary least squares linear regression, in which the slope (m) of the regression represented the rate of change. Univariable binary logistic regression models examined associations of trial-level variables with the outcome of any PPT adjustment to calculate ORs and 95% CIs. Multivariable analysis was not performed due to low numbers of the event of interest. All tests were two sided. Missing data were not encountered for this analysis of public data. The significance (α) level was set at 0.05. Analyses were performed by using SAS V.9 and plots were created using Prism V.10 (GraphPad, La Jolla, California, USA).
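For a single binary trial-level covariate, the OR from a univariable logistic regression equals the cross-product ratio of the corresponding 2×2 table, and the Wald CI can be computed directly from the four cell counts. The sketch below illustrates this calculation in Python rather than SAS. The counts are assumptions chosen to be roughly consistent with the percentages reported for cross-over trials (33% vs 9%); they are not taken from the study's tables, and the interval may differ slightly from software output depending on the CI method used.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.959964):
    """OR and Wald 95% CI for a 2x2 table.

    a: exposed trials with the outcome      b: exposed trials without it
    c: unexposed trials with the outcome    d: unexposed trials without it
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Assumed counts (hypothetical): 16 of 49 cross-over trials and
# 25 of 285 other trials reporting a PPT-adjusted OS analysis.
odds_ratio, lower, upper = odds_ratio_wald_ci(a=16, b=33, c=25, d=260)
print(f"OR {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```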
Patient and public involvement
Patients were not involved as contributors in this study.
Results
After screening 785 phase III, randomised trials, a total of 334 therapeutic, interventional, two-arm, superiority-design trials enrolling 265 310 patients were included (online supplemental figure 1). Primary endpoint publication dates ranged from 2004 to 2020. The majority of trials evaluated metastatic solid tumours (N=218, 65%) (table 1). OS was the primary or coprimary endpoint of 168 trials (50%). Most trials were open-label (N=206, 62%). Of the 128 double-blind studies, unblinding occurred at the time of progression in 20 trials or at the time of an interim analysis/database lock in 27 trials. Unblinding procedures were not specified in the available trial documentation for 66 trials. In 46 trials, cross-over was allowed, but not required, and in 3 trials, cross-over was required, resulting in a total of 49 cross-over trials (15%). Of these 49 trials, OS was the primary endpoint in 7 trials (with a placebo control arm in 5 of the 7 trials) (online supplemental table 1). In 41 trials, cross-over was prohibited. Rates of cross-over from the control arm to the experimental therapy were reported in 42 trials; in these trials, a median of 55% of eligible patients in the control arm crossed over to the experimental therapy after progression (IQR 37%–76%).
Table 1
Characteristics of trials accounting for postprogression therapy (PPT) bias on overall survival estimates
The overall interpretation of PFS and OS was discordant in 107 trials (32%) (eg, superior PFS with no difference in OS). Of the trial-level characteristics studied, cross-over rate was most strongly associated with discordance between PFS and OS findings. Of 42 trials reporting cross-over rates, 16 trials had discordant PFS and OS findings. A higher rate of cross-over to the experimental therapy was associated with increased odds of discordance between PFS and OS (median cross-over rate for discordant trials 74% vs 47% for concordant trials; per each additional percentage point of cross-over, OR 1.03, 95% CI 1.01 to 1.07, p=0.03).
Overall, PPTs were reported in any format by 157 trials (47%). However, only 12% of trials (41 of 334 trials) reported an OS analysis adjusted for PPTs. The most typical approach for addressing PPT confounding was simple censoring at the start of PPT (n=19). Nine trials used multiple approaches to account for PPT impact on OS (table 2). At least one PPT analysis was preplanned in 56% of trials (N=23), with the remaining trials using only post hoc analyses (N=18 trials, 44%). There was no discernable increase over time in trials attempting to account for PPT (m=0.02, 95% CI −1.29 to 1.33, p=0.97) (online supplemental figure 2). Trials with cross-over were more likely to account for PPT confounding (33% vs 9%, OR 5.04, 95% CI 2.42 to 10.38, p<0.0001). Of trials with PPT analyses, there was no significant association between cross-over trials and prespecification of PPT analyses (50% vs 60%, OR 0.67, 95% CI 0.18 to 2.37, p=0.53). Furthermore, trials with a discordance between PFS and OS were more likely to account for PPT confounding (19% vs 9%, OR 2.26, 95% CI 1.16 to 4.38, p=0.02). PPT analyses conducted by trials with discordant findings were not more likely to be post hoc compared with trials with concordant findings (40% vs 48%, OR 0.73, 95% CI 0.21 to 2.53, p=0.62). There was no significant association between the use of OS as a primary endpoint, use of double-blinding or trial sponsor and PPT adjustment (table 1).
Table 2
Description of methods used to account for postprogression therapy (PPT) effects on overall survival
PPTs may be more readily available and/or more effective after the first instance of progression compared with subsequent progressions. Thus, a sensitivity analysis was performed on 187 trials evaluating localised, first-line metastatic and first-line haematological settings (excluding second-line or later metastatic or haematological trials). The characteristics of these 187 trials were similar to the full analysis (online supplemental table 2). Only 8% of trials (N=15) attempted to account for PPT, which was also consistent with the primary analysis. Trials with cross-over were similarly associated with increased odds of PPT analysis (21% vs 5%, OR 5.41, 95% CI 1.81 to 16.55, p=0.002), although the association of trials with PFS-OS discordance was reduced (12% vs 6%, OR 2.06, 95% CI 0.66 to 6.07, p=0.19).
A second sensitivity analysis was performed evaluating 168 trials powered for OS differences, with OS as the primary or coprimary endpoint (ie, excluding trials where OS was a secondary endpoint). Trial characteristics were largely similar to the overall analysis, except for a decreased representation of cross-over studies (online supplemental table 3). In this subset of trials, only 11% of trials evaluated PPT confounding (N=18), reproducing the overall findings. However, the association of trial covariates and PPT confounding analyses was decreased, including for cross-over trials (21% vs 10%, OR 2.53, 95% CI 0.53 to 9.23, p=0.19) and trials with PFS-OS discordance (20% vs 7%, OR 1.13, 95% CI 0.37 to 3.10, p=0.82).
Discussion
In this large-scale analysis of contemporary two-arm superiority-design phase III oncology trials with matured OS endpoints, 88% of trials did not attempt to account for confounding introduced by PPTs. These findings were consistent across sensitivity analyses for trials evaluating definitive/first-line therapy and trials with a primary (including coprimary) endpoint of OS. Taken together, this study suggests that the effects of PPTs on estimates of the treatment effects on OS may be under-evaluated in phase III oncology studies.
The implications of this study may impact the interpretations of OS from current phase III oncology trials. In clinical scenarios with salvage PPTs, rather than considering OS as the time from randomisation to death, OS may be better conceptualised as a composite measure summing multiple landmark times: the time from randomisation to progression plus the subsequent time(s) on PPT(s) to death (ie, postprogression survival). Importantly, postprogression survival is affected by both the efficacy of PPTs and potential confounding bias, similar to the bias in observational studies (figure 1).11 Time on PPTs is thus highly relevant to OS in most oncology trials. Time on PPTs may be especially pertinent for trials evaluating locoregional malignancy or trials evaluating first-line therapy, where effective salvage therapies may result in longer postprogression survival times compared with rapidly fatal cancers.23–26 Not accounting for PPT confounding may result in underestimation or overestimation of OS differences between arms of a randomised controlled trial with conventional comparisons of the upfront randomisation.14 27 If patients in the control arm cross over to the more effective experimental therapy following progression, the impact of the experimental therapy on OS may be underestimated due to overestimation of the OS of the control arm.22 For example, in the PROFILE 1014 trial, the observed OS benefit was weak (HR 0.76, 95% CI 0.5 to 1.05, p=0.10), with 84% of patients in the control arm crossing over to crizotinib (the experimental therapy) after progression.28 However, a prespecified rank-preserving structural failure time model to account for cross-over effects revealed a stronger OS signal in favour of the experimental arm (HR 0.35, 95% bootstrap CI 0.08 to 0.72).
Conversely, if patients on the control arm systematically receive less effective PPTs, the control arm may have underestimated OS, resulting in an overestimation of the OS benefits of the experimental arm.29 For example, the ADAURA trial found that adjuvant osimertinib was associated with better OS compared with placebo after resection of non-small cell lung cancer.30 However, only 38.5% of eligible patients in the placebo arm received osimertinib at the time of progression, despite the established efficacy of osimertinib for first-line metastatic disease, which may have biased OS comparisons in favour of the experimental arm.31 32 Other examples have been reviewed in detail elsewhere, such as the ANNOUNCE trial.11 33 Due to the biases introduced by uncontrolled PPTs, some contend that cross-over should be protocol driven—either required (when efficacy has been demonstrated in later lines) or prohibited (when efficacy has not been demonstrated in later lines)—rather than simply ‘allowed’ and thus subject to confounding.31 In this context, it is interesting to note that more trials in our study allowed cross-over rather than required or prohibited cross-over, suggesting there is considerable opportunity for improvement in this aspect of late-phase trial design. A recent study corroborates this notion, finding that PPTs in most phase III trials were substandard for the relevant clinical scenario.34 Finally, there are important ethical considerations for ensuring trial participants receive effective standard-of-care PPTs, particularly in low-income and middle-income countries with prominent barriers to access to care.35–38
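The two directions of bias described above can be illustrated with a small simulation that treats OS as time to progression plus postprogression survival. This is a sketch under strong simplifying assumptions: exponential event times, no censoring, and hypothetical mean durations and PPT uptake rates that do not correspond to any trial discussed here.

```python
import random


def simulate_mean_os(n, mean_pfs, p_effective_ppt, rng,
                     mean_pps_standard=6.0, mean_pps_effective=12.0):
    """Mean OS in months, with OS = time to progression + postprogression survival."""
    total = 0.0
    for _ in range(n):
        pfs = rng.expovariate(1.0 / mean_pfs)
        # Each patient receives the more effective PPT with probability p_effective_ppt.
        got_effective = rng.random() < p_effective_ppt
        mean_pps = mean_pps_effective if got_effective else mean_pps_standard
        total += pfs + rng.expovariate(1.0 / mean_pps)
    return total / n


rng = random.Random(0)  # fixed seed so the sketch is reproducible
N = 20000

# Scenario A: effective experimental therapy (mean PFS 12 vs 6 months).
experimental = simulate_mean_os(N, 12.0, 0.0, rng)
diff_no_crossover = experimental - simulate_mean_os(N, 6.0, 0.0, rng)
diff_with_crossover = experimental - simulate_mean_os(N, 6.0, 0.8, rng)

# Scenario B: inert experimental therapy (mean PFS 6 months in both arms),
# but the experimental arm receives the effective PPT more often (80% vs 20%).
diff_inert = simulate_mean_os(N, 6.0, 0.8, rng) - simulate_mean_os(N, 6.0, 0.2, rng)

print(f"A, no cross-over:  {diff_no_crossover:.1f} months")
print(f"A, 80% cross-over: {diff_with_crossover:.1f} months")
print(f"B, inert drug:     {diff_inert:.1f} months")
```

Under these assumptions, cross-over shrinks the observed OS difference for an effective drug, while imbalanced PPT access produces an apparent OS difference for an inert one.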
While the overall rate of trials attempting to account for PPT confounding was low in this study, several factors were associated with an increased likelihood of PPT analysis. In particular, trials with cross-over were more likely to perform PPT analyses. Because cross-over is a component of trial design, we expected that trials with cross-over would be more likely to perform preplanned, rather than post hoc, PPT analyses. We did not find this correlation among the subset of cross-over trials, although this result may be partly attributable to the small number of cross-over trials in general, which increases the risk of type II error. Trials with a discrepancy between the surrogate survival findings and the OS findings were also more likely to attempt PPT confounding adjustments. This may be partly related to the increased attention given to PPTs in cross-over trials, as we also found that cross-over trials were more likely to be associated with discordant surrogate survival and OS results (which itself is unsurprising in the setting of superior experimental therapy). We also speculated that the correlation between discordant trials and PPT analyses may be related to trialists exploring explanations for their findings after observing discordance; however, trials with discrepant findings did not seem to be enriched for post hoc PPT analyses, although the interpretation of this regression is also limited by type II error risk. Ultimately, however, PPT adjustments should be considered for phase III trials evaluating OS where PPTs are expected at the time of trial design.
While simple censoring was the most common strategy to account for PPT effects, this approach may be particularly prone to selection bias and is generally not recommended.16 18 22 The key advantages and disadvantages of several selected statistical and trial design approaches are described in table 3. No approach has a universal advantage over the others, and the choice of adjustment should be considered based on each unique clinical trial scenario at the time of trial design. Because of the advantages and disadvantages of each statistical approach, sensitivity analyses incorporating multiple methods, with appropriate multiplicity of testing control, may be reasonable. For example, KEYNOTE-024 adjusted for cross-over effects with three models: the simplified two-stage method, rank-preserving structural failure time, and inverse probability of censoring weighting, each of which showed a larger effect size than the unadjusted analysis.39 BREAK-3 investigators showed that a rank-preserving structural failure time model and an iterative parameter estimation model strengthened the estimated OS treatment effect in a trial where more than half of patients in the control arm crossed over to the experimental therapy at progression.40 Bayesian approaches have also been suggested and may yield more efficient inferences compared with frequentist approaches.41 42 Ultimately, however, statistical approaches may not be able to adequately facilitate interpretation of trials with flawed designs and substantial PPT confounding.11 Sequential multiple assignment randomised trials (SMART) randomise patients at the original treatment assignment and again at the time of progression to determine PPT allocation.43–45 This second randomisation to determine PPT allocation at progression removes systematic confounding from PPT selection, thus providing statistical licence to compare dynamic treatment regime pairs by virtue of randomisation.
Particularly in an era of increasing PPT effectiveness with targeted therapies and immunotherapies, trialists should consider SMART designs to control for systematic PPT confounding.
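For orientation, the rank-preserving structural failure time model mentioned above is commonly written in the following form. The notation is an assumption introduced here for illustration and is not taken from the trials cited: for patient i, the observed survival time is split into time off and time on the experimental therapy, and a single parameter ψ rescales the time on therapy to give the counterfactual survival time that would have been observed without it.

```latex
% U_i: counterfactual survival time without the experimental therapy
% T_i^{off}, T_i^{on}: observed time off and on the experimental therapy
% \psi < 0 corresponds to a therapy that extends survival
U_i = T_i^{\mathrm{off}} + e^{\psi}\, T_i^{\mathrm{on}}
```

The value of ψ is typically chosen so that the distribution of U_i is balanced between the randomised arms, which is what randomisation implies if no one had received the experimental therapy.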
Table 3
Potential approaches towards addressing postprogression therapy (PPT) confounding of overall survival
This study has several limitations. Trials were identified from the US-based registry ClinicalTrials.gov, so the findings may not generalise to global trials or to trials conducted before the development of ClinicalTrials.gov in 2000. Data were abstracted by a single reviewer. Although all manuscripts related to each trial and all online supplemental materials were evaluated for PPT analyses, some trials may have conducted these analyses without reporting them, because of presentation bias or other reasons, which would lead this study to underestimate how often phase III trials have performed PPT analyses. Associations between trial-level factors and PPT confounding analysis were not adjusted for other covariates because of the small number of trials with PPT analyses, and must be interpreted carefully.
In summary, a large-scale analysis of phase III oncology trials suggests that OS analyses accounting for the effects of PPTs are rare. To clarify the treatment effect estimates on OS, PPT effects should be considered from trial design through analysis, interpretation, and reporting of results. Readers, journal editors, and regulatory agencies should weigh the impacts of PPT confounding in the interpretation of the treatment effect on OS.