Discussion
We developed and validated a methylation-based risk score measured in pre-diagnostic blood DNA and compared its performance with that of an established traditional lung cancer risk model in study participants with a history of regular smoking exposure. We found that a methylation-based risk score with five CpG sites matched or slightly surpassed the PLCOm2012 model in discriminating between future lung cancer cases and controls. Combining the PLCOm2012 model and methylation markers did not further improve risk discrimination.
Screening high-risk individuals with a history of smoking exposure reduces lung cancer mortality.1 However, accurately identifying high-risk individuals as screening-eligible remains a challenge. The PLCOm2012 model predicts lung cancer risk better than the USPSTF2021 criteria8 but relies on self-reported smoking history, which may be influenced by recall bias and differences in cigarette smoking behaviour.26 Biomarkers, such as cotinine and certain DNA methylation markers, may provide more objective measures of tobacco exposure. Cotinine is a marker of short-term smoking exposure,27 whereas DNA methylation markers can inform on long-term smoking exposure.28
Environmental exposures can alter epigenetic patterns and thereby stably influence gene expression across cell divisions without changing the nucleotide sequence, often resulting in persistent changes to molecular phenotypes.29 A series of published studies have reported extensive changes to DNA methylation associated with exposure to cigarette smoke, including a meta-analysis that identified differences at over 2600 CpG sites between smokers and never smokers.30 Smoking remains one of the most pronounced determinants of DNA methylation variation studied to date. Its impact is so marked that it is detected in epigenome-wide association studies (EWAS) of smoking-related outcomes, which explains why smoking-related changes predominate in EWAS of lung cancer.31 Because DNA methylation reflects biological smoking exposure, and the smoking-related signal attenuates over time, it is a conceptually attractive candidate for risk stratification both in individuals who actively smoke and in those who have quit smoking.
Bojesen et al demonstrated that AHRR (cg05575921) methylation alone performed similarly to the PLCOm2012 model in predicting lung cancer risk among participants who smoked.32 The current study confirms this finding: using 514 case-control pairs, we developed a methylation-based risk score using five CpG sites that was validated in two external cohorts of 275 cases and 177 controls. We found that our methylation risk score alone slightly outperformed the PLCOm2012 model in most relevant strata. Combining the methylation score with the PLCOm2012 model did not improve risk discrimination further. This suggests that the majority of the lung cancer risk information contained in the selected CpG sites comes from their ability to represent tobacco exposure history. Of the five CpG sites included in our methylation risk score, cg05575921 (AHRR) is the best-established biomarker of smoking exposure.32–34 A study by Jacobsen et al suggested that integrating cg05575921 (AHRR) methylation with NLST screening criteria can improve the specificity of lung cancer screening by excluding those individuals with the lowest lung cancer risk from the eligible population.35
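As an illustration only, the following minimal sketch shows how a five-site score of this kind could be computed and evaluated, assuming the score is a weighted linear combination of methylation beta values and that discrimination is summarised by the AUC. The weights, the site labels other than cg05575921, and the participant values are hypothetical placeholders, not the parameters estimated in this study.

```python
# Hypothetical per-site weights. Only cg05575921 (AHRR) is named in the text;
# the other site labels and ALL numeric values are illustrative placeholders.
WEIGHTS = {
    "cg05575921": -3.0,
    "cpg_2": -1.5,
    "cpg_3": -1.0,
    "cpg_4": 1.0,
    "cpg_5": 0.5,
}


def methylation_score(betas, weights=WEIGHTS):
    """Weighted sum of methylation beta values (each in 0..1) over the scored sites."""
    return sum(w * betas[site] for site, w in weights.items())


def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control.

    Ties count as 0.5. This is the rank-based (Mann-Whitney) form of the AUC.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Made-up beta values for two cases and two controls (not study data).
    cases = [
        {"cg05575921": 0.55, "cpg_2": 0.40, "cpg_3": 0.30, "cpg_4": 0.60, "cpg_5": 0.50},
        {"cg05575921": 0.70, "cpg_2": 0.45, "cpg_3": 0.35, "cpg_4": 0.55, "cpg_5": 0.50},
    ]
    controls = [
        {"cg05575921": 0.85, "cpg_2": 0.50, "cpg_3": 0.40, "cpg_4": 0.50, "cpg_5": 0.45},
        {"cg05575921": 0.65, "cpg_2": 0.48, "cpg_3": 0.38, "cpg_4": 0.52, "cpg_5": 0.47},
    ]
    case_scores = [methylation_score(b) for b in cases]
    control_scores = [methylation_score(b) for b in controls]
    print("AUC:", auc(case_scores, control_scores))
```

The negative placeholder weight on cg05575921 reflects only the general observation that smoking is associated with lower methylation at this site. In such a sketch, a combined model would add the questionnaire-based risk as a further term, and its AUC could be compared with that of the score alone.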
Given the wealth of additional informative smoking-associated methylation sites that have been reported, including those relevant to different ethnic groups,36 37 there is considerable potential to improve the five CpG site score defined in this study. Differential DNA methylation patterns have also been identified in studies of lung cancer cases who never smoked.38 This observation raises the further possibility of extending a DNA methylation score beyond capturing smoking-related variation. A more comprehensive analysis using a much larger number of informative CpG sites in additional prospective cohort studies is warranted to enhance the discriminatory performance of a methylation score-based model.
One of the key strengths of our study is its prospective and population-based design and, most importantly, the use of pre-diagnostic blood DNA. This design minimises the possibility that the CpG sites studied are affected by the presence of an undetected developing tumour for most cases included in our study. Second, the methylation risk score was trained and tested in independent cohorts, a crucial and unique strength of our study. We also had a sufficient sample size to identify any meaningful differences in risk discrimination between standard and methylation-based risk scores. A potential limitation of our study is that both the training and validation cohorts were included in the original EWAS that identified the CpG sites taken forward for use in the prediction model. Although this may in theory result in some optimism in the risk discriminative performance of the methylation score in our validation sample,39 such bias is likely to be minimal because only the training cohorts were used to estimate the CpG site-specific effect parameters used in the methylation score. Another limitation of our study is the homogeneous nature of the included cohorts, with predominantly white study participants. Future studies with diversity in race and ethnicity are therefore warranted to evaluate the transportability of methylation markers as lung cancer risk indicators. Importantly, because we used matched case-control studies, the AUC estimates do not reflect the performance that would have been seen in a random sample: the risk discrimination afforded by age and sex (as well as smoking status in the training sample) has already been accounted for by the matching. Although this implies that the magnitude of the AUCs would differ in a random population sample, comparing the risk discriminative performance of different models is still valid using this design.
We also note that our study design does not readily allow us to establish an absolute risk model, which is a pre-requisite for translation into a practical screening situation. Future studies should therefore be conducted using a design that facilitates the development of absolute risk models, such as case-cohort or full cohort analysis.
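To make the point about matched designs concrete, the sketch below compares two models on the same matched case-control pairs using a within-pair concordance, that is, the proportion of pairs in which the case scores higher than its matched control. This is an illustrative assumption about how such a comparison could be set up, not the analysis performed in this study; the function names and example scores are hypothetical.

```python
def paired_concordance(pairs):
    """Share of matched pairs in which the case outscores its matched control.

    `pairs` is a list of (case_score, matched_control_score) tuples; ties
    count as 0.5. Matching factors (e.g. age, sex) are equal within a pair,
    so they contribute nothing to this measure.
    """
    if not pairs:
        raise ValueError("need at least one matched pair")
    total = 0.0
    for case, control in pairs:
        if case > control:
            total += 1.0
        elif case == control:
            total += 0.5
    return total / len(pairs)


def compare_models(pairs_model_a, pairs_model_b):
    """Difference in within-pair concordance between two models scored on
    the same matched pairs (positive values favour model A)."""
    return paired_concordance(pairs_model_a) - paired_concordance(pairs_model_b)


if __name__ == "__main__":
    # Made-up scores for three matched pairs under two hypothetical models.
    methylation_pairs = [(0.8, 0.3), (0.6, 0.7), (0.9, 0.4)]
    questionnaire_pairs = [(0.7, 0.5), (0.4, 0.6), (0.5, 0.5)]
    print("difference:", compare_models(methylation_pairs, questionnaire_pairs))
```

Because both models are evaluated on identical pairs, the difference between them remains interpretable even though neither value equals the AUC that would be observed in an unmatched population sample.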
A major challenge for current screening programmes is that approximately half of incident lung cancer cases are not eligible for LDCT. It is well established that risk stratification can improve the effectiveness of lung cancer screening programmes by identifying more future cases without screening more people, but few screening programmes have implemented individualised risk assessment prior to screening. Risk biomarkers have the potential to further improve risk assessment. In reflecting on the implications of our findings, and those of previous studies, it is not yet clear that a risk score based on smoking-associated CpG sites can provide important improvements in risk discrimination over and above that afforded by traditional questionnaire-based risk models. Rather, germline DNA methylation markers may be useful as a complementary means of risk assessment in situations where an accurate smoking history is challenging to obtain. Such molecular markers may provide patients and physicians with an objective measure of individualised risk for personalised decision making to reduce the harms and improve the benefits of screening. It is also possible that objective risk biomarkers, such as a methylation risk score, may circumvent the potential stigma associated with smoking in risk assessment, thereby motivating more individuals at risk to engage in lung cancer screening programmes. It will be important to test this hypothesis in a carefully designed study that evaluates the acceptability of biomarker-based risk assessment in participants representative of the target population.