Cancer incidence and competing mortality risk following 15 presenting symptoms in primary care: a population-based cohort study using electronic healthcare records
Abstract
Objectives Assessment of age, sex and smoking-specific risk of cancer diagnosis and non-cancer mortality following primary care consultation for 15 new-onset symptoms.
Methods and analysis Data on patients aged 30–99 in 2007–2017 were extracted from a UK primary care database (CPRD Gold), comprising a randomly selected reference group and a symptomatic cohort of patients presenting with one of 15 new onset symptoms (abdominal pain, abdominal bloating, rectal bleed, change in bowel habit, dyspepsia, dysphagia, dyspnoea, haemoptysis, haematuria, fatigue, night sweats, weight loss, jaundice, breast lump and post-menopausal bleed).
Time-to-event models were used to estimate outcome-specific hazards for site-specific cancer diagnosis and non-cancer mortality and to estimate cumulative incidence up to 12 months following index consultation.
Results Data included 1 622 419 patients, of whom 36 802 had a cancer diagnosis and 28 857 died without a cancer diagnosis within 12 months of the index.
The risk of specific cancers exceeded the UK urgent referral risk threshold of 3% from a relatively young age for patients with red flag symptoms. For non-organ-specific symptoms, the risk of cancer at individual sites either did not reach the threshold at any age or reached it only in older patients.
Conclusion Patients with new-onset symptoms in primary care often have comparable risks of cancer diagnosis and non-cancer mortality. Non-organ-specific symptoms, in particular, are associated with elevated risk of cancer at multiple different sites. Management of symptomatic patients in primary care should be informed by the risk of different cancer types alongside mortality risk.
What is already known on this topic
Evidence describing the diagnostic value of symptoms for cancer can help assess which patients who present to primary care need urgent specialist assessment.
Current evidence is limited as age is often handled categorically, smoking status is not taken into account and study periods are historical.
Further, evidence is concentrated on assessing the risk of specific cancer sites, although the same symptom can be related to cancer of different organs.
What this study adds
We present evidence on age-, sex- and smoking-status-specific estimates of the risk of cancer of different organs and overall, alongside estimates of non-cancer death.
Estimates relate to patients who present with one of 15 possible cancer symptoms from a relatively recent period.
Certain symptoms such as jaundice and dysphagia are associated with a high risk of non-cancer death in older patients.
Other symptoms, such as unintended weight loss, fatigue and abdominal pain, are associated with excess risk of a range of different cancers.
How this study might affect research, practice or policy
We provide detailed evidence and results that may help frame future research studies into the risk of cancer in symptomatic patients and update and refine policy on referral and diagnostic investigation of patients in primary care.
Introduction
Most patients with cancer are diagnosed after symptomatic presentation,1 and, given the paucity of effective tests to enable population-based cancer screening, this is likely to be the case for the coming decade. Appropriately suspecting the diagnosis of cancer in symptomatic patients is difficult, as symptoms may be caused by many other diseases. Even so-termed ‘alarm’ or ‘red-flag’ symptoms typically have positive predictive values for cancer that do not exceed 5% in women of any age or in men younger than 70.2 In the UK, many patients with cancer experience diagnostic delays in the form of multiple pre-referral consultations and prolonged intervals to diagnosis, despite practice guidelines issued by the National Institute for Health and Care Excellence (NICE) that aimed to enable prompt diagnosis of cancer in primary care.3 4 Such delays are associated with adverse patient experience and worse clinical outcomes.5–8
Currently, the majority of research publications supporting practice guidelines come from case-control studies, examining symptom-related risk of specific cancer sites. This study design ignores that presenting symptoms are often shared between different cancers and diseases other than cancer; there has been no comprehensive examination of the risk of the full spectrum of possible cancer types for the most relevant presenting symptoms. Further, guideline recommendations handle major cancer risk factors sub-optimally, as smoking status is typically ignored as a risk stratifier, and age is typically not considered as a continuous variable, leading to information loss. Competing risk of death is also ignored, meaning that management decisions centred on cancer risk ignore risks related to other diseases.
This study is motivated by the need for evidence to support the updating of clinical practice guidelines for the primary care management of patients who present with symptoms of possible underlying cancer. Such evidence is needed both in terms of quantifying the absolute risk of different cancer types and also the probability of patients dying without a cancer diagnosis. We also aim to aid the development of and complement the use of risk prediction tools by describing in detail the associations between symptoms and cancer risk. We therefore assess age-, sex- and smoking-specific risk of cancer diagnosis and non-cancer mortality following primary care consultation for one of 15 new-onset symptoms.
Methods
Study population
We used a cohort study design, based on medical records from English National Health Service general practices that contributed anonymised primary-care electronic health records to the Clinical Practice Research Datalink Gold (CPRD). CPRD covers approximately 6.9% of the UK population,9 and patients in CPRD are broadly representative of the UK general population with respect to age, sex and ethnicity.9 CPRD was linked to cancer diagnosis information from the English national cancer registry.10 We considered all cancers excluding non-melanoma skin cancer, as non-melanoma skin cancer is imperfectly registered and primarily managed in primary care.
A study flowchart is given in online supplemental appendix 1 figure 1. We first extracted a random sample of patients from CPRD for use as a reference group, choosing index dates randomly from ‘valid’ follow-ups from 1 January 2007 to 31 December 2017. Patients in this reference group were not necessarily symptom-free (online supplemental appendix 1 table 1). Coded symptom data are known to be incomplete,11 12 so it was not possible to create a truly symptom-free control group with the data available for this study. Thus, we chose to use a reference group that would represent the average risk for patients registered in primary care. We then created a symptomatic cohort of all patients in CPRD Gold who had consulted for any of 15 presenting symptoms and who were not in the reference group, choosing the index date as the date of their first ‘valid’ consultation for a symptom during 1 January 2007 to 31 December 2017.
For an individual patient, follow-up was judged to be ‘valid’ if they had been registered at their practice for at least 1 year; their practice was judged by CPRD to be providing data of a suitable standard for use in research (ie, after the practice’s ‘up-to-standard’ date); it was before the last data transfer to CPRD (ie, the ‘last collection’ date); the patient was registered at a CPRD practice (ie, before the patient’s ‘transfer out’ date and before their death); the patient was aged 30–99; and the patient had not yet had a recorded cancer diagnosis in the cancer registry (excluding non-melanoma skin cancer).
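The ‘valid’ follow-up criteria above amount to intersecting several date ranges. The following is a minimal sketch of that logic, not the study's extraction code: all field names are hypothetical (they are not CPRD variable names), and treating every end date as exclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Tuple

STUDY_START = date(2007, 1, 1)
STUDY_END = date(2017, 12, 31)


@dataclass
class PatientRecord:
    # Hypothetical field names chosen for illustration only.
    registration_date: date
    practice_up_to_standard: date
    practice_last_collection: date
    date_turns_30: date
    date_turns_100: date
    transfer_out: Optional[date] = None
    death: Optional[date] = None
    first_cancer: Optional[date] = None  # excluding non-melanoma skin cancer


def add_one_year(d: date) -> date:
    """Return the same calendar day one year later (29 Feb maps to 28 Feb)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def valid_window(p: PatientRecord) -> Optional[Tuple[date, date]]:
    """Half-open window [start, end) of 'valid' follow-up, or None if empty."""
    start = max(
        STUDY_START,
        add_one_year(p.registration_date),  # registered for at least 1 year
        p.practice_up_to_standard,
        p.date_turns_30,
    )
    # Assumption: follow-up must fall strictly before each of these dates.
    ends = [STUDY_END + timedelta(days=1), p.practice_last_collection, p.date_turns_100]
    ends += [d for d in (p.transfer_out, p.death, p.first_cancer) if d is not None]
    end = min(ends)
    return (start, end) if start < end else None
```

An index date would then be eligible only if it falls inside the returned window.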
Outcomes
Both mortality and cancer diagnoses were considered. Mortality was identified from the primary care record; such information is highly concordant with the ‘gold standard’ official death registration records and is correct within 1 month 98% of the time.13 Cancers were split into seven groups for men and eight groups for women, summarised below and with a full ICD10 codelist in online supplemental appendix 1 table 2, guided by underlying body systems and corresponding major clinical specialties receiving urgent referrals for suspected cancer in England.14 Cancer diagnoses were sourced from linkages with the national cancer registry, and only the first cancer diagnosis was considered (excluding non-melanoma skin cancer); available cancer data covered diagnoses up until 31 December 2018.
The cancer groups considered were:
Breast cancer (women only), including invasive breast and in-situ breast cancers.
Gynaecological cancer (women only), including invasive cervical, in-situ cervical, ovarian, uterine and vulvar cancers.
Lung, including lung cancer and mesothelioma.
Upper gastrointestinal (GI), including liver, oesophageal, pancreatic and stomach cancers.
Lower GI, including colon and rectal cancers.
Urological, including bladder, in-situ bladder, kidney and other urinary tract cancers.
Prostate cancer (men only).
Haematological, including Hodgkin’s lymphoma, non-Hodgkin’s lymphoma, acute myeloid leukaemia, chronic lymphocytic leukaemia, other leukaemias, myeloma and other haematological cancers
Other, including all other sites, specifically melanoma, unknown primary, thyroid and meningeal cancers, also including testicular cancer and male breast cancer.
The first outcome (of cancer diagnosis or non-cancer death) experienced by each patient was considered in the analysis. This means, for example, that in the analyses of cumulative incidence, a patient who died shortly following a cancer diagnosis would only be considered to have had a cancer diagnosis, and their death would not contribute to the estimation of mortality risk, irrespective of the cause of death. Patients with a cancer diagnosis on the same day as their death (including death certificate-only registrations of cancer) were treated as having had a cancer diagnosis rather than having died, noting that death certificate-only registrations remained at <0.4% through the study period.15
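The first-outcome rule described above, including the same-day tie-break in favour of cancer diagnosis, can be sketched as follows. This is an illustrative helper under stated assumptions, not the study's code; the function name and outcome labels are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def first_outcome(
    cancer_date: Optional[date],
    death_date: Optional[date],
    censor_date: date,
) -> Tuple[str, date]:
    """Return (outcome, date) for the first event on or before censor_date.

    A cancer diagnosis on the same day as death counts as a cancer diagnosis.
    """
    candidates = []
    if cancer_date is not None and cancer_date <= censor_date:
        candidates.append((cancer_date, 0, "cancer"))  # priority 0 wins ties
    if death_date is not None and death_date <= censor_date:
        candidates.append((death_date, 1, "non_cancer_death"))
    if not candidates:
        return ("censored", censor_date)
    event_date, _, outcome = min(candidates)
    return (outcome, event_date)
```

A patient who dies shortly after a cancer diagnosis is therefore recorded only as a cancer diagnosis, and the death does not count towards non-cancer mortality.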
Symptoms
Patients were selected due to a primary care presentation with one (or more) of 15 cancer-relevant symptoms or due to being in the reference group. The index date of symptomatic patients was the date of their first recorded symptom during ‘valid’ follow-up (defined in Methods—study population).
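The symptoms we considered were a subset of those known to be associated with the risk of specific types of cancer and are already included in referral guidelines for symptomatic cancer.3 16 The included symptoms form part of the presentation in 40% of all patients with cancer in England.1 We identified symptoms from coded primary care data using existing Read v2 phenotyping algorithms.16 The symptoms we considered were abdominal pain, abdominal bloating, rectal bleeding, change in bowel habit, dyspepsia, dysphagia, dyspnoea, haemoptysis, haematuria, fatigue, night sweats, weight loss, jaundice, breast lump and post-menopausal bleeding.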
Only the first presenting symptom for each patient was included, and each patient was included at most once in the analysis. For example, if a patient had a consultation for a breast lump in 2007 that did not result in a cancer diagnosis and a consultation for abdominal pain in 2010 that did result in a cancer diagnosis, only the risk after the 2007 consultation for a breast lump would be included in analysis. Symptoms were included in the model using one-hot encoding, with patients in the reference group having all symptom variables set to 0. Where patients in a symptomatic cohort presented with two or more symptoms on their index date, all were included as index symptoms (such occurrences were rare, see end of the Results section). Symptoms that were not consulted for on the same day as the index were not considered.
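As an illustration of the one-hot encoding described above, the sketch below maps a patient's index-date symptoms to 15 binary indicators, with reference-group patients coded as all zeros. The symptom labels and function name are hypothetical and are not taken from the study's codelists.

```python
from typing import Iterable, List

# Hypothetical labels for the 15 studied symptoms.
SYMPTOMS = [
    "abdominal_pain", "abdominal_bloating", "rectal_bleeding",
    "change_in_bowel_habit", "dyspepsia", "dysphagia", "dyspnoea",
    "haemoptysis", "haematuria", "fatigue", "night_sweats",
    "weight_loss", "jaundice", "breast_lump", "post_menopausal_bleeding",
]


def encode_symptoms(index_symptoms: Iterable[str]) -> List[int]:
    """Return one 0/1 indicator per studied symptom, in SYMPTOMS order."""
    present = set(index_symptoms)
    unknown = present - set(SYMPTOMS)
    if unknown:
        raise ValueError(f"Unrecognised symptom(s): {sorted(unknown)}")
    return [1 if s in present else 0 for s in SYMPTOMS]


reference_patient = encode_symptoms([])  # all zeros
two_symptoms = encode_symptoms(["fatigue", "weight_loss"])  # two indicators set
```

Because same-day symptoms are all flagged, a patient with two index symptoms has two indicators set to 1 rather than being assigned to a single category.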
Smoking status, sex and age
Patients were categorised as ever-smokers or never-smokers. Ever-smokers included all patients with a record of being current or ex-smokers in their entire primary care record, including periods after a cancer diagnosis or before their record became eligible for use in this study; never-smokers included all other patients. Patients were grouped as male or female based on the recorded gender in their primary care record. Patients’ age was estimated as the number of years between the mid-point of their year of birth and their index date.
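The age calculation above can be sketched in a few lines. Taking 1 July as the mid-point of the birth year and dividing by 365.25 days per year are assumptions made for illustration; the study's exact conventions are not stated here.

```python
from datetime import date


def age_at_index(year_of_birth: int, index_date: date) -> float:
    """Approximate age in years from the mid-point of the birth year."""
    midpoint = date(year_of_birth, 7, 1)  # assumed mid-year date
    return (index_date - midpoint).days / 365.25


example_age = age_at_index(1950, date(2010, 7, 1))
```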
Statistical methods
The initial analysis described the distribution of patients in the sample and counts of cancer diagnoses and deaths within 12 months of any index symptom.
Hazards for specific cancers and non-cancer mortality were estimated using flexible parametric (Royston–Parmar) time-to-event models,17 using three degrees of freedom to model the baseline hazard. Follow-up for these analyses was censored at 18 months after the index symptom, at the first event (ie, cancer diagnosis or death) or the end of the available cancer registry follow-up on 31 December 2018 if earlier. Models were stratified by sex and included the following covariates:
Age (restricted cubic spline with six knots).
Smoking status (binary, ever record of smoking in primary care data vs never).
Index symptom (15 binary variables indicating the symptom(s) each patient had on their index date (all zero for patients in the reference group)).
An interaction with (the log of) follow-up time in months for each index symptom, allowing the association between symptom and cause-specific risk to decay over time. This was motivated by the fact that following many possible symptoms of cancer, excess risk is highest in the first months following presentation (eg,18).
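The censoring rule stated before the covariate list (18 months after the index, first event, or end of registry follow-up on 31 December 2018, whichever is earliest) can be sketched as below. How a calendar month is added to a date is an assumption; the helper names are hypothetical.

```python
import calendar
from datetime import date
from typing import Optional

REGISTRY_END = date(2018, 12, 31)
MAX_FOLLOW_UP_MONTHS = 18


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def follow_up_end(index_date: date, first_event_date: Optional[date] = None) -> date:
    """Earliest of: index + 18 months, first event, registry end date."""
    candidates = [add_months(index_date, MAX_FOLLOW_UP_MONTHS), REGISTRY_END]
    if first_event_date is not None:
        candidates.append(first_event_date)
    return min(candidates)
```

Patients indexed late in 2017 therefore contribute fewer than 18 months of follow-up, because registry data end first.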
The cumulative incidence of each cancer group and of non-cancer mortality was estimated by combining the cause-specific models into a multistate model using the latent failure time approach.18 We report cumulative incidence for each symptom by age, sex and smoking status up to 12 months of follow-up, with results focusing on estimated cumulative incidence at 12 months and age considered in 5-year intervals. To sense-check these model-based estimates, we additionally examined the crude cumulative incidence for each cancer group and non-cancer mortality within 12 months of each symptom by sex and smoking status using Aalen–Johansen non-parametric cumulative incidence curves.19 20
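To illustrate the latent failure time idea, the sketch below draws one latent event time per cause and records whichever occurs first within the horizon. It is a deliberate simplification: it assumes constant cause-specific hazards (exponential times) rather than the flexible parametric models used in the study, and the hazard values are hypothetical. Under that assumption the simulated cumulative incidence can be checked against a closed form.

```python
import math
import random
from typing import Dict


def simulate_cumulative_incidence(
    hazards: Dict[str, float], horizon: float, n_sim: int = 100_000, seed: int = 1
) -> Dict[str, float]:
    """Simulate cause-specific cumulative incidence by the horizon."""
    rng = random.Random(seed)
    counts = {cause: 0 for cause in hazards}
    for _ in range(n_sim):
        # One latent failure time per cause; the earliest one is observed.
        latent = {cause: rng.expovariate(rate) for cause, rate in hazards.items()}
        first = min(latent, key=latent.get)
        if latent[first] <= horizon:
            counts[first] += 1
    return {cause: counts[cause] / n_sim for cause in hazards}


def analytic_cumulative_incidence(
    hazards: Dict[str, float], horizon: float
) -> Dict[str, float]:
    """Closed form for constant hazards: (h_k / h_total) * (1 - exp(-h_total * t))."""
    total = sum(hazards.values())
    any_event = 1.0 - math.exp(-total * horizon)
    return {cause: rate / total * any_event for cause, rate in hazards.items()}


# Hypothetical monthly hazards, for illustration only.
example_hazards = {"cancer": 0.003, "non_cancer_death": 0.002}
simulated = simulate_cumulative_incidence(example_hazards, horizon=12.0)
expected = analytic_cumulative_incidence(example_hazards, horizon=12.0)
```

The key point is that each outcome's cumulative incidence depends on all cause-specific hazards, because an earlier competing event removes the patient from risk of the others.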
Concordant with the methods and evidence that informed the development of NICE guidelines, we have considered the modelled cumulative incidence at 12 months to represent the positive predictive value for the outcome for the symptom.3 Further, we calculated the (sex-/smoking-/symptom-specific) age at which the cancer risk exceeded the 3% risk threshold for referrals used in the UK. We additionally present similar estimates for each individual cancer group.
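Finding the age at which modelled risk reaches the referral threshold can be sketched as a scan over the 5-year age grid. The risk values below are hypothetical, and using "greater than or equal to" as the crossing rule is an assumption.

```python
from typing import Dict, Optional

REFERRAL_THRESHOLD = 0.03  # 3% risk threshold for urgent referral


def first_age_reaching_threshold(
    risk_by_age: Dict[int, float], threshold: float = REFERRAL_THRESHOLD
) -> Optional[int]:
    """Return the youngest grid age with risk >= threshold, or None."""
    for age in sorted(risk_by_age):
        if risk_by_age[age] >= threshold:
            return age
    return None


# Hypothetical 12-month cumulative incidence by age, for illustration only.
example_risks = {50: 0.012, 55: 0.021, 60: 0.031, 65: 0.044}
example_age = first_age_reaching_threshold(example_risks)
```

A return value of None corresponds to a symptom whose risk does not reach the threshold at any age on the grid.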
Statistical modelling used Stata 17 MP. Simulation of failure times was performed on a high-performance cluster using Stata 16 MP. Survival models were fit using the merlin package,21 and multistate modelling was facilitated by the multistate package.22 In principle, the cancer risk for any combination of symptoms can be estimated from the cause-specific models, but these have not been produced due to computational limitations and the very large number of potential combinations. Data extraction and analysis code are available at https://github.com/MattEBarclay/cprd_symptom_cancer_1.
Patient and public involvement
The study forms part of a programme of work examining the predictive value of symptoms for cancer diagnosis using electronic health records data. To support this programme, we ran three focus groups in August and September 2023 including a total of 15 patient and public involvement volunteers. Study reporting was informed by input from these volunteers, but no specific changes were made.
Results
The analysis cohort included 1 622 419 patients, 835 995 with an eligible first symptom recorded between 2007 and 2017 (table 1; see online supplemental appendix 2 tables 1–16 for the demographics of the reference group and each symptomatic sub-cohort). More than half of the cohort (64%, 1 040 862) were aged under 60 at the index (69% of the reference group vs 60% of those with symptoms, online supplemental appendix 2 table 1), with 24 731 (1.5%) patients aged 90 or older. The distribution of symptoms was uneven, with 14.6% of the cohort having abdominal pain as the index symptom, followed by fatigue (8.9%), dyspnoea (8.8%), dyspepsia (6.8%), rectal bleeding (3.0%), breast lump (2.4%), haematuria (1.6%), abdominal bloating (1.4%), weight loss (1.2%), change in bowel habit (1.1%), dysphagia (0.9%), post-menopausal bleeding (0.5%), night sweats (0.5%), haemoptysis (0.4%) and jaundice (0.1%). The majority of patients (64%) had at least one smoking-related Read code in their records and were identified as ever-smokers; recorded smoking was slightly less common in the reference group (60%, online supplemental appendix 2 table 1). Within 12 months of their first recorded symptom, 36 802 patients had a cancer diagnosis and 28 867 patients died without a cancer diagnosis (a further 9288 died following a cancer diagnosis); both cancer and mortality risk were higher in older patients. Ever-smokers had a slightly higher cancer risk than patients without any smoking-related codes (table 1).
Table 1
Cohort summary
Age-adjusted cancer-specific HRs for smoking and each index symptom
HRs for each cancer site and for non-cancer death at 1 month after index, for men (left) and women (right). Ever-smokers are compared with never-smokers; each symptom is compared with the reference group. Models are stratified by sex and adjusted for age, smoking status and the presence of symptoms at the index date.
Patients consulting for symptoms of possible cancer had similar or greater cause-specific hazards for almost every cancer site than the reference population (figure 1 and online supplemental appendix 3). Yet for 10 of the 15 studied symptoms, the symptom was associated with lower cause-specific hazards for death than the reference group (the exceptions being dysphagia, jaundice, dyspnoea, haemoptysis and weight loss).
Further, for many symptoms associated with a very high initial hazard of a specific cancer, while the hazard typically remained elevated at least 12 months after the index consultation, it tended to reduce over time. For example, online supplemental appendix 3 table 1 shows HRs for lung cancer. The HR for haemoptysis is 17.1, but there is a statistically significant interaction with (the natural log of) follow-up time in months with an HR of 0.7; by 12 months the HR for lung cancer is estimated to have decreased to around 7.3. This fits with the non-parametric results shown in the Aalen–Johansen plots, where after haemoptysis presentation diagnoses of lung cancer rapidly accrue until about 3 months follow-up, after which they continue growing but less rapidly (online supplemental appendix 2 figures 1–4).
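The decay in the haemoptysis HR can be reproduced approximately from the two reported figures. The sketch below assumes the time-dependent HR is the 1-month HR multiplied by the interaction HR raised to the natural log of follow-up time in months; this is an interpretation of the model description, and the function name is hypothetical.

```python
import math


def hazard_ratio_at(months: float, hr_at_one_month: float, hr_per_log_month: float) -> float:
    """HR(t) = HR(1 month) * interaction_HR ** ln(t), with t in months."""
    return hr_at_one_month * hr_per_log_month ** math.log(months)


# Rounded values reported in the text for haemoptysis and lung cancer.
hr_1_month = hazard_ratio_at(1, 17.1, 0.7)    # ln(1) = 0, so this is 17.1
hr_12_months = hazard_ratio_at(12, 17.1, 0.7)  # roughly 7.0 with rounded inputs
```

With the rounded interaction HR of 0.7 this gives about 7.0 at 12 months; the small gap from the reported 7.3 is plausibly a rounding effect, since an interaction HR near 0.71 gives approximately 7.3 under the same formula.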
For both men and women, presentations with abdominal symptoms were associated with increased hazard of multiple types of cancer, particularly lower GI cancer (online supplemental appendix 3 tables 3 and 13) and upper GI cancer (online supplemental appendix 3 tables 2 and 12). At the same time, abdominal symptoms were associated with decreased hazard of death without a cancer diagnosis when compared with the reference group, except for dysphagia and jaundice (figure 1 and online supplemental appendix 3 tables 8 and 17). Cause-specific HRs at 1 month after the presentation were highest regarding lower GI cancer for rectal bleeding (eg, for men: HR 17.4, 95% CI 15.7 to 19.4) and change in bowel habit (eg, for men: HR 21.5, 95% CI 19.0 to 24.3) and highest regarding upper GI cancer for jaundice (eg, for women: HR 122, 95% CI 102 to 147) and dysphagia (eg, for women: HR 16.4, 95% CI 14.0 to 19.2); HRs decreased substantially over follow-up for these symptoms. Abdominal pain and abdominal bloating were associated with HRs at the consultation of around four for both upper and lower GI cancers (eg, abdominal bloating in women with HR for lower GI cancer of 3.0, 95% CI 2.3 to 4.0), with abdominal bloating having a similar association for gynaecological cancers in women (HR 4.8, 95% CI 4.0 to 5.6, online supplemental appendix 3 table 10), while dyspepsia was associated with an HR of around four for upper GI cancer. Patients with abdominal symptoms also appeared at elevated risk for urological and haematological cancers and for prostate and gynaecological cancers.
Respiratory symptoms (dyspnoea, haemoptysis)
Respiratory symptoms were primarily associated with lung cancer, but the strength of the association varied (figure 1 and online supplemental appendix 3 tables 1 and 11). Patients with haemoptysis had a cause-specific HR of around 16 at consultation compared with the reference group (eg, for men, HR 17.1, 95% CI 14.8 to 19.8), while the association with dyspnoea was weaker but still notable (eg, for men, HR 2.6, 95% CI 2.4 to 2.9). Other types of cancer, notably haematological cancers, also had elevated cause-specific hazards after presentation with haemoptysis (eg, for men, the HR for haematological cancer being 2.8, 95% CI 1.7 to 4.6, online supplemental appendix 3 table 6) or dyspnoea (HR 1.7, 95% CI 1.5 to 1.8).
Non-specific symptoms (fatigue, night sweats, weight loss)
Non-specific symptoms were typically associated with elevated cause-specific HRs for all cancer groups considered (figure 1 and online supplemental appendix 3), and HRs generally appeared relatively similar in strength for each of the three non-specific symptoms. Weight loss had the strongest associations overall (cancer-specific HRs generally between 2 and 5), followed by night sweats (HRs generally between 1 and 4, though imprecisely estimated), followed by fatigue (HRs between 1 and 2). It often appeared that the strongest cause-specific associations were for haematological cancers, though CIs tended to overlap with those of other cancer groups.
Breast and reproductive organ symptoms (breast lump, post-menopausal bleeding)
Post-menopausal bleeding was associated with large cause-specific HRs for gynaecological cancer (HR 43, 95% CI 39 to 47) and substantial cause-specific HRs for urological cancer (HR 4.1, 95% CI 2.6 to 6.4) (figure 1 and online supplemental appendix 3 tables 10 and 14). Breast lump in women was associated principally with breast cancer (HR 65, 95% CI 61 to 69) and to a lesser extent with haematological cancer (HR 2.6, 95% CI 1.8 to 3.6) (online supplemental appendix 3 tables 9 and 15). A small number of men presented with breast lump, and these men had cause-specific HRs for the ‘other cancer’ group, which included male breast cancer, of 7.1 (95% CI 5.0 to 10.0) (online supplemental appendix 3 table 7).
Risk of specific cancer sites by age, sex and smoking status
For patients with a single index symptom, and based on simulations combining the cause-specific models, we present simulated cumulative incidence of each cancer site and of death without cancer at 3 months (online supplemental appendix 4 figures 1–4), 6 months (online supplemental appendix 4 figures 5–8) and 12 months (figures 2–5, online supplemental appendix 5) after symptom presentation. Hereafter in this section, we discuss cumulative incidence at 12 months after symptom consultation. Unlike the HRs presented above, estimates of cumulative incidence varied substantially by sex, as women have lower baseline cancer risk.
Modelled cancer and mortality risk at 12 months by index symptom, female smokers.
3% any cancer risk thresholds at 12 months
Patients reaching a 3% risk of any cancer may not reach such a risk level for any specific cancer group, especially for symptoms associated with multiple types of cancer. For example, female smokers presenting with weight loss had a 3% risk of cancer from age 60, but did not reach the 3% risk threshold at any age when any of the individual cancer groups were considered on their own (table 2). For male non-smokers, the risk of any cancer reached the 3% threshold from the following ages and onwards: 45 for jaundice; 55 for dysphagia, weight loss, haematuria and change in bowel habit; 60 for haemoptysis and rectal bleeding; 65 for abdominal pain and bloating, night sweats and breast lump; and 70 for dyspepsia, dyspnoea and fatigue. For smokers, this threshold was often reached up to 5 years younger. Conversely, compared with male patients presenting with the same symptom, female patients reached the 3% threshold at an older age on average, with the main exception being breast lump for which the 3% threshold (in women) was reached from age 40.
Table 2
Modelled age at which patients presenting with each symptom had a 3% risk (ie, high enough to trigger urgent referral for suspected cancer in England) of all cancers combined and of specific cancer sites, by smoking status and sex.
Notably, male smokers in the reference group had a 3% risk of any cancer from age 75, and male non-smokers from age 90; women in the reference group did not reach a 3% risk of cancer at any age.
A summary of risk of individual cancers is given in online supplemental appendix 6, plus additional graphical and tabular results in Appendices 4 and 5.
Risk of non-cancer mortality
For most of the studied symptoms, symptomatic patients were less likely to die (without a cancer diagnosis) than similar patients in the reference group (figures 2–5). The three principal exceptions were jaundice, dysphagia and weight loss, for which post-presentation mortality exceeded that in the reference group, and also older patients with less-specific symptoms for whom the risk of non-cancer mortality was often higher than the risk of any cancer. For example, for male smokers presenting with dyspnoea, around 6% who presented at age 80 would develop cancer within 12 months, while 9% would die (figure 3, online supplemental appendix 5 table 1).
Presentation with multiple symptoms
Among symptomatic patients, 1.2% (10 360 of 835 995) consulted for more than one of the 15 studied symptoms on their index date, and a further 2.5% (21 167) consulted for an additional studied symptom within 30 days of an index symptom but before a cancer diagnosis (table 3). The proportion of patients with multiple index symptoms subsequently diagnosed with cancer within 12 months of the index (4.6%, 95% CI 4.2% to 5.1%) was higher than for patients with a single index symptom (3.5%, 95% CI 3.5% to 3.5%). This higher risk of cancer in patients with multiple index symptoms appeared applicable to many of the symptoms considered, but sample size limitations meant proportions developing cancer could often not be estimated precisely.
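Proportions with 95% CIs such as those above can be computed from an event count and a denominator. The sketch below uses the Wilson score interval; this is an assumption, since the interval method used in the study is not stated here, and the example counts are hypothetical rather than study data.

```python
import math
from typing import Tuple


def wilson_interval(events: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (default 95%)."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("Require n > 0 and 0 <= events <= n")
    p = events / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts, for illustration only.
lower, upper = wilson_interval(events=50, n=100)
```

Interval width shrinks roughly with the square root of the denominator, which is why the interval for the much larger single-symptom group is so narrow that its bounds round to the same value as the estimate.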
Table 3
Summary of cancer outcomes for patients with multiple different recorded symptoms at index presentation and within 30 days of index symptom.
Discussion
Using a cohort design, we comprehensively estimated the risk of different cancer diagnoses and non-cancer mortality following presentation in primary care with one of 15 index symptoms and in a reference group that was not selected based on symptom status and so should approximate the risk in the general population. There was considerable variation in the risk by age and by sex. Smoking status was highly informative for cancer risk of patients with respiratory or non-organ-specific symptoms. Smokers typically reached the 3% threshold warranting referral for cancer investigations up to 5 years younger than non-smokers. The findings highlight the importance of including smoking status in clinical guidelines and referral decisions in patients with a new-onset symptom. Even symptoms with strong, well-established associations with specific cancers often have notable associations with other types of cancer. One example is dyspnoea, which is typically considered a symptom of lung cancer, but which we find is also associated with an HR for haematological cancers of around 1.7 in both men and women. We also provide estimates of cancer risk while considering the potential for non-cancer mortality. For the oldest patients, and those with symptoms such as dysphagia or jaundice, the risk of death without a cancer diagnosis reached or exceeded the risk of cancer.
Strengths and weaknesses
Key strengths of the study are (a) the large representative dataset, allowing examination of a range of both common and rare symptoms and outcomes; (b) the joint estimation of the risks of different outcomes, including non-cancer mortality and risk of different types of cancer; and (c) the use of cancer registry data to ascertain the presence of cancer, as cancer may be under- or over-recorded in non-registry sources.23 While this study represents the most comprehensive and detailed description of the risk of cancer in symptomatic patients to date, there are various areas where future work could make further improvements.
Our study covers a period from 2007 to 2017, during which there have been many secular changes such as the introduction of public health education campaigns to raise awareness of symptoms of possible cancer among members of the public, alongside changes in clinical guidelines for referral for suspected cancer and in how NHS diagnostic services are configured, but we have not examined secular trends in estimated risks. Further, the study only considers deaths in patients without cancer, but it may be important to understand if patients die quickly after a cancer diagnosis. Our measure of smoking status does not allow for a refined appreciation of smoking history and dose-response relationships. Additionally, our analytical approach only allowed each patient to be included once, not making full use of the longitudinal nature of EHR datasets.24 We did not consider interactions between symptoms, and we simulated outcomes only for patients with a single symptom, in part because few patients had multiple symptoms. We did not have access to free-text data, despite evidence that coded data does not capture all symptoms.11 12 Finally, we only examined 15 symptoms, ignoring the many other symptoms and important health conditions that may be associated with the risk of cancer.1 17 25 A more detailed examination of potential limitations is given in online supplemental appendix 7.
Comparison with literature
A large and growing literature describes the risk of cancer following symptom presentations in primary care; Moore and colleagues summarised the literature pre-2020,17 and there are several recent papers.26–29 Existing literature (a) rarely considers competing non-cancer mortality risk, (b) rarely considers smoking status and (c) frequently provides no or only limited information on the age-dependent and sex-specific nature of the risk of different cancers. Much of the previous evidence additionally considers either the risk of all cancers combined or focuses on specific cancer sites judged to be of relevance to the specific examined symptoms a priori. We improve on previous descriptive studies by presenting a broad range of possible cancer diagnoses following presentation with a wider spectrum of index symptoms. Further research is needed to extend analyses similar to those reported here to a wider collection of symptoms.
Some existing evidence on the so-called red-flag symptoms such as rectal bleeding and haemoptysis suggests the risk of cancer exceeds 3% for all ages but did not examine the risk in different age groups17; our findings indicate that the risk of cancer following these symptoms only exceeds 3% beyond certain age cut-offs. Furthermore, we show that for non-specific symptoms, the risk of any cancer exceeds 3% at a considerably earlier age than the risk of a specific cancer type, underscoring the need for studies that comprehensively examine all major cancer types. Weight loss provides a cardinal example, where the risk of any cancer exceeded 3% in male non-smokers from age 55, but the risk of any individual site only reached 3% at age 85.
Other studies have aimed to develop risk prediction tools for cancer intended for use in a primary care setting (see, eg,30–32), and in particular the QCancer risk prediction tool33 34 already considers a range of symptoms and a risk of a diagnosis of different types of cancer. For decisions about the management of an individual patient, a risk prediction tool including multiple potential predictors may be more suitable than the results presented in this paper. We view our results as complementary; by describing what is effectively the average risk in patients presenting with these symptoms (by age, sex and smoking status), we can inform high-level policy decisions around the symptomatic diagnosis of cancer such as clinical guideline recommendations and help developers of more detailed risk prediction models by highlighting symptoms they may wish to consider. Further, our consideration of mortality risk provides relevant information that is frequently missing from current risk prediction tools (including QCancer) and that is especially important in frail and elderly populations.
Implications
Symptoms recorded in primary care data can be highly informative about both cancer risk and short-term mortality risk. In some cases, for example, lung cancer, smoking status is very strongly associated with the risk of cancer following a certain symptom. The risk of cancer and non-cancer mortality varies considerably by age; describing the ‘overall’ risk of cancer following a symptom may be misleading if non-cancer mortality is not considered. Some (non-cancer) deaths will relate to as-yet undiagnosed diseases which, like cancer, necessitate specialist assessment in secondary care, though this should be the subject of future inquiries.
For researchers, our results underline the methodological importance of accounting for the fact that symptoms may be associated with multiple different disease outcomes. Advanced statistical modelling strategies are helpful in assessing diagnostic outcomes using EHR data, and current statistical packages allow for relatively straightforward handling of competing risks either by directly modelling cumulative incidence (eg, the Fine-Gray model35) or, as here, by combining several cause-specific models.36 Diagnostic research should adopt strategies that allow consideration of the risk of several potentially related diseases (eg, multiple types of cancer, as in this study), which can be done even with simple analytical approaches such as the appropriate use of logistic regression.29
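As a minimal sketch of how cause-specific models combine into cumulative incidence, the example below integrates each cause-specific hazard against the all-cause event-free probability, CIF_k(t) = ∫ h_k(u) S(u) du. It assumes constant hazards for simplicity, whereas the models in this study are flexible parametric; the hazard values and function names are hypothetical and are not taken from the study's analytical code.

```python
import math


def cumulative_incidence(hazards, horizon, steps=10_000):
    """Cumulative incidence per cause from constant cause-specific hazards.

    hazards: dict of cause -> hazard per unit time (hypothetical values).
    Uses the midpoint rule to integrate h_k * S(u) over [0, horizon], where
    S(u) = exp(-u * sum of all hazards) is the probability of being
    event-free from every cause at time u.
    """
    total_hazard = sum(hazards.values())
    dt = horizon / steps
    cif = {cause: 0.0 for cause in hazards}
    for i in range(steps):
        u = (i + 0.5) * dt
        event_free = math.exp(-total_hazard * u)
        for cause, hazard in hazards.items():
            cif[cause] += hazard * event_free * dt
    return cif


def naive_risk(hazard, horizon):
    """Risk if competing events were ignored: 1 - exp(-h * t)."""
    return 1.0 - math.exp(-hazard * horizon)


# Hypothetical hazards per year for one patient group.
hazards = {"cancer": 0.04, "non_cancer_death": 0.10}
cif = cumulative_incidence(hazards, horizon=1.0)

for cause, risk in cif.items():
    print(f"{cause}: 12-month cumulative incidence {risk:.4f}")
print(f"cancer risk ignoring competing deaths: {naive_risk(0.04, 1.0):.4f}")
```

Ignoring the competing event overstates the cancer risk, and the overstatement grows with the competing hazard, which is why the distinction matters most in older patients.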
For clinicians and policymakers, our systematic assessment of the risk of cancer (and of non-cancer mortality) in symptomatic patients in primary care raises two key questions.
First, whether all age-sex-smoking status groups presenting with each of the studied symptoms and with an estimated any-cancer risk of above 3% should explicitly be added to NICE referral guidelines. This may indeed be justified, though given the high mortality rates in the oldest patients, there might also be a risk of over-testing in older men in particular. However, the degree to which the risk of over-testing is a concern relates to the exact causes of non-cancer mortality and the extent to which it relates to pre-diagnosed or new non-neoplastic diseases which could benefit from specialist diagnostic assessment and earlier diagnosis. As the components of non-cancer mortality due to pre-existing or new conditions are unclear, this should be addressed in future research. The current approach to cancer referral uses a normative threshold applicable to patients of any age and with any symptoms, and the results highlight the importance of considering whether patients are likely to benefit from prompt diagnosis.
Second, whether current referral pathways are necessarily ideal. For example, many abdominal symptoms were strongly associated with lower GI, upper GI and gynaecological cancers, and some form of referral pathway offering combined multi-specialty assessment may be justified for patients with these symptoms. Further, symptoms were often strongly associated with less common cancers such as haematological neoplasms; however, due to the low incidence of these conditions, the absolute risk rarely or never reached 3%; optimal diagnostic management of these patients is clearly challenging. Our findings may be helpful in clarifying referral criteria for new non-specific cancer pathways.
Conclusions
The risk of cancer diagnosis and non-cancer mortality after symptomatic presentation can be comparable, and both should be considered in referral and investigation decisions, alongside age, sex and smoking status. A holistic and stratified assessment of the risk in symptomatic patients, which considers the risk of a cancer diagnosis, the risk of a diagnosis of individual types of cancer and the risk of non-cancer mortality, is needed, particularly for patients presenting with vague or non-specific symptoms, which are associated with multiple cancer types and appreciable non-cancer mortality risk. Our results can support the updating of referral and management guidelines for symptomatic patients presenting in primary care.
Contributors: MB designed the statistical analysis, wrote analytical code, cleaned and analysed the data and drafted and revised the paper. He is the guarantor. CR, HH and GL contributed to drafting the paper. CR, JU-S, NP and GL provided clinical interpretation. HH, AT, BW, SI and SD contributed to data management and phenotyping. JL, AW and ACA contributed to the design and interpretation of the analysis. All authors provided revisions to the paper and gave final approval to the submitted manuscript.
Funding: The work was supported by the International Alliance for Cancer Early Detection, a partnership between Cancer Research UK (C18081/A31373), Canary Center at Stanford University, the University of Cambridge, OHSU Knight Cancer Institute, University College London and the University of Manchester. SI is additionally supported by Cancer Research UK (EDDPMA-May22\100062) and HH and MB by CRUK International Alliance for Cancer Early Detection (ACED) Pathway Awards (EDDAPA-2022/100001 and EDDAPA2022/100002, respectively). GL was supported by a Cancer Research UK (C18081/A18180) Advanced Clinician Scientist Fellowship. CR acknowledges funding from Cancer Research UK Early Detection and Diagnosis Committee (grant number EDDCPJT\100018). JUS is supported by a National Institute of Health Research Advanced Fellowship (NIHR300861). ACA is supported by Cancer Research UK grant: PPRPGM-Nov20\100002. SI, AW and ACA are supported by the National Institute for Health and Care Research (NIHR) Cambridge Biomedical Research Centre (NIHR203312). AW is part of the BigData@Heart Consortium, funded by the Innovative Medicines Initiative-2 Joint Undertaking under grant agreement No 116074. AW and SI are supported by the British Heart Foundation (RG/18/13/33946: RG/F/23/110103) and by Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and Wellcome. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. All authors had access to statistical reports, tables and analysis code. MB, CR, BW, SI and GL had full access to all of the data.
Disclaimer: The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.
Competing interests: All authors have completed the ICMJE uniform disclosure form at http://www.icmje.org/disclosure-of-interest/ and declare no support from any organisation for the submitted work; MB has received personal fees from Grail Inc for membership of an Independent Data Monitoring Committee; no other relationships or activities that could appear to have influenced the submitted work.
Patient and public involvement: Patients and/or the public were involved in the design, conduct, reporting or dissemination plans of this research. Refer to the Methods section for further details.
Provenance and peer review: Not commissioned; externally peer reviewed.
Supplemental material: This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
Data availability statement
Data may be obtained from a third party and are not publicly available. CPRD Gold data can be obtained from CPRD, subject to protocol approval via CPRD’s Research Data Governance Process. Further details can be found at https://cprd.com/data-access. The analytical code used in this study is available at https://github.com/MattEBarclay/cprd_symptom_cancer_1.
Ethics statements
Patient consent for publication:
Not applicable.
Ethics approval:
This study involves human participants and was approved by the UK Medicines and Healthcare products Regulatory Agency Independent Scientific Advisory Committee (ISAC Protocol number 18_299). This study used routinely collected healthcare data, with access approved under Section 251 of the NHS Social Care Act 2006. Seeking active consent from participants would not have been practical.
Zakkak N, Barclay ME, Swann R, et al. The presenting symptom signatures of incident cancer: evidence from the English 2018 National Cancer Diagnosis Audit. Br J Cancer 2024; 130:297–307. doi:10.1038/s41416-023-02507-4
Jones R, Latinovic R, Charlton J, et al. Alarm symptoms in early diagnosis of cancer in primary care: cohort study using General Practice Research Database. BMJ 2007; 334:1040. doi:10.1136/bmj.39171.637106.AE
National Institute for Health and Care Excellence. Suspected Cancer: Recognition and Referral. NICE guidelines 2015.
Hamilton W, Hajioff S, Graham J, et al. Suspected cancer (part 2--adults): reference tables from updated NICE guidance. BMJ 2015; 350. doi:10.1136/bmj.h3044
Neal RD, Tharmanathan P, France B, et al. Is increased time to diagnosis and treatment in symptomatic cancer associated with poorer outcomes? Systematic review. Br J Cancer 2015; 112 Suppl 1:S92–107. doi:10.1038/bjc.2015.48
Lyratzopoulos G, Neal RD, Barbiere JM, et al. Variation in number of general practitioner consultations before hospital referral for cancer: findings from the 2010 National Cancer Patient Experience Survey in England. Lancet Oncol 2012; 13:353–65. doi:10.1016/S1470-2045(12)70041-4
Mendonca SC, Abel GA, Saunders CL, et al. Pre-referral general practitioner consultations and subsequent experience of cancer care: evidence from the English Cancer Patient Experience Survey. Eur J Cancer Care (Engl) 2016; 25:478–90. doi:10.1111/ecc.12353
Swann R, McPhail S, Witt J, et al. Diagnosing cancer in primary care: results from the National Cancer Diagnosis Audit. Br J Gen Pract 2018; 68:e63–72. doi:10.3399/bjgp17X694169
Herrett E, Gallagher AM, Bhaskaran K, et al. Data Resource Profile: Clinical Practice Research Datalink (CPRD). Int J Epidemiol 2015; 44:827–36. doi:10.1093/ije/dyv098
Henson KE, Elliss-Brookes L, Coupland VH, et al. Data Resource Profile: National Cancer Registration Dataset in England. Int J Epidemiol 2020; 49:16–16h. doi:10.1093/ije/dyz076
Kostopoulou O, Tracey C, Delaney BC, et al. Can decision support combat incompleteness and bias in routine primary care data? J Am Med Inform Assoc 2021; 28:1461–7. doi:10.1093/jamia/ocab025
Price SJ, Stapley SA, Shephard E, et al. Is omission of free text records a possible source of data loss and bias in Clinical Practice Research Datalink studies? A case-control study. BMJ Open 2016; 6. doi:10.1136/bmjopen-2016-011664
Harshfield A, Abel GA, Barclay S, et al. Do GPs accurately record date of death? A UK observational analysis. BMJ Support Palliat Care 2020; 10. doi:10.1136/bmjspcare-2018-001514
NHS England. NHS Data Model and Dictionary. Two Week Wait Cancer or Symptomatic Breast Referral Type. 2023.
NHS England. Routes to Diagnosis 2018. 2022.
Moore SF, Price SJ, Chowienczyk S, et al. The impact of changing risk thresholds on the number of people in England eligible for urgent investigation for possible cancer: an observational cross-sectional study. Br J Cancer 2021; 125:1593–7. doi:10.1038/s41416-021-01541-4
Royston P, Parmar MKB. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat Med 2002; 21:2175–97. doi:10.1002/sim.1203
Prentice RL, Kalbfleisch JD, Peterson AV, et al. The analysis of failure times in the presence of competing risks. Biometrics 1978; 34:541–54. doi:10.2307/2530374
Aalen O, Johansen S. An Empirical Transition Matrix for Non-Homogeneous Markov Chains Based on Censored Observations. Scand J Stat 1978; 5:141–50.
Wolkewitz M, Cooper BS, Bonten MJM, et al. Interpreting and comparing risks in the presence of competing events. BMJ 2014; 349. doi:10.1136/bmj.g5060
Crowther MJ. merlin—A unified modeling framework for data analysis and methods development in Stata. The Stata Journal: Promoting communications on statistics and Stata 2020; 20:763–84. doi:10.1177/1536867X20976311
Crowther MJ, Lambert PC. Parametric multistate survival models: Flexible modelling allowing transition-specific distributions with application to estimating clinically useful measures of effect differences. Stat Med 2017; 36:4719–42. doi:10.1002/sim.7448
Arhi CS, Bottle A, Burns EM, et al. Comparison of cancer diagnosis recording between the Clinical Practice Research Datalink, Cancer Registry and Hospital Episodes Statistics. Cancer Epidemiol 2018; 57:148–57. doi:10.1016/j.canep.2018.08.009
Keogh RH, Seaman SR, Barrett JK, et al. Dynamic Prediction of Survival in Cystic Fibrosis: A Landmarking Analysis Using UK Patient Registry Data. Epidemiology (Sunnyvale) 2019; 30:29–37. doi:10.1097/EDE.0000000000000920
White B, Rafiq M, Gonzalez-Izquierdo A, et al. Risk of cancer following primary care presentation with fatigue: a population-based cohort study of a quarter of a million patients. Br J Cancer 2022; 126:1627–36. doi:10.1038/s41416-022-01733-6
White B, Renzi C, Barclay M, et al. Underlying cancer risk among patients with fatigue and other vague symptoms: a population-based cohort study in primary care. Br J Gen Pract 2023; 73:e75–87. doi:10.3399/BJGP.2022.0371
Herbert A, Rafiq M, Pham TM, et al. Predictive values for different cancers and inflammatory bowel disease of 6 common abdominal symptoms among more than 1.9 million primary care patients in the UK: A cohort study. PLoS Med 2021; 18. doi:10.1371/journal.pmed.1003708
Price SJ, Gibson N, Hamilton WT, et al. Intra-abdominal cancer risk with abdominal pain: a prospective cohort primary care study. Br J Gen Pract 2022; 72:e361–8. doi:10.3399/BJGP.2021.0552
Williams TGS, Cubiella J, Griffin SJ, et al. Risk prediction models for colorectal cancer in people with symptoms: a systematic review. BMC Gastroenterol 2016; 16. doi:10.1186/s12876-016-0475-7
Harrison H, Usher-Smith JA, Li L, et al. Risk prediction models for symptomatic patients with bladder and kidney cancer: a systematic review. Br J Gen Pract 2022; 72:e11–8. doi:10.3399/BJGP.2021.0319
Funston G, Hardy V, Abel G, et al. Identifying Ovarian Cancer in Symptomatic Women: A Systematic Review of Clinical Tools. Cancers (Basel) 2020; 12. doi:10.3390/cancers12123686
Hippisley-Cox J, Coupland C. Symptoms and risk factors to identify women with suspected cancer in primary care: derivation and validation of an algorithm. Br J Gen Pract 2013; 63:e11–21. doi:10.3399/bjgp13X660733
Hippisley-Cox J, Coupland C. Symptoms and risk factors to identify men with suspected cancer in primary care: derivation and validation of an algorithm. Br J Gen Pract 2013; 63:e1–10. doi:10.3399/bjgp13X660724
Putter H, Fiocco M, Geskus RB, et al. Tutorial in biostatistics: competing risks and multi-state models. Stat Med 2007; 26:2389–430. doi:10.1002/sim.2712