Editorial

Making administrative healthcare systems clinical data the future of clinical trials: lessons from BladderPath

Clinical trials have become more challenging and costly to run in recent years. The COVID-19 pandemic showed how much research can achieve when strong collective drive is combined with new and alternative ways of working. For over a decade, we have been investigating the use of routinely collected hospital administrative data in randomised controlled trials (RCTs). In this editorial, we share our practical experiences of using such data in our bladder cancer trial (BladderPath). We aim to report our journey transparently so that future trialists working with such data can successfully acquire and use this data goldmine.

Conventional processes for collecting RCT outcome data are expensive, laborious and sometimes inefficient, and they consume valuable National Health Service (NHS) resources. This can limit follow-up data collection, leaving important outcomes unreportable or unreliable. Using routinely collected data to identify outcomes may therefore permit more efficient RCT designs, reduce the burden on patients and healthcare providers, be more cost-effective, enable the collection of longer-term outcomes and free scarce resources for priority needs. This approach is long awaited, and many groups1 2 are investigating the use of these vast datasets, which are described in detail elsewhere.3–5 One example, Hospital Episode Statistics (HES), codes every inpatient, outpatient and emergency visit to NHS England hospitals,6 with fields including diagnoses and procedure codes.

We previously described our BladderPath team’s development and validation of novel methods that use routinely collected data alone, in place of conventional follow-up, to populate case report forms (CRFs).3 4 BladderPath (Image Directed Redesign of Bladder Cancer Treatment Pathway, ISRCTN35296862) compares two diagnostic pathways for bladder cancer.7 Our proposed design was as follows: data providers would periodically send the trial team data extracts for processing, and these extracts would automatically prepopulate CRFs, which would then be sent to sites for verification and uploaded into the central trial database.
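The extract-to-CRF flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the BladderPath implementation: the record fields (`participant_id`, `event_date`, `procedure_code`), the placeholder code prefixes (`SUR`, `RTX`, `CHX`) and all function names are hypothetical, and real extracts would follow the data provider's own schema and coding systems.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical mapping from placeholder procedure-code prefixes to CRF event types.
# These are NOT real coding-system codes; they stand in for a trial's outcome definitions.
EVENT_CODE_PREFIXES = {
    "surgery": ("SUR",),
    "radiotherapy": ("RTX",),
    "chemotherapy": ("CHX",),
}


@dataclass(frozen=True)
class ExtractRecord:
    """One row of a hypothetical routinely collected data extract."""
    participant_id: str
    event_date: date
    procedure_code: str


@dataclass
class DraftCRF:
    """A prepopulated CRF that still needs site verification before upload."""
    participant_id: str
    events: list = field(default_factory=list)  # (event_type, event_date) tuples
    status: str = "awaiting_site_verification"


def classify(code: str):
    """Return the CRF event type for a procedure code, or None if it is unmapped."""
    for event_type, prefixes in EVENT_CODE_PREFIXES.items():
        if code.startswith(prefixes):  # str.startswith accepts a tuple of prefixes
            return event_type
    return None


def prepopulate_crfs(extract):
    """Group mapped events by participant into draft CRFs, in date order."""
    crfs = {}
    for rec in extract:
        event_type = classify(rec.procedure_code)
        if event_type is None:
            continue  # codes outside the trial's outcome definitions are ignored
        crf = crfs.setdefault(rec.participant_id, DraftCRF(rec.participant_id))
        crf.events.append((event_type, rec.event_date))
    for crf in crfs.values():
        crf.events.sort(key=lambda e: e[1])
    return crfs


# Usage with made-up records: P002's unmapped code produces no CRF.
extract = [
    ExtractRecord("P001", date(2019, 3, 4), "SUR12"),
    ExtractRecord("P001", date(2019, 1, 10), "CHX01"),
    ExtractRecord("P002", date(2019, 2, 1), "XYZ99"),
]
crfs = prepopulate_crfs(extract)
for pid, crf in crfs.items():
    print(pid, crf.status, crf.events)
```

Leaving each draft in an "awaiting site verification" state reflects the design choice that routine data prepopulate, rather than replace, confirmation by sites.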

The literature highlights both the benefits and the pitfalls of administrative data, including concerns about accuracy.8–10 However, it is also recognised that traditional data collection already yields imperfect outcome data.11 For our validation, we compared NHS routinely collected data with reference clinical datasets and found substantial improvements in sensitivity over time (example events: surgery, radiotherapy and chemotherapy).3 In 2011, 41/117 (35%) events were detected, compared with 104/109 (95%) in 2017, with a sensitivity of 95% (657/692) over the last five data years.3 Although this was a single-site validation, remuneration is driving central and local initiatives that we hypothesise are further improving coding accuracy nationally.3 For further validation in the BladderPath trial, we proposed manually querying every event derived from administrative data against local clinical notes.3 With this approach, we intended to set up a framework in which data providers would continually return quality measures to enhance these data, removing the future need for data queries. We also proposed using multiple datasets to reduce missingness, and algorithmic rules to capture miscoded events.3 We considered that our design would address anticipated concerns about data missingness, outcome availability, governance, data retention, privacy and security.8 Using BladderPath as a case study, we set out to implement this schema; below we share our experiences.
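The sensitivity figures above are simple proportions: the number of reference-standard events that were also found in the administrative data, divided by all reference-standard events. The sketch below recomputes them from the reported counts; the function name `sensitivity` is our own.

```python
def sensitivity(detected: int, reference: int) -> float:
    """Proportion of reference-standard events also detected in administrative data."""
    if reference <= 0:
        raise ValueError("reference count must be positive")
    if not 0 <= detected <= reference:
        raise ValueError("detected must lie between 0 and reference")
    return detected / reference


# Counts as reported in the validation (ref 3): (detected, reference-standard events).
reported = {
    "2011": (41, 117),
    "2017": (104, 109),
    "last five data years": (657, 692),
}

for label, (detected, reference) in reported.items():
    print(f"{label}: {detected}/{reference} = {sensitivity(detected, reference):.0%}")
# 2011: 41/117 = 35%
# 2017: 104/109 = 95%
# last five data years: 657/692 = 95%
```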

During BladderPath set-up, we contacted two data providers: NHS Digital (NHSD) in April 2017 and the National Cancer Registration and Analysis Service (NCRAS) in July 2017. NCRAS was previously run by Public Health England (PHE); it became part of NHSD in October 2021, and NHSD in turn merged into NHS England in February 2023. We sought monthly participant-linked HES, mortality and diagnostic imaging data from NHSD, and radiotherapy, chemotherapy and cancer registration data6 from PHE.

Initially, we requested monthly HES and mortality data to assess outcomes rapidly (at that time we were unaware that NHSD provided data with a 2-month lag). A data linkage fee of £2060 was quoted for every data drop, in addition to set-up fees, giving a cost of £4910 per month, £58 920 per year and £589 200 over a 10-year study follow-up. This substantially exceeded the trial budget and did not include the central staff time needed for data processing.
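The cost projection is straightforward arithmetic on the quoted monthly figure. The sketch below reproduces it under the assumption, implied by the quote, of a flat £4910 for each monthly data drop with no price changes over follow-up; like the quote, it excludes central staff time for data processing. The constant and function names are illustrative.

```python
MONTHLY_COST_GBP = 4910  # quoted cost per monthly data drop (linkage fee plus other fees)
DROPS_PER_YEAR = 12      # one data drop per month
FOLLOW_UP_YEARS = 10


def projected_cost(cost_per_drop: int, drops_per_year: int, years: int) -> int:
    """Total data cost, assuming a flat price per drop (no inflation or renegotiation)."""
    return cost_per_drop * drops_per_year * years


annual = projected_cost(MONTHLY_COST_GBP, DROPS_PER_YEAR, 1)
total = projected_cost(MONTHLY_COST_GBP, DROPS_PER_YEAR, FOLLOW_UP_YEARS)
print(f"Per year: £{annual:,}")             # Per year: £58,920
print(f"10-year follow-up: £{total:,}")     # 10-year follow-up: £589,200
```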

We subsequently approached PHE regarding HES access, in addition to radiotherapy, chemotherapy and cancer registration data. NCRAS, through the Office for Data Release, was able to provide these datasets affordably without charging multiple linkage fees. However, discussions revealed that neither monthly nor quarterly HES releases were feasible for technical and operational reasons; instead, we were offered releases at most once every 6 months. Unfortunately, this was insufficient for our requirement of rapid access to outcomes.
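Why a 6-monthly release schedule cannot support rapid outcome assessment can be illustrated with a simple worst-case model: an event occurring just after a data cut-off misses that extract, waits a full release interval for the next cut-off, and then waits a further processing lag before the data are delivered. The sketch hypothetically assumes that the 2-month lag we encountered with NHSD applies to every schedule; actual lags may differ by provider and dataset.

```python
# Assumption: the 2-month lag reported for NHSD data applies to all release schedules.
ASSUMED_LAG_MONTHS = 2


def worst_case_delay(release_interval_months: int, lag_months: int = ASSUMED_LAG_MONTHS) -> int:
    """Longest wait (in months) from an event to its first appearance in a delivered release.

    An event just after a cut-off waits one full interval for the next cut-off,
    then the processing lag before that extract reaches the trial team.
    """
    return release_interval_months + lag_months


for label, interval in [("monthly", 1), ("quarterly", 3), ("6-monthly", 6)]:
    print(f"{label}: up to about {worst_case_delay(interval)} months")
# monthly: up to about 3 months
# quarterly: up to about 5 months
# 6-monthly: up to about 8 months
```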

Our experience shows that working with healthcare systems data in England is currently too cumbersome and expensive to implement practically and economically within RCT methodology:12 such data must be made more affordable and available in a timely manner. In the meantime, trialists should be aware of the need to allocate substantial budgets for data in grant applications and should accept that real-time access to outcomes data is not yet possible. However, the future may hold more promise. A new and more accessible alternative dataset, the Rapid Cancer Registration Dataset,6 is now available. It can signal specific treatment events from January 2018 onwards, albeit with a smaller set of variables. Data providers have also been undergoing reform, with the NHSD DigiTrials service set up to support trialists.6 In addition, the Secretary of State for Health and Social Care commissioned a review to find ways to ‘deliver better, broader, safer use of NHS data for analysis and research’, which proposed the use of Trusted Research Environments or Secure Data Environments.13 We remain hopeful that these initiatives will improve access and timeliness.

Our negative experiences of acquiring these data have been offset by a positive experience of one-off retrospective data collection for our prostate cancer trial (STAMPEDE),14 although this has not yet translated into a simple approach for long-term data.4 NCRAS delivered a service with dedicated, supportive and helpful team members. We recommend that future trialists start such discussions before the grant application stage, so that the lengthy data application processes can be completed before recruitment begins. A new combined application system, such as that recently adopted by the Integrated Research Application System (IRAS), or better still, the ability to apply for data through IRAS when setting up a clinical trial, could simplify the application process by enabling a single application for all datasets.

The success of COVID-19 trials such as RECOVERY,15 which used such data, emphasises the huge public value of well-conducted research and the power of NHS trials. These lessons are broadly applicable internationally wherever equivalent datasets exist. Nordic datasets are particularly extensive: a unique personal identity number is assigned at birth or immigration and tracks healthcare and other interactions. Such datasets could strengthen the methods outlined above, and a similar unique identifier could be of huge research value in England.
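With a single lifelong identifier of the kind used in Nordic registries, linking trial participants to routine records reduces to an exact (deterministic) match on that identifier; without one, linkage typically has to rely on combinations of other fields such as name and date of birth. The sketch below illustrates only the deterministic case; the function name, identifiers and event labels are hypothetical.

```python
def link_by_identifier(participants, registry_events):
    """Deterministically link registry events to trial participants by a shared identifier.

    participants: iterable of unique personal identifiers for enrolled participants.
    registry_events: iterable of (identifier, event) pairs from a national register.
    Returns {identifier: [events, ...]} for every participant; participants with no
    registry events map to an empty list, and non-participants are discarded.
    """
    linked = {pid: [] for pid in participants}
    for identifier, event in registry_events:
        if identifier in linked:  # exact match on the identifier, no fuzzy matching
            linked[identifier].append(event)
    return linked


# Hypothetical example data.
participants = ["ID-001", "ID-002"]
registry_events = [
    ("ID-001", "hospital admission"),
    ("ID-999", "hospital admission"),  # not a trial participant, so discarded
    ("ID-001", "surgery"),
]
result = link_by_identifier(participants, registry_events)
print(result)
# {'ID-001': ['hospital admission', 'surgery'], 'ID-002': []}
```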

Our experiences show that administrative data can be repurposed to collect trial outcomes; however, the caveats described above need to be considered. BladderPath has now closed recruitment and, following agreement with NHSD, we plan to use these data for a follow-up data sweep, avoiding the cost and timeliness concerns associated with real-time acquisition. With newly available services and reforms, we are optimistic that this huge resource can be used more widely to benefit future research. We will continue working to bring trial conduct to the forefront of technology.