Discussion
We found a high prevalence of CIN2+ precancerous lesions and hrHPV in WLHIV, almost all of whom were on ART. Stand-alone hrHPV, Gynocular and VIA testing missed almost a quarter of precancerous disease. Among visual screening tests, the Gynocular performed better than VIA. Combining tests did not improve test accuracy measures. In a sensitivity analysis in which only CIN2+ detected from visible lesions was used as the reference standard, all accuracy measures improved.
The study has several strengths. First, we tested a novel magnification device (Gynocular) among WLHIV with limited access to conventional colposcopy. Second, the index tests and reference standards were relevant to the context and performed by local experts. Third, we optimised the study methods with several strategies. Local and international experts contributed to protocol development and staff training. A data safety and monitoring board provided oversight.9 All women received the reference standard, preventing partial verification bias. We reduced detection bias by obtaining 2–4 biopsies from each woman and considering the presence of disease at two time points 6 months apart. We used objective measures of HIV severity and concurrent T. vaginalis to examine associations between coexisting conditions and test performance.15 16 We safeguarded blinding of screening tests and the reference standard. Furthermore, p16 immunostaining was used to determine HSIL objectively.11 17 Because screening results often include indeterminate and missing results, we included a sensitivity analysis to understand their impact on test accuracy.
We acknowledge the limitations of our study methods. First, we used an index test (Gynocular) to guide biopsy sampling for the reference standard. However, partial verification bias was avoided because all women received multiple biopsies irrespective of whether a lesion was seen.18 19 Second, the COVID-19 pandemic interrupted follow-up, and only 104 (28%) women had a second reference test by the time the study had to close. We found five additional cases of CIN2+ among these women, presumably missed at baseline. Had we been able to complete follow-up for all women, disease prevalence might have been higher, which would have affected the predictive values of the tests performed at baseline.20 Third, while we considered 6 months a short enough interval for the second reference standard test to detect missed disease, a 12-week time frame has also been used in previous studies.7 Fourth, we used GeneXpert as the hrHPV testing platform, but an additional laboratory-based method would have enhanced quality control. Fifth, the study assessed VIA, but many sites in SSA use an amended method, including cervicography.21 22 The results of this study are, therefore, not applicable to the Cervical Cancer Prevention Programme in Zambia.
The sensitivity of testing for hrHPV was lower in our study than in many others.6 7 In contrast to many previous studies, we took four biopsies from women with no visible lesions and repeated testing 6 months later, to avoid the partial verification bias that arises when only acetowhite lesions are sampled. We found the sensitivity of hrHPV was 65.3% (95% CI 59.4% to 70.7%) when biopsies were obtained from all women and 85.7% (95% CI 73.3% to 92.9%) if only biopsies from visible lesions were considered. Kelly et al’s systematic review of cervical cancer screening strategies among WLHIV in studies published up to July 2022 found that the sensitivity of VIA was overestimated in studies with a risk of partial verification bias.7 They did not, however, do a subgroup analysis stratified by the risk of verification bias for hrHPV testing. Studies in which the reference standard is obtained only from visible lesions during colposcopy22 23 have higher estimates of sensitivity and specificity than studies in which all women have biopsies.6 24 25 We also found a prevalence of precancer among WLHIV that was higher than in another Zambian study, in which CIN2+ prevalence was 16% among 200 women screened at the University Teaching Hospital in 2016.8 A systematic review evaluating diagnostic accuracy of cervical cancer screening strategies among WLHIV found a pooled prevalence of 12% (range 2%–26%),9 with higher prevalence in tertiary settings where referral for abnormal cervical smear or positive HPV test suggested a high risk for CIN2+.
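The effect of the reference standard on estimated sensitivity can be illustrated with a minimal sketch. The counts below are hypothetical and are not the study data; the Wilson score interval is used here as an assumption, since this section does not state which interval method the study used. When disease in women without visible lesions goes undetected by the reference standard, those women drop out of the diseased denominator, and the estimated sensitivity of the index test rises.

```python
from math import sqrt


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def sensitivity(test_pos_diseased, diseased):
    """Sensitivity = test-positive diseased women / all diseased women."""
    return test_pos_diseased / diseased, wilson_ci(test_pos_diseased, diseased)


# HYPOTHETICAL counts, chosen only for illustration (not the study data).
diseased_all = 100      # CIN2+ found when every woman is biopsied
hpv_pos_all = 65        # of those, hrHPV positive
diseased_visible = 50   # CIN2+ found when only visible lesions are biopsied
hpv_pos_visible = 43    # of those, hrHPV positive

sens_all, ci_all = sensitivity(hpv_pos_all, diseased_all)
sens_visible, ci_visible = sensitivity(hpv_pos_visible, diseased_visible)

print(f"all women biopsied:   {sens_all:.1%} ({ci_all[0]:.1%} to {ci_all[1]:.1%})")
print(f"visible lesions only: {sens_visible:.1%} ({ci_visible[0]:.1%} to {ci_visible[1]:.1%})")
```

In this sketch the index test is unchanged between the two scenarios; only the set of women counted as diseased differs.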
Our reference standard methods, taking 2–4 biopsies at two time points, might have detected more CIN2+ cases than in studies taking one biopsy from the most severe cervical lesion26 27 or a maximum of two biopsies.6 7 Wentzensen et al20 found that sensitivity for detecting CIN2+ increased from 61% (95% CI 55% to 67%) with a single biopsy to 86% (95% CI 80% to 90%) with two biopsies and to 96% (95% CI 91% to 99%) with three biopsies.27 In contrast to previous studies that calculated combined test accuracy using the denominator of women testing positive from the first test, we considered all women in our denominator so as not to miss any disease in the target population. This better emulates a real-life situation and highlights that combining tests does not improve accuracy when the sensitivity of the primary screening test is low. In different contexts, the choice of screening tests that prioritise sensitivity or specificity may vary depending on the resources and infrastructure available.28 29 For example, if there is already a system in place to ensure women receive timely follow-up and treatment, providers can prioritise a test with lower sensitivity and higher specificity, to avoid unnecessary treatments. However, if this infrastructure is not available, a test that prioritises high sensitivity and enables a point-of-care strategy to link screening and treatment may be preferred to ensure that fewer women with the potential to develop cervical cancer are missed. Ideally, a screening sequence should aim for a sensitivity of 90%–95% and specificity of 85% to detect CIN3+ during one screening interval.30 Although p16-positivity indicates a higher cancer potential than CIN2+ alone, our results cannot be directly compared with this target. Larger test accuracy studies among WLHIV that minimise bias would strengthen estimates of accuracy and help improve healthcare for women.
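The denominator choice for combined testing described above can be sketched as follows. The counts are hypothetical and are not the study data, and the sketch assumes a sequential combination in which the second test is applied only to women positive on the first test. Under that assumption, the sensitivity of the sequence in the whole population can never exceed the sensitivity of the first test.

```python
# HYPOTHETICAL counts among women with CIN2+ (not the study data).
diseased_total = 100      # all women with disease in the target population
diseased_pos_first = 65   # of those, positive on the first (primary) test
diseased_pos_both = 52    # of those, also positive on the second (triage) test

# Denominator restricted to women positive on the first test:
# disease missed by the first test is invisible to this estimate.
sens_conditional = diseased_pos_both / diseased_pos_first

# Denominator of all diseased women: disease missed by the first test counts.
sens_population = diseased_pos_both / diseased_total

# Sensitivity of the first test alone, the ceiling for the sequence.
sens_first = diseased_pos_first / diseased_total

print(f"conditional on first-test positive: {sens_conditional:.1%}")
print(f"all diseased women in denominator:  {sens_population:.1%}")
print(f"first test alone (upper bound):     {sens_first:.1%}")
```

With these illustrative numbers the same sequence appears 80% sensitive under the restricted denominator but 52% sensitive when all diseased women are counted.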
In our study, hrHPV testing, Gynocular colposcopy and VIA performed poorly as stand-alone screening tests among WLHIV, and 22.9% of cases were not detected by any test. Combining two tests did improve specificity but not overall accuracy when all women (and all disease) were considered in the denominator. Our findings have implications for research and cervical cancer screening policies among WLHIV if test accuracy in this high-risk population has been overestimated. According to our sensitivity analysis, the assumption that taking biopsies from visible lesions on colposcopy is an acceptable reference standard might need reassessment. WHO recommends screening intervals of 3–5 years for WLHIV, based on the assumption that suboptimal screening tests at sufficiently frequent intervals will still prevent cancer because of the long precancerous phase. However, if accuracy measures informing modelling studies are overestimated, these screening intervals might be too long. Larger studies among WLHIV in countries with the highest disease burden, using methods that reduce verification bias, are urgently required. Our robust descriptive study results can be used in future modelling studies and randomised controlled trials of screening effectiveness, both of which are needed to determine improved strategies for cervical cancer screening among WLHIV.