Pitfalls in surveillance for hepatocellular carcinoma: How successful is it in the real world?
Abstract
Background/Aims
Surveillance for hepatocellular carcinoma (HCC) with ultrasound in high-risk populations is generally believed to improve opportunities for treatment. However, tumors are still missed due to various factors. This study explores success versus failure of HCC surveillance.
Methods
This is a retrospective study of 1,125 HCC cases. Categories considered for successful detection were largest tumor ≤3.0 cm, single tumors ≤3.0 cm and ≤2.0 cm, and adherence to Milan criteria. Examined factors were age <60 years, gender, rural residence, body-mass index (BMI), hepatitis infection, smoking, diabetes, hyperlipidemia, cirrhosis, ascites, and Model for End-Stage Liver Disease <10.
Results
HCC was found on surveillance in 257 patients with a mean tumor size of 3.17 cm; multiple tumors were seen in 28% of cases, bilateral tumors in 7.4%, and vascular invasion in 3.7%. Surveillance was successful in 61.5% of cases when success was defined as a largest tumor ≤3.0 cm, with BMI ≥35 negatively affecting detection (odds ratio [OR] 0.28, P=0.014) and cirrhosis positively affecting detection (OR 2.31, P=0.036). Ultrasound detected single tumors ≤2.0 cm in 19.1% of cases, with ascites improving the detection rate (OR 3.89, P=0.001). Finally, adherence to Milan criteria occurred in 75.1% of cases, revealing negative associations with diabetes (OR 0.48, P=0.044) and male gender (OR 0.49, P=0.08).
Conclusions
Although surveillance is recommended for HCC, not all surveillance ultrasounds are ideal. Tumor detection can depend on gender, BMI, diabetes, cirrhosis, and ascites and is achieved in 19.1–75% of cases depending on the definition of success. Closer follow-up or additional imaging might be necessary for some patient subgroups.
INTRODUCTION
Hepatocellular carcinoma (HCC) is the second most common cause of cancer death worldwide, and the incidence and mortality of this cancer continue to increase in the United States [1,2]. In the United States, liver cancer death rates are rising faster than those of any other cancer site [2]. Most of these cancers were historically attributed to viral hepatitis B and C, but the epidemic of diabetes, obesity, metabolic syndrome and fat-related liver disease will likely sustain the prevalence of this cancer in the future [2]. Unfortunately, many patients with HCC present at an advanced stage or with poor underlying liver function and will not qualify for curative treatments. While survival with HCC is modestly improving, the overall 5-year survival remains dismal at 17.8% [3]. Early identification of liver cancer is the key to improved survival. In addition to a large, randomized prospective study demonstrating the benefit of surveillance for HCC, numerous retrospective studies have reported prolonged survival in patients whose cancer was found with surveillance [4,5]. Despite the limited rigorous data, surveillance for HCC has become an accepted practice among most hepatologists [6]. Current guidelines from various gastroenterology and hepatology societies recommend surveillance of high-risk groups using ultrasound (US), with the goal of detecting small lesions while curative options are still possible [7,8].
While US has become the established standard, there are wide variations in the quality of the sonography performed [9]. US is highly operator dependent and requires patient cooperation for optimal studies. Patient factors such as body habitus and equipment factors such as probes and image resolution all contribute to the quality of the study. Specialized centers and transplant centers that frequently perform liver-specific US may differ in quality from sonography units in small hospitals and private imaging centers [10]. Finally, recent advances in US may have improved the detection rate of small HCC when compared to the technology used to establish the surveillance guidelines.
In this study, we examined a large prospectively collected database to assess the effectiveness of US surveillance in detecting early HCC. As there is no established definition of successful US surveillance, we used several criteria: (1) Milan Criteria, (2) largest tumor ≤3 cm, (3) single tumor ≤3 cm, and (4) single tumor ≤2 cm. We also attempted to identify factors that affect successful surveillance.
PATIENTS AND METHODS
This is a retrospective analysis of 1,125 HCC cases referred over a 24-year period (1993-2016) to our group of physicians, who are associated with the only liver transplant program in Hawaii and the only referral center for liver disease/surgery for the American territories of the Pacific Basin (including Samoa, Guam, Saipan, and the Marshall Islands). This clinic and the transplant center were initially affiliated with Hawaii Medical Center-East (formerly St. Francis Medical Center) and, after 2012, with the Queens Medical Center. This center sees about 60-70% of the HCC cases in Hawaii.
In the early years, HCC was diagnosed histologically by percutaneous biopsy or at surgery. In the first decade, and consistent with the previous United Network for Organ Sharing policy regarding transplant for HCC, patients without histologic confirmation were included if they had a history of chronic liver disease and a mass at least 2 cm in size seen on two imaging studies (ultrasound, CT scan or MRI) and one of the following: (1) vascular blush seen on CT scan or MRI, (2) alpha-fetoprotein (AFP) >200 ng/mL, or (3) arteriogram confirming the tumor [11]. More recently, the diagnosis of HCC was made with imaging alone if a contrast-enhanced study (dynamic CT or MRI) showed typical arterial enhancement with “washout” in the venous phase, as described by the American Association for the Study of Liver Diseases guidelines [7,8].
Identifying data were removed prior to abstraction by study investigators. Information on demographics, medical history, laboratory results, tumor characteristics, treatment, and survival was collected via clinical interview. Demographic data included age, sex, birthplace, and the patient’s self-reported ethnicity. Ethnicity was then categorized as “White”, “Asian”, or “Pacific Islander.” Patients of mixed race with at least 50% Pacific Islander ethnicity were categorized as “Pacific Islander”. All other patients of mixed ethnicity, and those who did not fit into one of these categories, were classified as “Other.” As a surrogate for the urban versus rural location at which the surveillance ultrasound was performed, we used Oahu (the island with the largest population, Honolulu, and the larger tertiary referral centers) vs. non-Oahu (Big Island, Maui, Kauai, Molokai and Lanai). Data collected on medical history included diabetes mellitus, hyperlipidemia, smoking, and risk factors for HCC including viral hepatitis, alcohol abuse (defined as more than two alcoholic beverages daily for at least ten years), and other chronic liver diseases. Patients were also classified as having hyperlipidemia if lipid-lowering agents were present on their medication list. Information was based on available medical records and an interview by a single physician. Measured height and weight were used to determine body mass index (BMI).
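BMI is weight in kilograms divided by the square of height in meters. The following is a minimal sketch of that calculation, assuming metric inputs. The function names and band labels are illustrative and do not come from the study database; the cut-offs (25, 30, and 35 kg/m2) are the ones compared in this study.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2 from measured weight and height."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_band(value: float) -> str:
    """Place a BMI value into one of four bands (labels are illustrative)."""
    if value < 25:
        return "<25"
    if value < 30:
        return "25 to <30"
    if value < 35:
        return "30 to <35"
    return ">=35"
```

For example, a patient weighing 80 kg with a height of 1.75 m has a BMI of about 26.1 kg/m2 and falls in the 25 to <30 band.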
Laboratory data collected included bilirubin, albumin, prothrombin time, creatinine, alanine aminotransferase, aspartate aminotransferase, platelet count, and AFP. Laboratory data used for the study had been obtained within 2 weeks of the initial visit. Bilirubin, prothrombin time with international normalized ratio, and creatinine were used to calculate the Model for End-stage Liver Disease (MELD) score. Hepatitis B positivity included those who had a positive hepatitis B surface antigen, and hepatitis C positivity included those who had a positive hepatitis C antibody, irrespective of viral RNA level. The size, number, and location of the tumor(s) were used to determine the Tumor Node Metastases (TNM) stage according to the American Joint Committee on Cancer staging manual [12].
The proportion of patients with HCC detected by surveillance was noted. Although our Liver Center recommends that community physicians perform surveillance US and AFP every 6 months in patients with viral hepatitis and chronic liver disease who meet American Association for the Study of Liver Diseases (AASLD) criteria for surveillance, there was no uniform protocol used in the cohort. Referring physicians used a combination of AFP and/or US at variable intervals. As surveillance intervals differed among patients due to a number of factors, including lack of physician awareness of or adherence to current surveillance guidelines and patient compliance, we included patients with US-detected HCC who had a history of at least one negative ultrasound prior to detection. Patients who were symptomatic at initial presentation were excluded from this study.
HCC was deemed to be found on “surveillance” if the patient had a previous US from three to twelve months prior showing no liver masses. Patients with surveillance intervals less than 3 months or greater than 12 months were excluded from this study. In cases where previous US reports were unavailable, we used physicians’ notes or presenting US reports indicating a previous US investigation. Initial detection of HCC by CT or MRI was not considered surveillance, and these patients were excluded from this study. Additionally, cases of incidental HCC found on the explanted liver at transplant were also excluded (Fig. 1).
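The text does not state which MELD variant was calculated. As an illustration only, the sketch below assumes the classic UNOS formulation without a sodium term, in which laboratory values below 1.0 are raised to 1.0 and creatinine is capped at 4.0 mg/dL. The function name is hypothetical, and the dialysis adjustment is deliberately left out.

```python
import math


def meld_score(bilirubin_mg_dl: float, inr: float, creatinine_mg_dl: float) -> int:
    """Classic (pre-sodium) MELD score.

    This is an assumed variant for illustration; the source does not say
    which formulation was used. The dialysis rule is not modeled.
    """
    # Raise each input to at least 1.0 so no logarithm is negative.
    bili = max(bilirubin_mg_dl, 1.0)
    inr_floor = max(inr, 1.0)
    # Creatinine is floored at 1.0 and capped at 4.0 mg/dL.
    creat = min(max(creatinine_mg_dl, 1.0), 4.0)
    raw = (
        3.78 * math.log(bili)
        + 11.2 * math.log(inr_floor)
        + 9.57 * math.log(creat)
        + 6.43
    )
    return round(raw)
```

Under these assumptions, a patient with normal values (all at or below 1.0) scores 6, so the MELD <10 group in this study corresponds to patients with well-preserved laboratory values.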
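The interval rule above can be sketched as a simple date check. This is an illustration under stated assumptions: the function names are hypothetical, and intervals are measured in whole elapsed calendar months, a boundary convention the text does not specify.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the last month has not fully elapsed
    return months


def found_on_surveillance(prior_negative_us: date, detecting_us: date) -> bool:
    """True if the prior negative US was 3 to 12 whole months before detection.

    Assumption: both bounds are inclusive and counted in whole months.
    """
    if detecting_us <= prior_negative_us:
        return False
    return 3 <= months_between(prior_negative_us, detecting_us) <= 12
```

For instance, a negative US on 10 January 2015 followed by a detecting US on 10 July 2015 (6 months) qualifies, whereas a detecting US 13 months later does not.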
Since there is no standardized definition of successful surveillance, we used several potential definitions based on detection of the following lesions: largest tumor ≤3.0 cm, single tumor ≤3.0 cm, single tumor ≤2.0 cm, and HCC meeting Milan criteria (single tumor ≤5.0 cm or 2-3 tumors all ≤3.0 cm). For each of these groups, we compared categorical values of age <60 years vs. ≥60 years, male vs. female, urban vs. non-urban, hepatitis B positive vs. negative, hepatitis C positive vs. negative, BMI ≥25 vs. <25 kg/m2, BMI ≥30 vs. <30 kg/m2, BMI ≥35 vs. <35 kg/m2, smoking vs. non-smoking, diabetes vs. no diabetes, hyperlipidemia vs. no hyperlipidemia, presence of ascites vs. no ascites, presence of cirrhosis vs. no cirrhosis, and MELD ≥10 vs. <10. Data analysis was performed with SPSS Statistics (version 24, IBM Corp., Armonk, NY, USA). Multiple logistic regression was used to calculate odds ratios and 95% confidence intervals. Statistical significance was defined as a P-value <0.05.
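The four candidate definitions can be expressed as a small classifier over a patient's tumor sizes. This is a sketch, not the authors' analysis code. The function name and dictionary keys are illustrative, and inclusive cut-offs are assumed throughout, including ≤5.0 cm for a single tumor under Milan criteria. Vascular invasion, extrahepatic spread, and diffuse disease are not modeled.

```python
from typing import Dict, List


def surveillance_success(tumor_sizes_cm: List[float]) -> Dict[str, bool]:
    """Evaluate the four candidate success definitions for one patient.

    `tumor_sizes_cm` holds one diameter per tumor found at detection.
    """
    if not tumor_sizes_cm:
        raise ValueError("at least one tumor size is required")
    count = len(tumor_sizes_cm)
    largest = max(tumor_sizes_cm)
    single = count == 1
    # Milan: one tumor up to 5 cm, or 2-3 tumors each up to 3 cm (assumed inclusive).
    milan = (single and largest <= 5.0) or (2 <= count <= 3 and largest <= 3.0)
    return {
        "milan": milan,
        "largest_le_3cm": largest <= 3.0,
        "single_le_3cm": single and largest <= 3.0,
        "single_le_2cm": single and largest <= 2.0,
    }
```

A single 4.2 cm tumor, for example, counts as a success only under Milan criteria, while four tumors of 2.0 cm or less satisfy only the largest-tumor definition. This shows how far the apparent success rate can move with the definition chosen.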
This study was approved by the University of Hawaii Institutional Review Board. This study was conducted in accordance with institutional and federal regulations for the protection of human subjects.
RESULTS
Patient characteristics
Of the 1,125 patients with HCC, 865 had diagnoses which prompted HCC surveillance, including hepatitis B, hepatitis C, alcoholic cirrhosis, or cirrhosis due to autoimmune hepatitis or non-alcoholic steatohepatitis (NASH). Of this group, 271 (31.3%) patients underwent appropriate ultrasound surveillance. Fourteen patients were excluded because the height and weight data needed to calculate BMI were incomplete. The mean age of the 257 patients in this study cohort was 62 years (standard deviation [SD] ±9.4) and 69.6% were male. Sixty-eight patients resided on and were referred from practices on rural neighbor islands, whereas 189 were followed on Oahu. More than half of the patients were Asian. With regard to disease etiology, 119 patients (61.9%) had hepatitis C, 78 patients (30.3%) had hepatitis B, 8 patients (3.1%) were co-infected with hepatitis B and C, 16 patients (6.2%) had NASH, 12 patients (4.7%) had alcoholic cirrhosis without viral risk factors, and 3 patients (1.2%) had autoimmune hepatitis. Mean BMI was 26.8 (range 16.0–49.8, SD 5.55). Other comorbidities included hypertension in 126 patients (49.0%), diabetes in 88 patients (34.2%), and hyperlipidemia in 44 patients (17.1%). MELD score was less than 10 in 142 patients (55.3%) and 118 patients (45.9%) had AFP less than 20 ng/mL. Ascites was present in 51 (19.8%) patients, 2 of whom had intractable ascites requiring frequent paracentesis. See Table 1 for further details.
Tumor characteristics
Patients in this cohort presented upon surveillance with a mean largest tumor size of 3.14 cm (range 0.7-14 cm, SD 1.83). Figure 2 shows the distribution of the largest tumor identified, with 27 patients presenting with tumors greater than 5.0 cm, the maximum size allowed for single lesions under Milan Criteria. Multiple tumors were present in 72 patients (28.0%), and in this group the largest tumor had a mean size of 3.0 cm (SD 1.59 cm). Nineteen patients presented with more than 3 tumors, and 2 patients presented with diffuse HCC. Twenty patients (7.8%) had bilateral liver involvement. Evidence of vascular invasion was present in 9 cases (3.5%), and there were no cases of metastatic disease or tumor rupture.
Detection vs. Surveillance failure
Four definitions of successful detection were evaluated in this study: (1) meets Milan Criteria, (2) largest tumor ≤3.0 cm, (3) single tumor ≤3.0 cm, and (4) single tumor ≤2.0 cm.
Of the patients who had HCC found on surveillance, 193 (75.1%) met Milan Criteria. Male gender was associated with poor detection, as 52 of the 64 patients exceeding Milan Criteria were male (OR 0.44, 95% CI 0.22-0.89; P =0.02). Positive smoking history was also associated with poor detection within Milan Criteria, with OR 0.49 (95% CI 0.26-0.92; P =0.02) (Table 2). After multiple logistic regression analysis, male sex remained associated with poor detection within Milan Criteria, although significance was marginal (OR 0.49, 95% CI 0.21-1.06; P =0.080). Diabetes was also found to be associated with poor detection within Milan Criteria (OR 0.48, 95% CI 0.24-0.98; P =0.044). No other variables were found to have statistical significance (Table 3).
If largest tumor ≤3.0 cm was used as the criterion, then 158 (61.5%) patients had successful surveillance. BMI ≥35 was associated with poor detection of a largest tumor ≤3.0 cm (OR 0.28, 95% CI 0.10-0.76; P =0.014). However, the presence of cirrhosis correlated with improved detection of a largest tumor ≤3.0 cm (OR 2.31, 95% CI 1.06-5.13; P =0.036). No other variables were found to have statistical significance (Tables 2 and 3).
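The unadjusted odds ratio for sex can be approximately reproduced from the counts given above. The sketch below rests on two assumptions that are not stated in the text. First, the number of males (179) is inferred from 69.6% of 257 rather than reported directly. Second, a Wald interval on the log odds ratio is used, which may differ from the procedure the authors ran in SPSS.

```python
import math
from typing import Tuple


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a, b: exposed group with / without the outcome
    c, d: unexposed group with / without the outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Outcome: tumor within Milan criteria at detection; exposure: male sex.
males_total = 179                       # assumption: 69.6% of 257, rounded
females_total = 257 - males_total       # 78
males_outside = 52                      # from the text
females_outside = 64 - males_outside    # 12
result = odds_ratio_wald(
    males_total - males_outside, males_outside,
    females_total - females_outside, females_outside,
)
print(tuple(round(x, 2) for x in result))  # prints (0.44, 0.22, 0.89)
```

Under these assumptions the output agrees with the reported OR of 0.44 (95% CI 0.22-0.89), which suggests the reported estimate is the odds of meeting Milan criteria in males relative to females.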
Of the entire cohort, 121 patients presented with a single tumor ≤3.0 cm resulting in a detection rate of 47.1%. None of the variables evaluated were associated with successful surveillance, even after performing multiple logistic regression analysis (Table 4).
If single tumor ≤2.0 cm is the proposed criterion for successful surveillance, then 49 (19.1%) patients had success. In this group, BMI ≥25 and <30 was associated with poor detection of single tumors ≤2.0 cm (OR 0.45, 95% CI 0.20-0.98; P =0.050). However, the presence of ascites was associated with improved detection, even after accounting for interactions using multiple logistic regression analysis (OR 2.70, 95% CI 1.35-5.41; P =0.004). See Tables 3 and 4 for details.
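The four success rates follow directly from the reported counts over the 257-patient cohort. The short check below recomputes them; the dictionary keys are illustrative labels only.

```python
COHORT_SIZE = 257  # patients with HCC found on US surveillance

# Patients meeting each definition, as reported in the text.
successes = {
    "milan": 193,
    "largest_le_3cm": 158,
    "single_le_3cm": 121,
    "single_le_2cm": 49,
}

# Success rate as a percentage of the cohort, to one decimal place.
rates = {name: round(100 * n / COHORT_SIZE, 1) for name, n in successes.items()}
print(rates)
# {'milan': 75.1, 'largest_le_3cm': 61.5, 'single_le_3cm': 47.1, 'single_le_2cm': 19.1}
```

The same cohort therefore yields a success rate anywhere from about one in five to three in four, depending solely on the definition applied.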
DISCUSSION
Multiple retrospective studies and one large randomized controlled trial have demonstrated the value of US surveillance in improving survival in HCC [4,13-15]. Even after adjusting for lead-time bias, multiple studies have confirmed a survival benefit with surveillance [5,16-20]. Studies have addressed the accuracy of US, with a meta-analysis of 6 studies showing that the pooled sensitivity and specificity for detection of HCC were 95% and 91%, respectively [21]. However, few studies have addressed the effectiveness of US surveillance in the detection of a specific size or number of tumors. If the goal of surveillance is to find HCC that would qualify for potentially curative therapy, it is imperative that we not only find HCC at surveillance but also find smaller tumors. Diagnosing advanced tumors with surveillance might imply that the previous study failed to detect HCC.
Initial detection of HCC through CT, MRI, or via pathological diagnosis poses a vulnerability to lead-time bias, which may inflate failure rates of surveillance ultrasound. This is particularly apparent when smaller tumors are detected on CT/MRI, which would otherwise be missed on ultrasound imaging. To reduce lead-time bias, patients with HCC detected through these other methods were excluded from this study.
This study proposed 4 possible criteria, with US success rates ranging from 19.1% to 75.1% depending on the criterion used. Although 75% of patients met Milan criteria at detection, when challenged to detect single tumors no larger than 3.0 cm, US was successful less than 50% of the time. In one of the few studies exploring surveillance failure, Del Poggio et al. found that 81% of 1,170 patients with HCC met Milan criteria upon detection with surveillance. Dynamic imaging was subsequently performed and only 65.6% of this cohort actually met Milan criteria, thus yielding a 34.3% failure rate of surveillance after complete staging. Sex, surveillance interval, Child-Pugh class and AFP >200 ng/mL were associated with surveillance failure. Similar to our study, only 20% of their cohort had a tumor less than 2 cm found with surveillance, and the etiology of cirrhosis did not affect surveillance failure [22].
Obesity may affect the accuracy of surveillance US for HCC. In this study, morbidly obese patients with BMI ≥35 fared worse in detection of HCC with largest tumor ≤3 cm when compared to a reference population with BMI <25. Del Poggio et al. found that BMI >25 was associated with surveillance failure, but only one fifth of their patients had BMI reported [22]. In the Hepatitis C Antiviral Long-Term Treatment Against Cirrhosis (HALT-C) trial, which investigated the long-term outcome of patients who received interferon-based therapy for hepatitis C, rates of HCC detection did not differ between those with BMI ≤30 (64.4%) and those with BMI >30 (52.6%) among the 1,005 patients who underwent surveillance [23]. Obesity and coarse texture of the liver pose challenges for ultrasound due to limited penetration of sonographic beams and increased attenuation, resulting in poor image quality [24]. MRI or CT may be more appropriate for HCC surveillance in these patients due to improved image resolution and superior sensitivity [25]. Patients with BMI ≥25 and <30 had poorer detection of single tumors ≤2.0 cm. However, this result was based on 38 cases, with significance only just met at P =0.05. The conflicting results with respect to BMI and detection suggest that BMI may not be the optimal measurement to assess obesity. Thickened adipose tissue in the abdominal wall or truncal obesity would present great difficulty for the ultrasonographer. Increased waist circumference may be a more appropriate indicator of suboptimal ultrasound, as this would take into account differences in body habitus, age, gender and ethnicity [26].
Diabetes may also represent an independent risk factor in the development of HCC, as discussed by Davila et al. That study compared HCC cases with non-cancer controls and demonstrated a 2 to 3-fold increase in HCC incidence in diabetic patients after accounting for demographics and risk factors [27]. Our study instead examined HCC patients only and compared surveillance performance, finding that diabetic patients had poorer detection of tumors within Milan Criteria. This may be related to the higher incidence of obesity in diabetic patients, resulting in poor detection using surveillance ultrasound. Despite this association, it remains unclear whether diabetes should be factored in when considering alternatives to current surveillance recommendations. More investigation is necessary to further characterize the effects of diabetes on the success of ultrasound surveillance.
Gender differences may also affect the success of surveillance US. It is well described that women are more likely to attend health events and undergo cancer screening, so compliance with surveillance would be the initial step [28,29]. In this cohort, the proportion of males who underwent surveillance is similar to that of the overall cohort and to the gender distribution of this cancer. In our study population, males did have a decreased chance of being found with HCC that met Milan criteria, based on Chi-square analysis. However, this remained only marginally significant after accounting for other predictive factors through multiple logistic regression. Although the reasons for this are not clear, this is similar to the study by Del Poggio et al., in which male gender was associated with surveillance failure [22]. Gender differences may also be related to BMI, body habitus and adipose tissue distribution, which would need to be explored further.
Patients with ascites were more likely to be identified with single tumors ≤2.0 cm. This may be due to the fluid-solid surface interface, which improves clarity of lesions near the liver surface that would otherwise be obscured by surrounding structures. Interventional radiologists frequently use artificial ascites to better define difficult liver lesions for radiofrequency ablation and this technique has been successful in >90% of cases [30,31]. However, our study is the first to demonstrate the advantage of ascites in detecting small tumors during HCC surveillance.
Several studies have examined ultrasound surveillance in cirrhotic patients, but few have evaluated cirrhosis as a risk factor for surveillance failure. In this study, the presence of moderate or severe cirrhosis was associated with improved detection of cases with largest tumor ≤3.0 cm, after accounting for demographics and other risk factors. The sensitivity to detect these cases, however, was similar to the pooled sensitivity of 63% described in a meta-analysis by Singal et al. evaluating ultrasound surveillance in cirrhotic patients [21]. As cirrhotic patients already comprise the majority of the at-risk population, it is unclear whether the presence of cirrhosis should impact surveillance.
Failure of US-based surveillance is frequently deemed “operator-dependent” with various implications regarding the experience of the technician or site of the study, i.e. large tertiary/academic center with liver expertise versus a community or private imaging unit [10,32]. We have attempted to address this by using Oahu versus non-Oahu as a proxy for an urban versus rural setting for performance of ultrasound. In Hawaii, 75% of the population and the largest medical centers are located on Oahu. The non-Oahu islands have smaller community hospitals and gastroenterologists, but no specialized centers or hepatologists. Despite these differences, we demonstrated no significant difference in US detection in these two locations.
This study is limited in that it was retrospective and there was no uniform surveillance protocol utilized. US was performed by multiple hospital-based and private imaging centers likely with variable experience and expertise. Although 6 months was the recommended interval, many patients had studies performed within 6-11 months likely due to scheduling, convenience and missed appointments. Many of these studies were performed by technicians who select certain images to be reviewed by the radiologist who is reporting the official interpretation. In these settings, there may be little comment on the quality of the study or particular difficulties encountered in the report. This study also did not address the specific tumor location, as tumors at the dome of the liver near the diaphragm may be more difficult to detect. Although AFP is not universally recommended as a component of surveillance, AFP performed within 2 weeks of the time of surveillance was included in this study. Due to the variable nature of community application of surveillance recommendations, prior AFP values were not consistently available. We would have hoped that an elevated AFP from earlier surveillance US would have prompted additional imaging but this study could not determine this.
Despite the lack of a standard protocol for ultrasound surveillance, each patient in this study underwent ultrasonography in the 12 months prior to the study that diagnosed HCC, which reflects the real-world application of ultrasound surveillance. These studies were often ordered by community physicians, performed in multiple settings, and subject to imperfect patient compliance with a 6-month regimen. Unlike previous studies, our study was also able to address the possibility of differences between US performed in an urban versus rural setting, as demonstrated in our comparison of an Oahu versus non-Oahu location. Furthermore, data were prospectively collected over a period of 23 years by the only referral center for liver disease in the Pacific Basin and included a large number of patients. This study also sampled an ethnically diverse population, which contrasts with other studies that sampled predominantly Caucasian or Asian populations.
In conclusion, this study demonstrates that surveillance with US appears to be reasonably effective at detecting HCC in patients who have tumors that meet Milan criteria, but it may be less reliable for small tumors and in males, patients with diabetes, and the morbidly obese, and more reliable in patients with ascites. Other factors such as age, etiology of disease, smoking and location of the ultrasound center did not independently affect the ability of US to detect HCC. US is likely sufficient for patients with ascites who are awaiting liver transplant, but other strategies may be necessary to adequately detect HCC in patients with obesity. Finally, this study identifies potential areas for improvement and future study of HCC surveillance. Perhaps rather than accept the repeated claim that “Ultrasound is operator-dependent”, we should refer only the optimal candidates for US surveillance and develop a metric for determining the quality of the US at a particular center based on its ability to successfully identify HCC with surveillance.
Abbreviations
AFP: alpha-fetoprotein
BMI: body-mass index
CT: computed tomography
HCC: hepatocellular carcinoma
MELD: Model for End-Stage Liver Disease
MRI: magnetic resonance imaging
NASH: non-alcoholic steatohepatitis
OR: odds ratio
SD: standard deviation
US: ultrasound
Notes
Authors’ contribution
Wong LL: Conducting the study, collecting data, drafting the manuscript.
Reyes R: Data interpretation, drafting the manuscript.
Kwee S: Drafting the manuscript.
Hernandez BY: Interpreting the data, drafting the manuscript.
Kalathil S: Drafting the manuscript.
Tsai N: Drafting the manuscript.
Funding support
Biostatistics support was funded by the RMATRIX-II Grant Award #U54MD007584.
Conflicts of Interest: The authors have no conflicts to disclose.
Acknowledgements
We would like to acknowledge So Yung Choi and the John A. Burns School of Medicine Biostatistics Core for their role in biostatistical analysis.