Conventional and machine learning-based risk scores for patients with early-stage hepatocellular carcinoma
Abstract
Background/Aims
The performance of machine learning (ML) in predicting the outcomes of patients with hepatocellular carcinoma (HCC) remains uncertain. We aimed to develop risk scores using conventional methods and ML to categorize early-stage HCC patients into distinct prognostic groups.
Methods
The study retrospectively enrolled 1,411 consecutive treatment-naïve patients with Barcelona Clinic Liver Cancer (BCLC) stage 0 to A HCC from 2012 to 2021. The patients were randomly divided into a training cohort (n=988) and a validation cohort (n=423). Two risk scores (CATS-IF and CATS-INF) were developed to predict overall survival (OS) in the training cohort using conventional methods (Cox proportional hazards model) and ML-based methods (LASSO Cox regression), respectively. They were then validated and compared in the validation cohort.
Results
In the training cohort, factors for the CATS-IF score were selected by the conventional method, including age, curative treatment, single large HCC, serum creatinine and alpha-fetoprotein levels, fibrosis-4 score, lymphocyte-to-monocyte ratio, and albumin-bilirubin grade. The CATS-INF score, determined by ML-based methods, included the above factors and two additional ones (aspartate aminotransferase and prognostic nutritional index). In the validation cohort, both the CATS-IF score and the CATS-INF score outperformed other modern prognostic scores in predicting OS, with the CATS-INF score having the lowest Akaike information criterion value. A calibration plot exhibited good correlation between predicted and observed outcomes for both scores.
Conclusions
Both the conventional Cox-based CATS-IF score and ML-based CATS-INF score effectively stratified patients with early-stage HCC into distinct prognostic groups, with the CATS-INF score showing slightly superior performance.
INTRODUCTION
Primary liver cancer is the seventh most frequently occurring cancer and the third leading cause of cancer mortality in the world [1]. Hepatocellular carcinoma (HCC) accounts for approximately 75–90% of primary liver cancers [2]. HCC typically develops over an extended period of time, often in the setting of advanced chronic liver diseases (ACLDs), such as chronic hepatitis B virus (HBV) infection, hepatitis C virus (HCV) infection, alcoholism, and metabolic dysfunction-associated steatotic liver disease [3].
The outcomes of HCC patients have improved through advances in the surveillance of high-risk patients, antiviral therapy for chronic HBV or HCV infection, surgical and local regional therapy, and the introduction of systemic therapy [4-6]. One survey from the United States showed that the 5-year overall survival (OS) rates of patients with HCC increased from 3% between 1975 and 1977 to 18% between 2004 and 2010 [7]. Nevertheless, there is still room to improve.
The Barcelona Clinic Liver Cancer (BCLC) clinical algorithm is the most commonly applied system for staging and patient stratification and plays an important role in clinical decision-making, especially for the management of HCC [8]. The BCLC algorithm mainly focuses on tumor burden, liver functional reserve, performance status, vascular invasion, and extrahepatic spread. Early-stage HCC (BCLC stage 0-A) is defined by the presence of either a single tumor irrespective of size or 2 to 3 tumors less than 3 cm in size, well-preserved liver function, good performance status, and a lack of vascular invasion, extrahepatic metastasis, and cancer-related symptoms. Patients with BCLC stage 0-A HCC usually have better outcomes compared to those with intermediate or advanced stage HCC, and the median OS time is more than 5 years [8,9]. However, the clinical manifestations and outcomes of patients with stage 0-A HCC are not uniform [10,11]. To improve the prognoses of early-stage HCC patients, it is necessary to identify the risk factors associated with the outcomes and to adopt individualized strategies for treatment and follow-up.
Machine learning (ML) has been described as a “marriage between mathematics and computer science” and has been emerging as a promising method for biomarker selection and prognostic models [12]. Prognostic models based on ML have been developed for several different diseases, such as acute stroke, depression, and malignancy [13-15]. However, ML-based models to predict the outcomes of patients with early-stage HCC have not been widely studied. Some discrepancies between conventional and ML-based methods have also been reported in several studies [16]. Therefore, we aimed to develop risk scores using ML and conventional methods to stratify patients with early-stage HCC into distinct prognostic groups.
MATERIALS AND METHODS
Data source and study population
This retrospective cohort study analyzed patient-level data from the HCC registration system [17] at Taipei Veterans General Hospital (TVGH), a major medical center in northern Taiwan with 3,160 beds. It handles a substantial volume of patient care in Taiwan, with approximately 8,000 outpatient visits per day. This registration system prospectively collects comprehensive data on demographics, etiology, baseline laboratory information, tumor factors, treatments, and outcomes for newly diagnosed HCC patients and has been used in previous studies [9,17-19]. The diagnosis of HCC in the registration system was based on the diagnostic criteria of the American Association for the Study of Liver Diseases (AASLD) [20]. The study involved 3,832 consecutive treatment-naïve HCC patients from 2012 to 2021 (Fig. 1). Patients were excluded if they had advanced HCC stages (BCLC stage B to D), had missing initial serum biomarker data or diagnostic imaging, or were lost to follow-up after diagnosis. A total of 1,411 patients with BCLC stage 0-A HCC were enrolled. The index date was defined as the date on which HCC was diagnosed. The study randomly assigned 70% of these patients to the training cohort and the remaining 30% to the validation cohort.
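The 70/30 random allocation described above can be illustrated with a minimal sketch. The function name `split_cohort`, the seed value, and the use of integer patient IDs are assumptions made for illustration; the source does not describe how the randomization was implemented.

```python
import random

def split_cohort(patient_ids, train_fraction=0.7, seed=2024):
    """Randomly split patient IDs into training and validation lists."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed is an assumption, used only for reproducibility
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# Hypothetical IDs 0..1410 stand in for the 1,411 enrolled patients.
train_ids, validation_ids = split_cohort(range(1411))
print(len(train_ids), len(validation_ids))  # 988 423
```

Rounding 70% of 1,411 reproduces the reported cohort sizes of 988 and 423.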
Curative treatments included liver transplantation, surgical resection, and local ablation therapy. Transarterial chemoembolization (TACE), molecular target therapy, immune checkpoint inhibitors, radiotherapy, chemotherapy, and best supportive treatment were considered non-curative treatment modalities. Patients were followed until death, loss to follow-up, or the end of the study (June 30, 2022). This study adhered to the Declaration of Helsinki and received approval from the Institutional Review Board (IRB) of Taipei Veterans General Hospital, Taiwan (IRB number: 2022-07-007BC). Informed consent was waived by the IRB since this was a retrospective observational cohort study, and patient information was de-identified before the study commenced.
Outcome measurement
The vital status of each patient was collected from the electronic health record and linked to the registry data. The primary outcome was OS, which was calculated from the index date to the date of death or the last date of follow-up.
Construction and validation of prediction model and risk score
To construct a novel prediction model, the study initially utilized univariate Cox proportional hazard models to identify significant variables for OS. These variables were then integrated into a multivariable Cox model, refined through stepwise selection based on the Akaike Information Criterion (AIC), and verified by residual analysis. The detailed methods for model construction and validation are provided in Supplementary Material.
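To make the AIC-based stepwise idea concrete, the following is a minimal sketch of greedy forward selection. It assumes a user-supplied function `fit_loglik` that returns the log partial likelihood of a Cox model fitted on a given variable list; that function, the variable names, and the toy log-likelihood gains are all hypothetical and do not come from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def forward_stepwise(candidates, fit_loglik):
    """Add, one at a time, the variable that lowers AIC most; stop when none does."""
    selected = []
    best = aic(fit_loglik(selected), 0)
    improved = True
    while improved:
        improved = False
        best_var = None
        for var in [c for c in candidates if c not in selected]:
            score = aic(fit_loglik(selected + [var]), len(selected) + 1)
            if score < best:
                best, best_var, improved = score, var, True
        if improved:
            selected.append(best_var)
    return selected, best

# Toy stand-in for model fitting: each variable adds a fixed, made-up gain.
toy_gain = {"age": 12.0, "afp": 8.0, "noise": 0.4}
toy_fit = lambda variables: -500.0 + sum(toy_gain[v] for v in variables)

selected, final_aic = forward_stepwise(["age", "afp", "noise"], toy_fit)
print(selected, final_aic)  # ['age', 'afp'] 964.0
```

In this toy run the uninformative variable is rejected because its small likelihood gain does not offset the penalty of two AIC units per added parameter.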
For each prognostic factor, a Cox-based risk score was calculated and standardized on a 0–100 scale. The cumulative risk score for each patient, derived from the sum of individual factor scores, was segmented into high, medium, and low-risk categories using the 33rd and 66th percentiles for effective risk stratification.
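The rescaling and percentile-based grouping can be sketched as follows. This is an illustrative simplification: it applies min-max rescaling to a total score rather than to each factor, and it uses the percentile definition of Python's `statistics.quantiles`, which may differ from the one used in the study. The function names are assumptions.

```python
import statistics

def rescale_0_100(values):
    """Min-max rescale a list of scores onto a 0-100 scale."""
    lo, hi = min(values), max(values)
    return [100 * (v - lo) / (hi - lo) for v in values]

def risk_groups(scores, lower_pct=33, upper_pct=66):
    """Label each score low / intermediate / high using two percentile cutoffs."""
    cuts = statistics.quantiles(scores, n=100)  # 99 percentile cut points
    low_cut, high_cut = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return ["low" if s <= low_cut else "intermediate" if s <= high_cut else "high"
            for s in scores]

# Toy total scores 1..100 stand in for patients' summed factor scores.
scaled = rescale_0_100(list(range(1, 101)))
groups = risk_groups(scaled)
print(groups.count("low"), groups.count("intermediate"), groups.count("high"))  # 33 33 34
```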
We employed the least absolute shrinkage and selection operator (LASSO) method to overcome the common challenges of multicollinearity and overfitting in complex models. This led to the creation of an ML-based risk score utilizing the variables retained in the LASSO-based Cox model. The effectiveness of this ML-based risk score, compared to a standard Cox regression model, was then evaluated.
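As a sketch of what LASSO adds to a Cox model, the function below evaluates a penalized objective: the negative log partial likelihood (Breslow handling of ties) divided by the sample size, plus an L1 penalty on the coefficients. The scaling convention, function name, and toy data are assumptions; packages such as glmnet use their own scaling and optimize this kind of objective rather than merely evaluating it.

```python
import math

def penalized_cox_loss(beta, X, time, event, lam):
    """Negative log partial likelihood / n + lam * sum(|beta|)."""
    n = len(time)
    # Linear predictor for every patient.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    loss = 0.0
    for i in range(n):
        if event[i]:
            # Risk set: everyone still under observation at patient i's event time.
            risk_sum = sum(math.exp(eta[j]) for j in range(n) if time[j] >= time[i])
            loss -= eta[i] - math.log(risk_sum)
    return loss / n + lam * sum(abs(b) for b in beta)

# Toy data: one covariate, three patients, two deaths.
X = [[1.0], [2.0], [3.0]]
time = [1, 2, 3]
event = [1, 1, 0]
print(penalized_cox_loss([0.0], X, time, event, lam=0.5))
```

Because the penalty grows with the absolute size of each coefficient, minimizing this objective shrinks weak coefficients to exactly zero at sufficiently large `lam`, which is how LASSO performs variable selection.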
Validations of Cox-based and ML-based risk scores were conducted in training and validation cohorts. This encompassed a three-step process: assessing predictive performance for OS through comparisons of homogeneity, AIC, and AUROC; generating calibration plots for predicted versus observed survival; and implementing time-dependent ROC curve analysis to evaluate prognostic performance at 1, 2, 3, and 5 years post-diagnosis, acknowledging the dynamic nature of disease status and survival time.
Statistical analysis
The baseline characteristics, including demographics, treatments, tumor factors, viral hepatitis status, and HCC-related biomarkers, were collected. The ROCs and the Youden index were utilized to determine the optimal cutoff values for non-invasive serum marker scores in predicting the risk of mortality in patients with early-stage HCC.
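The Youden index selects the cutoff that maximizes sensitivity + specificity − 1 along the ROC curve. A minimal sketch, assuming that higher marker values indicate higher mortality risk and using made-up marker values, is:

```python
def youden_cutoff(values, died):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A value >= cutoff is treated as a positive (high-risk) prediction.
    """
    n_pos = sum(died)
    n_neg = len(died) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        tp = sum(1 for v, d in zip(values, died) if d and v >= cut)
        tn = sum(1 for v, d in zip(values, died) if not d and v < cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Toy marker values and vital status (1 = died); not study data.
marker = [1, 2, 3, 4, 5, 6]
died = [0, 0, 0, 1, 0, 1]
print(youden_cutoff(marker, died))  # (4, 0.75)
```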
Continuous variables were presented as the median and interquartile range (IQR) and compared using the Mann–Whitney U test. Categorical variables were expressed as frequencies and percentages and compared using the chi-squared test or Fisher’s exact test. Cumulative OS rates were estimated using the Kaplan-Meier method and compared using the Cox proportional hazards model. All statistical analyses were conducted using SPSS version 24.0 (IBM Corp., Armonk, NY, USA) and R software (version 4.2.3) (R Foundation for Statistical Computing, Vienna, Austria). SPSS was employed for the forward stepwise Cox regression, while R and the “glmnet” package were used for the LASSO Cox regression and subsequent plots, and the “timeROC” package was used to generate the time-dependent ROC curves. A two-tailed P-value of <0.05 was considered statistically significant.
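For readers less familiar with the Kaplan-Meier method mentioned above, the estimator can be sketched in a few lines. At each distinct death time, the running survival probability is multiplied by the fraction of at-risk patients who survive that time. The follow-up times below are toy values in months, not study data.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event = 1 means death)."""
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

# Toy follow-up: censored patients (event = 0) leave the risk set without a drop.
times = [6, 12, 12, 24, 36, 60]
events = [1, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```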
RESULTS
Basic characteristics
Among the 1,411 patients with BCLC stage 0-A HCC, 1,276 underwent curative treatments: 830 had surgical resection, 5 underwent liver transplantation, 425 received radiofrequency ablation therapy, and 16 were treated with percutaneous ethanol injection therapy. Of the 135 patients who received non-curative therapy, 116 underwent TACE, 17 received radiotherapy, and 2 received best supportive care. The study divided these patients into two cohorts: 988 patients in the training cohort and 423 in the validation cohort. The clinical characteristics of patients are shown in Table 1. In both cohorts, males were the majority, and most of the etiologies were HBV or HCV-related HCC.
Biomarker selection
After a median follow-up of 38.0 months (IQR 18.0–57.0 months), 368 patients died, and the 5-year OS rate was 67.5%. All available clinical variables, including the clinicopathological features and serum biomarkers in Table 1, were subjected to stepwise Cox regression and LASSO Cox regression. As shown in Table 2, the multivariate analysis by conventional Cox regression showed that OS correlated with age, treatment modalities, single large (>5 cm) HCC (SLHCC), serum creatinine levels, fibrosis-4 (FIB-4), lymphocyte-to-monocyte ratio (LMR), albumin-bilirubin (ALBI) grade, and alpha-fetoprotein (AFP) levels. The risk factors associated with OS by the LASSO Cox regression included the above eight factors and two additional ones: aspartate aminotransferase (AST) and prognostic nutritional index (PNI) (Fig. 2).
Development of the risk scores
The β-coefficients in the Cox regression for each selected factor were simplified based on their ratios, resulting in a user-friendly and clinically applicable score named the CATS-IF score (abbreviated from the contributing factors, as outlined in Table 3). Simultaneously, variables demonstrating significance in the ML-based LASSO Cox regression were identified to develop our ML-based risk score. The β-coefficients in the multivariate Cox regression for each selected factor in LASSO Cox regression were similarly simplified based on their ratios. The coefficients between these two models were compared and the results showed high consistency (Supplementary Fig. 1). Both the Cox-based CATS-IF score and ML-based CATS-INF score exhibited good predictive capability for OS in the training cohort, with respective AUC values of 0.723 and 0.729. To stratify patients into risk groups, we sorted them based on their CATS-IF score and CATS-INF score, establishing cutoffs for high, intermediate, and low-risk groups at the 33rd and 67th percentiles of the patients’ scores. The Kaplan–Meier plots depicting the survival outcomes in the training cohort are presented in Figure 3A and Figure 3B for the CATS-IF score and CATS-INF score, respectively.
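One common way to simplify β-coefficients by their ratios is to divide each by the smallest absolute coefficient and round to the nearest integer. The sketch below illustrates that approach only; the factor names and coefficient values are hypothetical, and the actual point assignments of the CATS-IF and CATS-INF scores are those given in Table 3, not these.

```python
def coefficients_to_points(betas):
    """Convert regression coefficients to integer points by their ratio
    to the smallest absolute coefficient."""
    reference = min(abs(b) for b in betas.values())
    return {name: round(b / reference) for name, b in betas.items()}

def total_points(points, patient_factors):
    """Sum the points of the risk factors present in one patient."""
    return sum(points[f] for f in patient_factors)

# Hypothetical coefficients for three binary risk factors.
toy_betas = {"factor_a": 0.42, "factor_b": 0.85, "factor_c": 1.30}
points = coefficients_to_points(toy_betas)
print(points)                                          # {'factor_a': 1, 'factor_b': 2, 'factor_c': 3}
print(total_points(points, ["factor_a", "factor_c"]))  # 4
```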
Calibration, validation and performance of the Cox-based CATS-IF and ML-based CATS-INF score
To evaluate the predictive efficacy of both the Cox-based CATS-IF score and ML-based CATS-INF score for OS, we calculated these scores in the validation cohort and stratified patients into high, intermediate, and low-risk groups. In the validation cohort, both the CATS-IF score and CATS-INF score exhibited good predictive capability for OS (AUC=0.695 and 0.707, respectively) and accurately stratified patients into low, intermediate, and high-risk groups. The 5-year OS rates in these groups stratified by CATS-IF were 81.8%, 62.8%, and 43.3%, while those in groups stratified by CATS-INF were 83.5%, 62.0%, and 42.9%, respectively (Fig. 3C and 3D, P<0.001).
We subsequently generated calibration plots by plotting observed and predicted probabilities, stratified by 10 percentiles of the predicted probability from both the CATS-IF score and CATS-INF score (Supplementary Fig. 2A and 2B). The calibration plots matched well with the ideal 45-degree line and showed good correlation between predicted and observed outcomes. We then compared the predictive ability of the two scores with that of modern prognostic scores for HCC. As shown in Table 4, the CATS-INF and CATS-IF scores had better predictive capability when compared to the ALBI score, AST-to-platelet ratio index (APRI), LMR, PNI, FIB-4, and the model for end-stage liver disease (MELD) score. Notably, the ML-based CATS-INF score exhibited the lowest AIC value.
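The decile-based calibration comparison described above can be sketched as follows. This is a simplified illustration that treats the outcome as a binary event at a fixed time point and ignores censoring, which a full survival calibration would need to handle (for example, with Kaplan-Meier estimates within each group). The function name and toy values are assumptions.

```python
def calibration_table(predicted, observed, n_groups=10):
    """Group patients by predicted probability and return
    (mean predicted, observed event rate) for each group."""
    pairs = sorted(zip(predicted, observed))
    size = len(pairs) / n_groups
    rows = []
    for g in range(n_groups):
        chunk = pairs[round(g * size):round((g + 1) * size)]
        if chunk:
            mean_pred = sum(p for p, _ in chunk) / len(chunk)
            obs_rate = sum(o for _, o in chunk) / len(chunk)
            rows.append((mean_pred, obs_rate))
    return rows

# Toy predictions and outcomes (1 = event); not study data.
predicted = [i / 20 for i in range(20)]
observed = [0] * 10 + [1] * 10
for mean_pred, obs_rate in calibration_table(predicted, observed):
    print(round(mean_pred, 3), obs_rate)
```

Plotting the second column against the first gives the calibration curve; points near the 45-degree line indicate agreement between predicted and observed risk.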
The time-dependent ROC curves were generated for the Cox-based CATS-IF and ML-based CATS-INF scores (Fig. 4). While the two scores had similar performance in predicting OS within the first year after diagnosis (AUROC 0.672 vs. 0.672), the ML-based CATS-INF score exhibited enhanced predictive accuracy during the second, third, and fifth years post-diagnosis compared to the conventional Cox-based CATS-IF score (AUROC 0.722 vs. 0.712, 0.712 vs. 0.702, and 0.704 vs. 0.690, respectively).
DISCUSSION
In this study, we showed that age, SLHCC, serum creatinine and AFP levels, FIB-4, LMR, ALBI grade, and treatment modalities are crucial risk factors for determining OS in patients with HCC in BCLC stage 0 or A. Both the CATS-INF and CATS-IF scores had excellent ability in prognosis prediction and risk stratification of the patients. Both scores were validated in the validation cohort, and their calibration plots matched well with the ideal line. Moreover, they outperformed currently available prognostic scores. Hence, the novel CATS-INF and CATS-IF scores could provide individual risk identification for patients with early-stage HCC and could benefit patients with HCC by allowing for more individualized medical services and treatment plans.
The factors that determine the prognoses of patients with HCC include tumor factors, field factors in the background liver (including the grade of inflammation and steatosis and the stage of fibrosis), and treatment factors [21,22]. For patients with early-stage HCC, the impact of tumor factors might be less critical, while the field factors play a more important role in determining the outcomes of patients [22-24].
Studies have shown that systemic inflammation can promote cancer growth, invasion, and metastasis in patients with malignant tumors [25,26]. Leukocytes play an important role in the immune response. Lymphocytes participate in cytotoxic cell death and inhibition of tumor-cell proliferation and migration [27,28]. Conversely, monocytes can promote tumor progression and metastasis [29]. As a result, the LMR has been used as a serum biomarker to predict the prognosis of several cancers and has shown good predictive capability of prognosis for patients with HCC in previous studies [30,31]. Our research revealed that low LMR is related to poorer OS in patients with early-stage HCC, indicating that inflammation was an important factor in the outcomes of patients.
Besides inflammation status, fibrosis also plays a crucial role in hepatic carcinogenesis as well as deteriorating liver function [32]. The FIB-4 score combines standard biochemical values (platelets, alanine aminotransferase, and AST) and age and has been recognized as an accurate, convenient, and non-invasive serum marker for evaluating the status of liver fibrosis for patients with ACLD [33]. We demonstrated that patients with high FIB-4 scores had poor OS compared to their counterparts, suggesting that fibrosis was also an important prognostic factor [32,34,35].
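For reference, the standard FIB-4 formula is age × AST / (platelet count × √ALT), with age in years, AST and ALT in U/L, and platelets in 10^9/L. A minimal sketch with made-up input values follows; the cutoff applied to FIB-4 in this study is not reproduced here.

```python
import math

def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 index = (age x AST) / (platelets x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))

# Toy patient: 60 years, AST 40 U/L, ALT 25 U/L, platelets 200 x 10^9/L.
print(fib4(60, 40, 25, 200))  # 2.4
```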
Moreover, the significance of nutritional status in predicting the outcomes of patients has been explored in diverse malignancies, including HCC [11,36,37]. It was postulated that a better nutritional function indicated a better body reserve against disease burden and could promote the immune response to combat malignancies [38,39]. By incorporating the PNI, a well-validated marker for assessing a patient’s nutritional status, into the risk scores, we could gain additional insights into their nutritional well-being. This inclusion enabled a more comprehensive understanding of a patient’s prognosis, considering the role of nutritional status as a contributing factor to OS.
The ALBI score has been widely validated as a reliable tool for evaluating liver functional reserve, as well as predicting the prognosis of patients with HCC [11,19,40,41]. Our study also confirmed that the ALBI grade was an independent factor in the outcomes of patients with early-stage HCC. Taken together, our results revealed the importance of inflammatory status, fibrosis status, and liver functional reserve in the survival of patients with early-stage HCC. The prognostic model that we developed based on the results could serve as a convenient, accessible, and economical way to predict patients’ outcomes and facilitate management of patients individually.
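The serum-based indices discussed above are simple to compute. The sketch below uses the commonly published definitions: ALBI = 0.66 × log10(bilirubin in µmol/L) − 0.085 × (albumin in g/L), graded 1 (≤ −2.60), 2 (> −2.60 to ≤ −1.39), or 3 (> −1.39); PNI = 10 × albumin (g/dL) + 0.005 × lymphocyte count (per mm³); and LMR as the ratio of lymphocyte to monocyte counts. Input values are made up, and the cutoffs applied to PNI and LMR in this study are not reproduced here.

```python
import math

def albi_score(bilirubin_umol_l, albumin_g_l):
    """Albumin-bilirubin (ALBI) score."""
    return 0.66 * math.log10(bilirubin_umol_l) - 0.085 * albumin_g_l

def albi_grade(score):
    """ALBI grade 1 (best liver reserve) to 3 (worst)."""
    if score <= -2.60:
        return 1
    if score <= -1.39:
        return 2
    return 3

def pni(albumin_g_dl, lymphocytes_per_mm3):
    """Prognostic nutritional index."""
    return 10 * albumin_g_dl + 0.005 * lymphocytes_per_mm3

def lmr(lymphocyte_count, monocyte_count):
    """Lymphocyte-to-monocyte ratio (both counts in the same unit)."""
    return lymphocyte_count / monocyte_count

# Toy values: bilirubin 10 umol/L, albumin 40 g/L (4.0 g/dL),
# lymphocytes 2000 /mm^3, monocytes 500 /mm^3.
score = albi_score(10, 40)
print(round(score, 2), albi_grade(score))  # -2.74 1
print(pni(4.0, 2000), lmr(2000, 500))      # 50.0 4.0
```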
The categorization of SLHCC as BCLC stage A or B has been controversial in some studies, especially in Asia [42-44]. In the 2022 updated BCLC strategy for HCC, SLHCC was classified as BCLC stage A [8]. However, our previous study showed that patients who had SLHCC had a 5-year OS rate of 42.6%, which showed significant differences compared to those in BCLC stage A (57.0%) and stage B (27.3%) HCC [9]. Therefore, we proposed that SLHCC might be a distinctive stage between A and B [9]. In the current study, SLHCC was an independent risk factor associated with poorer OS among patients with HCC and BCLC stage 0-A and was a component of the CATS-IF score and CATS-INF score. Consequently, it is suggested that patients with SLHCC should be closely followed up or given adjuvant systemic therapy because they have a higher risk of mortality. More prospective studies are warranted to elucidate this issue.
For patients with HCC in BCLC stage 0-A, curative treatment modalities are recommended as front-line therapy [8]. In our study, around 90% of patients received curative treatments. Moreover, non-curative treatment was an independent risk factor associated with poorer OS. This indicates that curative treatment modalities should be performed for patients with early-stage HCC if there are no contraindications.
There were some disparities in the significance of our variables when comparing the conventional Cox regression model with the ML-based LASSO Cox regression model. Notably, serum AST levels and PNI were independent risk factors for OS in the ML model but not in the conventional model [45]. Cox proportional hazards regression has traditionally been a mainstay of survival analysis due to its accuracy in identifying factors influencing survival. However, its precision can be compromised when variables interact with each other [46]. In our study, both the conventional Cox regression model and ML-based methods demonstrated superior performance in predicting the prognoses of early-stage HCC patients compared to other current prognostic scores. It is noteworthy that the ML-based CATS-INF score, which had the lowest AIC value, particularly excelled in long-term prognostication (Fig. 4). The results highlighted the impact of nutritional status on the outcomes of patients with early-stage HCC. Furthermore, they validated the potential of ML-based methods, such as LASSO Cox regression, as valuable complements to traditional analytic approaches and promising tools for future model development.
Despite the good performance of the CATS-IF score and CATS-INF score, there are still several limitations that need to be addressed. First, this study was a single-center retrospective cohort study, and further validation with prospective cohorts will be needed to confirm the predictive capability for prognosis. Second, the model was developed based on a limited database, so further research on possible prognostic factors and biomarkers will be required to predict the outcome of patients with HCC. Third, most of the patients in our study cohort had viral HCC, and further study is needed to determine whether etiologies of HCC interfere with the result. Fourth, microvascular invasion (MVI) is a critical factor to determine OS and recurrence for patients with early-stage HCC who underwent surgical resection [47,48]. However, it is important to note that the diagnosis of MVI relies on pathological examination. In our study, approximately 40% of the patients received non-surgical treatments, which limited the availability of detailed pathological specimens necessary for determining MVI status. Consequently, we were unable to adequately incorporate MVI into our analysis. Fifth, in our study, the treatment modality emerged as a significant parameter in determining the OS of patients with early-stage HCC. The results were consistent with previous studies [9,10,49]. However, the proportion of patients receiving non-curative treatment modalities was relatively small, potentially impacting the significance of our analysis. Lastly, some of the biomarkers involved in this study still lack universal acknowledgement of cutoff values and clinical applicability. Hence, further studies with larger scale and more detailed information are needed.
The CATS-IF score developed by conventional Cox regression and the CATS-INF score developed by ML-based methods both showed excellent prognostic ability in early HCC patients, while the ML-based CATS-INF score showed slightly superior performance, especially in the long-term follow-up.
Notes
Authors’ contribution
Tan ECH and Su CW had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Concept and design: Tan ECH, Lee PC, Su CW. Acquisition, analysis, or interpretation of data: All authors. Drafting of the manuscript: Ho CT, Tan ECH, Su CW. Critical review of the manuscript for important intellectual content: Chu CJ, Huang YH, Huo TI, Hou MC, Wu JC. Statistical analysis: Ho CT, Tan ECH, Su YH. Administrative, technical, or material support: Ni. Supervision: Tan ECH, Su CW.
All authors approved the final version of the article, including the authorship list.
Conflicts of Interest
There are no potential conflicts of financial and non-financial interests in the study. Chien-Wei Su: Speakers’ bureau: Gilead Sciences, Bristol-Myers Squibb, AbbVie, Bayer, and Roche. Advisory arrangements: Gilead Sciences. Grants: Bristol-Myers Squibb and Eiger.
Acknowledgements
This work was supported by grants from the National Science and Technology Council of Taiwan (MOST 111-2314-B-075-056, NSTC 112-2314-B-075-043-MY2), Taipei Veterans General Hospital (V112C-039, Center of Excellence for Cancer Research MOHW112-TDU-B-221-124007, and Big Data Center), Y.L. Lin Hung Tai Education Foundation. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
A part of this study was presented as a poster at the annual meeting of the American Association for the Study of Liver Diseases, Boston, USA, November 10-14, 2023. It was identified as a “Poster of Distinction.”
Writing assistance: American Manuscript Editors.
Abbreviations
ML
machine learning
HCC
hepatocellular carcinoma
BCLC
Barcelona Clinic Liver Cancer
OS
overall survival
ACLDs
advanced chronic liver diseases
HBV
hepatitis B virus
HCV
hepatitis C virus
MASLD
metabolic dysfunction-associated steatotic liver disease
AASLD
American Association for the Study of Liver Diseases
TACE
transarterial chemoembolization
FIB-4
fibrosis-4
LMR
lymphocyte-to-monocyte ratio
ALBI
albumin-bilirubin
AFP
alpha-fetoprotein
AST
aspartate aminotransferase
PNI
prognostic nutritional index
APRI
AST-to-platelet ratio index
MELD
model for end-stage liver disease
MVI
microvascular invasion
SUPPLEMENTAL MATERIAL
Supplementary material is available at Clinical and Molecular Hepatology website (http://www.e-cmh.org).
Study Highlights
• Question: Does an ML-based approach exhibit superior performance in predicting the outcomes of patients with early-stage HCC compared to the conventional Cox proportional hazards model?
• Findings: In this cohort study of 1,411 patients with early-stage HCC, both the Cox-based CATS-IF score and ML-based CATS-INF score demonstrated superior predictive performance for the overall survival compared to other prognostic scores. Notably, the CATS-INF score exhibited the lowest Akaike information criterion value.
• Meaning: These findings suggest that the ML-based CATS-INF score could stratify patients with early-stage HCC into distinct prognostic groups.