INTRODUCTION
Hepatocellular carcinoma (HCC) is the most common primary cancer of the liver and the fifth most common cancer globally [1]. One of the major causes of HCC is chronic infection with hepatitis B virus (HBV), especially in Asia and Africa. HBV-induced hepatocarcinogenesis is a complex and multifactorial process. The virus can produce oncoproteins such as HBV X protein (HBx), HBV large surface protein (L-HBs), or middle-sized HBV surface protein (MHBst) and drive the integration of HBV deoxyribonucleic acid (DNA) sequences into the human genome. These mechanisms have direct oncogenic potential. Additionally, host immune-mediated cytolysis against HBV during long-lasting host-virus interaction ultimately results in liver fibrosis and cirrhosis, which can indirectly cause HCC.
HBV genomic integrations have been reported in 80–90% of HBV-associated HCCs. HCC cells may contain single or multiple discrete HBV integrants within the human genome, which generate various genetic alterations, including deletions, translocations, fusion transcripts, and global genomic instability [2,3]. These alterations result in the selection of non-cancerous hepatocyte clones with survival advantages. Within this context, neoplastic clones with deregulated cell proliferation and suppressed apoptosis can ultimately arise and expand, resulting in overt HCCs [2,3].
Studies have shown HBV integration events even in hepatitis B surface antigen (HBsAg)-negative HCC patients with anti-HCV antibody, and alcohol-related or cryptogenic liver disease, indicating its potential to induce liver cancer, especially in patients with occult HBV infection (OBI) [
4,
5]. Though rare, some chronic HBV carriers experience HBsAg seroclearance during the course of chronic hepatitis B (CHB), with a rate of approximately 1% per year [
6]. Animal studies revealed that integrated forms of HBV still persist after HBsAg clearance [
7,
8]. It is unknown whether these persistent integrated HBV forms in the liver contribute to HCC risk. The clinical observations of an association between OBI and an increased risk of progressive liver disease or HCC suggest that the intrahepatic persistence of integrated forms of HBV may act as targets for low-level ongoing antiviral immune attacks, leading to disease progression [
8,
9]. Unfortunately, there is a lack of clinical evidence to characterize viral integrants and evaluate their role in HCC development following HBsAg seroclearance in chronic HBV carriers.
Advances in massively parallel sequencing technology have helped to overcome the technical limitations of earlier research methods for HBV integration, such as Southern blotting or polymerase chain reaction (PCR)-based methods, and have enabled highly efficient genome-wide detection of HBV integration. In this study, we performed a next-generation sequencing (NGS)-based, high-throughput capture assay for HBV integration utilizing Korean genotype-specific probes. The aim of our study was to investigate the frequency and patterns of HBV integration and to characterize integration events in CHB patients who developed HCC after HBsAg seroclearance.
MATERIALS AND METHODS
Tissue samples
Between December 2011 and December 2018, chronic HBV carriers with newly diagnosed HCC at the Catholic University of Korea, Seoul, South Korea, were screened for inclusion in the study. Among them, seven patients who developed HCC after HBsAg loss and had available stored liver tissue were included in the analysis. The diagnosis of HCC was based on typical imaging findings, consistent with the criteria of the Korean National Cancer Center guideline [1], and was finally confirmed pathologically. The liver tissue samples were unaffected by hemangioma or other benign tumor lesions; they were immediately frozen in liquid nitrogen and then stored at –80°C. The diagnosis of liver cirrhosis was made by pathological examination or clinical evidence of cirrhosis, including nodularity or splenomegaly on liver imaging and/or thrombocytopenia [10]. This study was approved by the Ethics Committee of The Catholic University of Korea (KC16TISI0436), and written informed consent was obtained from all patients.
HBV probe design
Intrahepatic HBV integration in liver tissues was identified by NGS-based high-throughput targeted sequencing, modified from HBV probe-based capture technology [11]. The HBV hybridization probes were designed to tile eight full-length Korean HBV genome sequences (GenBank accession numbers AY641559.1, DQ683578.1, GQ872211.1, GQ872210.1, JN315779.1, KR184660.1, AB014381.1, AB014395.1, and D23680.1; hepatitis B virus complete genome sequences; https://www.ncbi.nlm.nih.gov/nuccore). The probes were prepared to capture the HBV-inserted fragments in the host genome, and these fragments were then sequenced on an Illumina HiSeq 2500 platform. The total number of probes was 215, and the probe group size was 25.595 kbp. With these probes, 100% sequence coverage was achieved for all given HBV sequences. The workflow for detecting HBV integration is shown in Supplementary Figure 1.
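As a rough consistency check on the probe figures above, the mean probe length follows from dividing the probe group size by the number of probes. The sketch below also illustrates one simple way to tile a genome end to end. The 120-nt bait length, the 3,215-bp genome length, and the non-overlapping tiling scheme are assumptions made for illustration only; they are not taken from the probe design used in this study.

```python
def mean_probe_length(total_bp: float, n_probes: int) -> float:
    """Average probe length implied by a total probe size and a probe count."""
    return total_bp / n_probes


def tile_genome(genome_len: int, probe_len: int) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals that cover a linear
    sequence of genome_len bases with probes of probe_len bases.

    Hypothetical scheme: probes are laid end to end, and a final probe is
    anchored at the sequence end so that no base is left uncovered.
    """
    if probe_len <= 0 or genome_len <= 0:
        raise ValueError("lengths must be positive")
    starts = list(range(0, genome_len - probe_len + 1, probe_len))
    # Add an end-anchored probe if the last tile stops short of the end.
    if not starts or starts[-1] + probe_len < genome_len:
        starts.append(max(0, genome_len - probe_len))
    return [(s, min(s + probe_len, genome_len)) for s in starts]


if __name__ == "__main__":
    # Figures from the text: 215 probes, 25.595 kbp in total.
    print(f"mean probe length: {mean_probe_length(25_595, 215):.1f} bp")
    # Assumed values: one 3,215-bp genome, 120-nt baits.
    tiles = tile_genome(3_215, 120)
    print(f"{len(tiles)} probes; last probe spans {tiles[-1]}")
```

With the figures reported in the text, the mean probe length is about 119 bp.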
HBV-integrated fragment enrichment and capture sequencing
The HBV probe capture assay using the Illumina NGS workflow was conducted as follows. Briefly, 1 µg of input genomic DNA was fragmented to 150–200 bp by adaptive focused acoustic technology (AFA; Covaris, Woburn, MA, USA). The fragmented DNA was end-repaired, an ‘A’ was added to the 3’ end, and Agilent adaptors were then ligated to the fragments. After a ligation assessment, the adaptor-ligated products were PCR-amplified. For HBV viral capture, 250 ng of DNA library was used according to the standard Agilent SureSelect Target Enrichment protocol. Hybridization to the capture baits was performed at 65°C for 24 hours, with the heated lid of the PCR cycler set to 105°C. After amplification of the captured DNA, the final purified product was quantified according to the manufacturer’s instructions (qPCR quantification protocol guide) and its quality was checked using the TapeStation DNA ScreenTape D1000 (Agilent, Santa Clara, CA, USA). Finally, paired-end 100-bp sequencing of the purified captured DNA was carried out on an Illumina HiSeq 2500 (Illumina, San Diego, CA, USA) following the manufacturer’s instructions.
Identification of the HBV-human chimeric reads
We generated a modified reference by merging the human (UCSC assembly hg19, originally GRCh37 from the National Center for Biotechnology Information, February 2009) and HBV (DQ683578.1) genomes. The paired-end reads were then mapped to this reference with BWA-MEM (bwa-0.7.12). Duplicates were removed with Picard (Picard-tools-1.130). The final BAM file was sorted by chromosomal coordinates. After mapping the paired-end sequences to the human and HBV references, the chimeric reads were extracted using an in-house script, and breakpoints were predicted from the chimeric reads that aligned to both the human and virus genomes. The HBV sequences of chimeric reads that matched the virus genome reference over 30 bp or longer were considered to be integrated HBV sequences. To minimize experimental errors and remove noise signals, we used the mapping quality (MQ) and read counts of the host-virus chimeric DNA fragments for HBV integration breakpoint calling. MQ is a Phred-scaled score: cut-off values of 10, 20, 30, and 40 indicate 10%, 1.0%, 0.1%, and 0.01% probabilities of an incorrect alignment, respectively (Supplementary Table 1). HBV breakpoints with chimeric read counts of ≥2 and an average MQ of ≥20 were defined as true signals.
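The breakpoint-calling thresholds above can be sketched in a few lines. This is a minimal illustration and not the in-house script used in this study: the `ChimericRead` record, its field names, and the grouping of reads by exact human and HBV coordinates are hypothetical choices made for the example. Only the three cut-offs (HBV match of at least 30 bp, at least 2 chimeric reads, mean MQ of at least 20) and the Phred relationship come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ChimericRead:
    """Hypothetical summary of one human-HBV chimeric read."""
    chrom: str          # human chromosome of the junction
    human_pos: int      # junction coordinate on the human genome
    hbv_pos: int        # junction coordinate on the HBV genome
    hbv_match_len: int  # bases aligned to the HBV reference
    mapq: int           # Phred-scaled mapping quality


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled score Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


def call_breakpoints(reads, min_hbv_len=30, min_reads=2, min_mean_mapq=20):
    """Return (breakpoint, read_count, mean_mapq) for breakpoints that pass
    the cut-offs described in the text."""
    groups = defaultdict(list)
    for r in reads:
        # Keep only reads whose HBV portion matches over >= min_hbv_len bases.
        if r.hbv_match_len >= min_hbv_len:
            groups[(r.chrom, r.human_pos, r.hbv_pos)].append(r.mapq)
    calls = []
    for key, mapqs in sorted(groups.items()):
        if len(mapqs) >= min_reads and mean(mapqs) >= min_mean_mapq:
            calls.append((key, len(mapqs), mean(mapqs)))
    return calls


if __name__ == "__main__":
    # Made-up reads, for illustration only.
    example = [
        ChimericRead("chr5", 1_295_000, 1_800, 45, 60),
        ChimericRead("chr5", 1_295_000, 1_800, 50, 40),
        ChimericRead("chr19", 36_200_000, 1_600, 60, 60),  # single read
        ChimericRead("chr8", 95_000_000, 400, 40, 10),
        ChimericRead("chr8", 95_000_000, 400, 40, 20),     # mean MQ of 15
    ]
    for call in call_breakpoints(example):
        print(call)
```

In this made-up input, only the chr5 breakpoint passes: the chr19 site has a single supporting read, and the chr8 site has a mean MQ below 20.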
PCR-based Sanger sequencing for validation
PCR and Sanger sequencing were performed to validate the integrated HBV sequences at selected HBV-human junction breakpoints in the telomerase reverse transcriptase (TERT) and mixed-lineage leukemia 4 (MLL4) genes. Sequencing primers were designed based on the paired-end reads, with one primer located in the human genome and the other in the HBV genome. The PCR conditions and primer sequences for a representative case are shown in Supplementary Tables 2 and 3.
Statistics
Data were expressed as median (range) or mean±standard deviation. The proportions and numbers of HBV integration breakpoints were presented in bar or pie charts. Statistical analyses were carried out to compare values between two groups. Student’s t-test or the Mann-Whitney U test was used for continuous variables, and the chi-square test or Fisher’s exact test was used for categorical variables. A P value less than 0.05 was considered statistically significant. Analyses were performed using SPSS ver. 21.0 (SPSS, Inc., an IBM Company, Chicago, IL, USA).
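For small groups such as those analyzed here, Fisher's exact test is the natural choice for 2×2 categorical comparisons. The sketch below is a standard-library illustration of the two-sided test under one common definition (summing the probabilities of all tables that are no more likely than the observed one). The analyses in this study were run in SPSS; the function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for the 2x2 table

        [[a, b],
         [c, d]]

    with row and column totals held fixed (hypergeometric model).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at most as probable as the observed one; the small
    # relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts, e.g. a feature present/absent in two groups of 3.
    p = fisher_exact_two_sided(3, 0, 0, 3)
    print(f"P = {p:.3f}; significant at 0.05: {p < 0.05}")
```

Even a perfectly separated 3-versus-3 table gives P = 0.10, which shows how limited the statistical power is with very small samples.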
DISCUSSION
Study results of HBV integration may vary depending on the reference HBV genome used to capture the HBV sequences. Many previous studies of integration have included subjects with heterogeneous HBV genotypes. In this regard, Korean patients are a well-suited group in which to study the HBV genome, as genotype C is nearly universal, accounting for almost 100% of Korean CHB patients [12]. Southern blot hybridization, an early method for detecting integrated DNA, has low sensitivity and detects only clones that have undergone extensive positive selection. PCR-based methods are significantly biased toward Alu or other specific sequences. NGS-based technologies have recently emerged for this purpose. However, whole-exome sequencing captures only coding regions, while robust detection by ribonucleic acid (RNA) sequencing is limited to HBV DNA integrations occurring in transcriptionally active sites or large cellular clones [3]. Whole-genome sequencing (WGS) also has drawbacks, including low depth and high cost, necessitating more efficient and cost-effective ways to identify viral integrations [3]. Recently, we developed a probe set expected to cover almost all of the putative genotypes of Korean HBV, as our probes were designed based on the eight full-length HBV genome sequences from Korean patients with various stages of liver disease. Moreover, our method permits ultra-high depth sequencing and unbiased HBV capture at about one-fourth of the cost of WGS. Taking advantage of this, we were able to localize multiple HBV integration sites throughout the genome, even in patients with HBsAg seroclearance.
The current study demonstrated that HBV integration into the host genome is still present in either tumor or non-tumorous tissues in most of the patients who lost HBsAg. This prevalence appeared to be higher than that of HBsAg-negative HCC patients with different etiologies [4,5] and almost comparable to the data reported using NGS-based methods for those with overt HBV infections [3,13,14]. Importantly, our findings confirm, in a clinical setting, the results of animal studies [7,8], which reported viral persistence as integrated forms after the resolution of HBV infection.
It is unclear whether these persistent HBV integrants contribute to liver cancer risk in functionally cured patients with HBsAg seroclearance. Such information has been lacking because HBsAg seroclearance is rare and because liver tissue from such patients is very difficult to obtain for integration studies. The importance of our results lies in the high prevalence of HBV integrations revealed in the livers of patients who lost HBsAg. Moreover, the common site of HBV integration was the TERT promoter, in which a mutation was recently suggested as a gatekeeper driver in HCC development [15]. Overall, the study results demonstrated that the genomic locations of integration, as well as HBV inserts, often appeared to be enriched within regions relevant to hepatocarcinogenesis. These findings may provide insight into the potential causal relationship between the intrahepatic persistence of HBV integrants and HCC in HBsAg seroclearance settings.
It is noteworthy that integration features differed between the tumors and non-tumors. We detected more frequent overall integration, but less frequent genic HBV integration, in the non-tumors than in the tumors. In addition, many tumoral integrations were located in the vicinity of regulatory or cancer-associated genes including TERT, which are involved in immortalization, cell signaling, development, and transcriptional regulation (Table 3). Viral breakpoints were also frequently found within or near the PreS/S gene, with some integrations in the X gene. The overall findings indicate that HBV integration events in HCC with HBsAg seroclearance are broadly comparable to those in HCC with overt HBV infection in terms of genomic locations and viral inserts, as previously reported [2,13,16].
HBV integration reportedly occurs randomly throughout the human genome [
2,
3]. However, recent high-throughput data consistently demonstrated that HBV integration in HCCs was not random, with several specific, highly affected sites, such as TERT, MLL4, and CCNE2 [
3,
13,
16]. Our analysis also showed HBV-TERT integration in our patient samples. Indeed, we observed biased, uneven chromosomal and genomic distributions of the HBV breakpoints, of which tumoral integrations were highly enriched in cancer-associated genes. As with HBV-related HCCs, the HBV integration sites in HCCs following HBsAg loss may reflect selection of integrations that confer a growth advantage to clonal populations of cells during liver carcinogenesis [
2].
Our results emphasize the importance of HCC surveillance in chronic carriers achieving seroclearance. In our data, patients with HBsAg loss who did not remain under regular surveillance eventually presented with a large HCC (Table 1). A Korean study has shown that the estimated annual incidence rates of HCC following seroclearance were 2.85% and 0.29% in cirrhotic and non-cirrhotic patients, respectively [17]. The regional guidelines recommend offering surveillance when the risk of HCC exceeds 1.5%/year in cirrhotic patients and 0.2%/year in non-cirrhotic CHB patients [1,18]. Our analysis also showed the persistence of HBV integration within cancer-associated genes, as well as HBV PreS/S subgenomic integrants, in HCC patients achieving seroclearance. Thus, these findings support ongoing HCC surveillance in patients even after HBsAg seroclearance.
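The comparison between the reported post-seroclearance incidence rates and the guideline thresholds can be made explicit. The short sketch below uses only the four percentages quoted above; the dictionary and function names are illustrative.

```python
# Annual HCC incidence after HBsAg seroclearance, as quoted in the text (%/year).
REPORTED_INCIDENCE = {"cirrhotic": 2.85, "non_cirrhotic": 0.29}

# Guideline thresholds above which surveillance is recommended (%/year).
SURVEILLANCE_THRESHOLD = {"cirrhotic": 1.5, "non_cirrhotic": 0.2}


def surveillance_indicated(group: str) -> bool:
    """True if the reported incidence exceeds the surveillance threshold."""
    return REPORTED_INCIDENCE[group] > SURVEILLANCE_THRESHOLD[group]


def fold_over_threshold(group: str) -> float:
    """Ratio of the reported incidence to the surveillance threshold."""
    return REPORTED_INCIDENCE[group] / SURVEILLANCE_THRESHOLD[group]


if __name__ == "__main__":
    for group in REPORTED_INCIDENCE:
        print(f"{group}: indicated={surveillance_indicated(group)}, "
              f"ratio={fold_over_threshold(group):.2f}")
```

On these figures, the reported incidence is about 1.9 times the threshold in cirrhotic patients and about 1.45 times the threshold in non-cirrhotic patients, so both groups remain above the surveillance cut-off after seroclearance.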
Interestingly, we observed a higher number of integration breakpoints in HCC patients with detectable HBV DNA than in those with undetectable HBV DNA (
Fig. 1C). The linear correlation between HBV breakpoints and HBV viremia status or OBI levels remains to be further confirmed. It would also be interesting to investigate the ‘direct’ transforming effects of HBV integration in the absence of cirrhosis, as previously described [
4,
13]. The study subjects were all male. Besides HBV effects, gender or environmental/behavioral effects cannot be excluded in such patients. Since this study had only small sample sizes, it was not feasible to provide a detailed evaluation of such open questions. Nevertheless, our observation of HBV integrations enriched in cancer-related pathways, HBV PreS/S or X integrants, as well as cirrhotic patients with viremia, indicates that the direct and indirect pathways are not mutually exclusive but both may induce hepatocarcinogenesis in these settings.
In conclusion, our study demonstrated the persistence of intrahepatic HBV DNA integration in HCC developing after HBsAg seroclearance. Using the Korean genotype-specific probe capture assay followed by NGS technology, we identified more frequent genic integrations in HCC, with preferential sites in cancer-associated genes, such as the TERT promoter, as well as subgenomic HBV fragments of regulatory elements, such as the PreS/S and X genes. These findings suggest that the biological functions of HBV integration are comparable between HBsAg-positive and HBsAg-serocleared HCC, and that the pro-oncogenic effects of HBV integration may continue and ultimately lead to HCC in a subset of patients achieving HBsAg seroclearance. Thus, ongoing HCC surveillance and clinical management should be continued even after HBsAg seroclearance.