Identification of Novel Drug Targets and Antigenic Proteins in Helicobacter Hepaticus through Proteome-Mediated Mining

Abstract
Helicobacter hepaticus is a known pathobiont that causes hepatocellular carcinoma and intestinal cancer in mice. Recent studies have also reported H. hepaticus in human patients with gallbladder cancer and with biliary tract and liver diseases. In this study, we aimed to identify therapeutic targets and vaccine candidates against this pathobiont using computational biology. The core proteome of Helicobacter hepaticus ATCC 51449 was retrieved from UniProt and analyzed to remove paralogs and duplicates. After identifying proteins non-homologous to human proteins and predicting their subcellular localization, cytoplasmic proteins were subjected to essentiality, pathway, and druggability analyses. The analysis of essential cytoplasmic proteins yielded six druggable targets with vital roles in peptidoglycan and lipopolysaccharide biosynthesis. To identify vaccine candidates, outer membrane and extracellular proteins were analyzed. After antigenic proteins were determined, T- and B-cell epitopes were predicted to uncover epitopes common to both. Vaccine candidate identification revealed eleven antigenic proteins, five of which had overlapping T- and B-cell epitopes that may elicit both humoral and cell-mediated immune responses. The identified druggable targets and vaccine candidates might be used to develop effective treatments for infections caused by Helicobacter hepaticus.


Introduction
Increasing evidence has revealed a relationship between dysregulated microbiota-host interactions and many diseases, such as cancer, metabolic syndrome, and inflammatory bowel diseases. Moreover, specific bacterial species in the microbiota can play important roles in specific clinical outcomes such as gastric cancer [1]. Many studies have shown that chronic inflammation or infection can lead to neoplasms; chronic inflammation of the gastric mucosa caused by Helicobacter pylori is a well-known example that is positively correlated with gastric carcinoma [2]. After H. pylori became the first bacterium classified as a 'group 1 carcinogen' by the World Health Organization (WHO) in 1994, more studies focused on Helicobacter species to identify their roles, especially in liver and gastric diseases. In 1994, a study of mice with multifocal necrotic hepatitis led to the discovery of a new Helicobacter species, H. hepaticus [3].
H. hepaticus is a natural inhabitant of the mouse microbiota, and persistent infection with this pathobiont leads to intestinal cancer, hepatocellular carcinoma, and chronic hepatitis in susceptible mice [4]. Although it was first considered a mouse-specific pathogen, many studies have been reported […] to hepatocellular carcinoma development [9]. Moreover, a meta-analysis revealed a higher prevalence of Helicobacter spp., including H. hepaticus, in patients with biliary tract cancers [10]. Although a significant association between H. hepaticus and liver or biliary tract diseases has not yet been established, it is a potential zoonotic pathogen, and whether chronic H. hepaticus infection is a risk factor for these diseases is still worth investigating.
In the traditional approach, the discovery of drug targets or vaccine candidates for the control of pathogen-based diseases generally relies on protein identification through the isolation and characterization of microorganisms. This approach faces crucial challenges, including difficult culture conditions and the long time required to explore proteins. Recent advances in bioinformatics and computational biology have opened the way to identifying novel vaccine candidates and therapeutic targets in silico, faster and more cost-effectively. In reverse vaccinology, first described by Rappuoli in 2000, a pipeline is used to reveal novel antigenic proteins by mining genomic or proteomic data [11]. Similarly, data mining can uncover novel drug targets through sequence- or structure-based comparison of proteins with known targets.
In silico approaches applied to the genomes or proteomes of various pathogens, including H. pylori, have enabled the identification of useful drug or vaccine targets [12]. In the current study, a proteome-mediated mining approach was used to reveal novel antigenic proteins and drug targets in H. hepaticus, which could be valuable for developing preventive and therapeutic agents against the related diseases.

Methodology
Data retrieval and removal of paralogous sequences
The entire proteome of H. hepaticus ATCC 51449 was retrieved from the UniProt database and subjected to CD-HIT analysis to remove paralogous or duplicate proteins [13]. The sequence identity cut-off was set at 0.8 (80% identity) to exclude redundant sequences, and proteins shorter than 100 amino acids were also excluded. The remaining non-paralogous protein sequences were analyzed further.
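The redundancy filter can be illustrated with a minimal greedy sketch in Python. This is not CD-HIT, which adds short-word filtering and other heuristics for speed; the sketch assumes that identity is computed from a plain Needleman-Wunsch global alignment with hypothetical scores (match +1, mismatch -1, gap -1) as identical aligned residues divided by the length of the shorter sequence. Sequences shorter than 100 residues are dropped, the rest are visited from longest to shortest, and a sequence becomes a new representative only if its identity to every existing representative is below 0.8. All function names are our own.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}, using the first header token as the ID."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def identical_aligned_residues(a: str, b: str, match: int = 1,
                               mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment that tracks identities along the chosen optimal path."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    ident = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = a[i - 1] == b[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            best = max(diag, up, left)
            score[i][j] = best
            # Prefer the diagonal move on ties and carry the identity count along the chosen move.
            if best == diag:
                ident[i][j] = ident[i - 1][j - 1] + int(same)
            elif best == up:
                ident[i][j] = ident[i - 1][j]
            else:
                ident[i][j] = ident[i][j - 1]
    return ident[n][m]


def global_identity(a: str, b: str) -> float:
    """Identical aligned residues divided by the length of the shorter sequence."""
    return identical_aligned_residues(a, b) / min(len(a), len(b))


def greedy_nonredundant(seqs: Dict[str, str], threshold: float = 0.8,
                        min_len: int = 100) -> List[str]:
    """Return representative IDs after length filtering and greedy longest-first clustering."""
    long_enough = {k: v for k, v in seqs.items() if len(v) >= min_len}
    representatives: List[str] = []
    # sorted() is stable, so equal-length sequences keep their input order.
    for name in sorted(long_enough, key=lambda k: len(long_enough[k]), reverse=True):
        seq = long_enough[name]
        if all(global_identity(seq, long_enough[r]) < threshold for r in representatives):
            representatives.append(name)
    return representatives


if __name__ == "__main__":
    base = "M" + "ACDEFGHIKLMNPQRSTVWY" * 5          # 101 residues
    variant = base[:50] + "WWWWW" + base[55:]        # 5 substitutions, ~95% identical
    fasta = f">A\n{base}\n>B\n{variant}\n>C\nMKT{'GP' * 50}\n>D\nMAAAA\n"
    print(greedy_nonredundant(parse_fasta(fasta)))   # ['C', 'A']
```

Because every pair is aligned exhaustively, this sketch scales quadratically and is only meant for small inputs; for a full proteome, the CD-HIT program itself is the practical choice.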

Prediction of subcellular localization
PSORTb v.3.0 (https://www.psort.org/psortb/) and CELLO v.2.5 (http://cello.life.nctu.edu.tw/) were used to predict the location of non-homologous proteins [14,15]. Outer membrane and extracellular proteins were considered for vaccine candidate analysis, whereas cytoplasmic proteins were subjected to essentiality analysis and then screened against the DrugBank database to reveal potential pathogen-specific drug targets.
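The localization rule (PSORTb first, CELLO only when PSORTb returns 'Unknown') and the subsequent split into drug-target and vaccine pools can be sketched as follows. The label strings are assumed to have been normalized to PSORTb-style names beforehand, and the helper names are hypothetical; neither tool is called here.

```python
from typing import Dict, Optional, Tuple

# Assumed, normalized location labels (PSORTb-style); real tool outputs may need mapping first.
DRUG_TARGET_POOL = {"Cytoplasmic"}
VACCINE_POOL = {"OuterMembrane", "Extracellular"}


def consensus_location(psortb: str, cello: Optional[str]) -> str:
    """Use the PSORTb call unless it is 'Unknown'; then fall back to CELLO if a call exists."""
    if psortb != "Unknown":
        return psortb
    return cello if cello else "Unknown"


def route(location: str) -> str:
    """Assign a protein to the downstream analysis suggested by its location."""
    if location in DRUG_TARGET_POOL:
        return "drug-target analysis"
    if location in VACCINE_POOL:
        return "vaccine-candidate analysis"
    return "not analyzed further"


if __name__ == "__main__":
    # Hypothetical proteins with (PSORTb call, CELLO call) pairs.
    predictions: Dict[str, Tuple[str, Optional[str]]] = {
        "P1": ("Cytoplasmic", None),
        "P2": ("Unknown", "OuterMembrane"),
        "P3": ("CytoplasmicMembrane", None),
    }
    for protein, (psortb_call, cello_call) in predictions.items():
        location = consensus_location(psortb_call, cello_call)
        print(protein, location, route(location))
```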

Identification of essential non-homologous proteins of H. hepaticus
Cytoplasmic proteins were considered possible drug targets and subjected to BLASTp analysis against the Database of Essential Genes (DEG) (http://www.essentialgene.org/) to identify indispensable proteins with roles in primary cellular functions of the pathogen [16]. Hits with a bit score > 100 and percent identity ≥ 30% were retained, and proteins assigned to pathways specific to the pathogen were considered for further analysis.
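A minimal sketch of the essentiality filter, assuming the BLASTp results against DEG were exported in BLAST+ tabular format (`-outfmt 6`, whose twelve default columns end with `evalue` and `bitscore`). The pathway step is reduced to a set difference between KEGG map numbers annotated for the pathogen and for human; the protein IDs, the protein-to-pathway mapping, and the helper names are hypothetical placeholders. The same `passing_queries` helper could be reused for the DrugBank search by passing `min_pident=0.0` and `max_evalue=0.001`.

```python
import csv
import io
from typing import Dict, Iterable, List, Set

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_outfmt6(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated BLAST hits into dictionaries keyed by column name."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [dict(zip(OUTFMT6, row)) for row in reader if row]


def passing_queries(hits: Iterable[Dict[str, str]], min_bitscore: float = 100.0,
                    min_pident: float = 30.0,
                    max_evalue: float = float("inf")) -> Set[str]:
    """Query IDs with at least one hit meeting all thresholds.

    Defaults mirror the DEG criteria (bit score > 100, identity >= 30%);
    the e-value filter is disabled unless max_evalue is given.
    """
    keep: Set[str] = set()
    for hit in hits:
        if (float(hit["bitscore"]) > min_bitscore
                and float(hit["pident"]) >= min_pident
                and float(hit["evalue"]) < max_evalue):
            keep.add(hit["qseqid"])
    return keep


def pathogen_specific(pathogen_pathways: Set[str], host_pathways: Set[str]) -> Set[str]:
    """Pathways annotated for the pathogen but absent from the host."""
    return pathogen_pathways - host_pathways


if __name__ == "__main__":
    blast = ("HH_0001\tDEG_A\t45.2\t300\t150\t3\t1\t300\t1\t298\t1e-60\t210\n"
             "HH_0002\tDEG_B\t28.0\t280\t190\t5\t1\t280\t1\t280\t1e-20\t150\n"
             "HH_0003\tDEG_C\t60.0\t120\t48\t0\t1\t120\t1\t120\t1e-15\t95\n")
    essential = passing_queries(read_outfmt6(blast))
    # KEGG map numbers: 00540 LPS biosynthesis, 00550 peptidoglycan biosynthesis,
    # 00010 glycolysis/gluconeogenesis (treated as shared with human in this toy example).
    specific = pathogen_specific({"00540", "00550", "00010"}, {"00010"})
    protein_pathways = {"HH_0001": {"00540"}, "HH_0002": {"00550"}, "HH_0003": {"00010"}}
    shortlist = sorted(p for p in essential if protein_pathways.get(p, set()) & specific)
    print(shortlist)  # ['HH_0001']
```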

Druggability screening of essential cytoplasmic proteins
Essential, pathogen-specific cytoplasmic proteins were searched by BLASTp against the DrugBank database using default parameters, with a bit score > 100 and e-value < 0.001, to assess their druggability by similarity to known drug targets.
3D structures of the proteins were modeled with SWISS-MODEL and the Phyre2 protein fold recognition server (http://www.sbg.bio.ic.ac.uk/~phyre2) [17]. The models were then subjected to PockDrug analysis to investigate pocket druggability (http://pockdrug.rpbs.univ-paris-diderot.fr/) [18]. In addition, protein-protein interaction analysis was performed with the STRING v11.0 database (http://string-db.org) to identify hub proteins among the putative targets, defined by a node degree K ≥ 5, i.e., at least five direct or indirect associations. Molecular weights were estimated with the Expasy ProtParam tool.
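The hub criterion reduces to counting each candidate's distinct interaction partners. This sketch assumes the STRING network was exported as a list of undirected protein pairs; repeated pairs and self-loops are ignored, and the node names are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def node_degrees(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Degree = number of distinct neighbours in an undirected graph (self-loops ignored)."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


def hubs(edges: Iterable[Tuple[str, str]], candidates: Iterable[str],
         min_degree: int = 5) -> List[str]:
    """Candidates whose node degree is at least min_degree (K >= 5 by default)."""
    degree = node_degrees(edges)
    return sorted(c for c in candidates if degree.get(c, 0) >= min_degree)


if __name__ == "__main__":
    # Hypothetical network: targetA has six partners (one pair listed twice), targetB has one.
    edges = [("targetA", f"p{i}") for i in range(6)] + [("targetA", "p0"), ("targetB", "p1")]
    print(node_degrees(edges)["targetA"], hubs(edges, ["targetA", "targetB"]))
```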

Vaxign analysis
The Vaxign database (http://www.violinet.org/vaxign2) was used to analyze outer membrane and extracellular proteins and obtain their adhesin probability and Vaxign-ML score [19]. The inclusion criteria were as follows: adhesin probability > 0.51, no more than one transmembrane helix, and no similarity to mouse or human proteins. In addition, proteins with a Vaxign-ML score ≥ 90 were selected.
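The Vaxign inclusion criteria combine into a single predicate. The record fields below are our own names for values read from the Vaxign results, not Vaxign's export format; the comparisons mirror the text (adhesin probability strictly greater than 0.51, at most one transmembrane helix, no human or mouse similarity, and a Vaxign-ML score of at least 90).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VaxignRecord:
    """Hypothetical container for per-protein values taken from Vaxign output."""
    protein_id: str
    adhesin_probability: float
    tm_helices: int
    similar_to_human: bool
    similar_to_mouse: bool
    vaxign_ml_score: float


def passes_vaxign_criteria(r: VaxignRecord) -> bool:
    """Apply all inclusion criteria stated in the Methods."""
    return (r.adhesin_probability > 0.51
            and r.tm_helices <= 1
            and not r.similar_to_human
            and not r.similar_to_mouse
            and r.vaxign_ml_score >= 90)


def select(records: List[VaxignRecord]) -> List[str]:
    """IDs of records that satisfy every criterion, in input order."""
    return [r.protein_id for r in records if passes_vaxign_criteria(r)]


if __name__ == "__main__":
    demo = [VaxignRecord("P1", 0.62, 0, False, False, 93.5),
            VaxignRecord("P2", 0.70, 3, False, False, 95.0)]
    print(select(demo))
```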

Prediction of putative antigenic and virulent proteins
The Virulence Factor Database (VFDB) (http://www.mgc.ac.cn/VFs/) and VaxiJen v2.0 (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html) were used to predict virulent and antigenic properties of the selected outer membrane and extracellular proteins, respectively [20,21]. While the proteins were screened based on BLAST score ≥ 80 in VFDB, a threshold value of ≥ 0.5 was applied in the VaxiJen search. Molecular weight and theoretical pI values were estimated through the Expasy ProtParam tool.
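ProtParam computes molecular weight from average residue masses plus one water molecule; a rough equivalent, together with a bisection search for the theoretical pI, is sketched below. The residue masses are standard average values, but the pKa set is an assumed textbook-style table rather than the Bjellqvist scale that ProtParam uses, so pI values from this sketch will differ somewhat from the tool's output.

```python
# Average residue masses in Da (free amino-acid mass minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Assumed pKa values for this illustration only (not ProtParam's Bjellqvist scale).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average molecular weight of a linear polypeptide in Da."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge at a given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH in [0, 14]; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    peptide = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    print(f"MW = {molecular_weight(peptide):.2f} Da, pI = {isoelectric_point(peptide):.2f}")
```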

Prediction of T-cell epitopes
The selected antigens were evaluated for potent T-cell epitopes using the Immune Epitope Database (IEDB) server (http://tools. […] binding capacity, respectively [22]. In this study, the cut-off value was set to IC50 < 50 nM. In parallel, MHC-I binding epitopes were screened for their immunogenic properties (http://tools.iedb.org/immunogenicity/), and epitopes with positive immunogenicity scores were selected for further analysis. MHC-II binding prediction was performed using the SMM-align (NetMHCII 1.1) method covering the human HLA-DR locus (http://tools.iedb.org/mhcii/). The peptide length was set to 15 (the default proposed by the server). Peptides were sorted by predicted IC50, and those with IC50 < 50 nM were retained for further analysis.
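The T-cell screening logic can be summarized as: enumerate overlapping 15-mers for MHC-II prediction, then keep peptides with predicted IC50 < 50 nM and, for MHC-I, a positive immunogenicity score. The sketch assumes IEDB predictions were already collected as `(peptide, allele, IC50)` tuples and immunogenicity scores as a dictionary; it does not call IEDB, and the example peptides and values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Prediction = Tuple[str, str, float]  # (peptide, allele, predicted IC50 in nM)


def sliding_peptides(seq: str, length: int = 15) -> List[Tuple[int, str]]:
    """All overlapping peptides of a given length, with 1-based start positions."""
    return [(i + 1, seq[i:i + length]) for i in range(len(seq) - length + 1)]


def select_epitopes(predictions: List[Prediction],
                    immunogenicity: Optional[Dict[str, float]] = None,
                    max_ic50: float = 50.0) -> List[Prediction]:
    """Keep rows with IC50 < max_ic50 and, if scores are supplied, a positive
    immunogenicity score; the result is sorted by IC50 (strongest binders first)."""
    kept = []
    for peptide, allele, ic50 in predictions:
        if ic50 >= max_ic50:
            continue
        if immunogenicity is not None and immunogenicity.get(peptide, 0.0) <= 0:
            continue
        kept.append((peptide, allele, ic50))
    return sorted(kept, key=lambda row: row[2])


if __name__ == "__main__":
    print(sliding_peptides("MKTAYIAKQRQISFVKS")[:2])
    rows = [("YLLPAIVHI", "HLA-A*02:01", 12.0), ("KLFDRVPSV", "HLA-A*02:01", 49.9)]
    print(select_epitopes(rows, {"YLLPAIVHI": 0.21, "KLFDRVPSV": -0.05}))
```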
Linear B-cell epitopes were predicted with BepiPred, and residues with scores above 0.5 were considered part of B-cell epitopes.
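Per-residue BepiPred-style scores can be turned into linear epitope segments by collecting runs of consecutive residues scoring above 0.5. In this sketch, the `min_length` parameter is an assumption added for convenience (the text does not specify a minimum epitope length); its default of 1 applies no length filter.

```python
from typing import List, Sequence, Tuple


def linear_epitopes(seq: str, scores: Sequence[float], threshold: float = 0.5,
                    min_length: int = 1) -> List[Tuple[int, int, str]]:
    """Return (start, end, peptide) for runs of residues with score > threshold.

    Positions are 1-based and inclusive.
    """
    if len(seq) != len(scores):
        raise ValueError("one score per residue is required")
    segments = []
    start = None
    # A trailing -inf sentinel closes a run that reaches the last residue.
    for i, s in enumerate(list(scores) + [float("-inf")]):
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            if i - start >= min_length:
                segments.append((start + 1, i, seq[start:i]))
            start = None
    return segments


if __name__ == "__main__":
    seq = "MKTAYIAKQR"
    scores = [0.2, 0.6, 0.7, 0.55, 0.4, 0.9, 0.3, 0.51, 0.52, 0.8]
    print(linear_epitopes(seq, scores))
```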

Structural modeling
The 3D structures of the antigenic candidates were modeled with the SWISS-MODEL and Phyre2 web servers using default settings.
Once the models were built, the Protein Preparation Wizard of the Schrödinger suite was used to refine the structures.
In the first step of the refinement, bond orders were assigned using the Chemical Component Dictionary (CCD), missing hydrogens were added, and het states were generated with Schrödinger's Epik tool for pH 7 ± 2 [24]. Next, PROPKA, as integrated in the Protein Preparation Wizard, was used for hydrogen-bond assignment at pH 7.0. In the second step of the refinement, water molecules more than 3 Å away from het groups were removed. Finally, energy minimization was performed with the OPLS3e force field for each structure [25].
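The water-pruning rule of the second refinement step is a simple distance test. The sketch below is not the Protein Preparation Wizard; it assumes coordinates in Å are already available as `(x, y, z)` tuples and keeps a water only if its oxygen lies within 3 Å of at least one het-group atom. Treating a distance of exactly 3 Å as "within" is our choice.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def waters_to_keep(water_oxygens: Dict[str, Coord], het_atoms: Sequence[Coord],
                   cutoff: float = 3.0) -> List[str]:
    """IDs of waters whose oxygen lies within `cutoff` Å of any het-group atom."""
    kept = []
    for water_id, oxygen in water_oxygens.items():
        if any(math.dist(oxygen, atom) <= cutoff for atom in het_atoms):
            kept.append(water_id)
    return kept


if __name__ == "__main__":
    het = [(0.0, 0.0, 0.0)]  # hypothetical ligand atom
    waters = {"HOH1": (2.0, 2.0, 0.0), "HOH2": (5.0, 0.0, 0.0)}
    print(waters_to_keep(waters, het))
```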
Once refinement was completed, predicted epitopes were visualized on the structures via Maestro v12.4 to confirm that they are surface-exposed.

Results
Exclusion of paralogous and human homologous proteins
Bacterial proteomes contain many redundant sequences, mainly as a result of evolutionary duplication. Thus, all 1873 proteins of H. hepaticus ATCC 51449 were subjected to CD-HIT analysis to identify redundant sequences and proteins with < 100 amino acids. Two hundred fifty-five proteins were excluded, leaving 1618 proteins for investigation. NCBI BLASTp analysis of the non-paralogous proteins against the human proteome was carried out to identify human homologs. Proteins homologous to host proteins were excluded, and a final set of 1168 proteins was considered for further analysis (Table 1 in Supplemental Data).

Prediction of subcellular localization
The pool of non-homologous proteins was subjected to subcellular localization prediction using the PSORTb database.
When a protein's location was assigned as 'unknown', CELLO v. […] were found in the cytoplasmic region, and the next largest fraction (17%) was localized in the inner membrane. Outer membrane, periplasmic, and extracellular proteins accounted for 9%, 7%, and 4%, respectively (Table 1 in Supplemental Data).
Cytoplasmic, outer membrane, and extracellular proteins were subjected to further analyses. Cytoplasmic proteins can serve as promising drug targets because of their pivotal roles in cell survival, whereas outer membrane and extracellular proteins are more exposed than cytoplasmic ones and can therefore be considered vaccine candidates.

Identification of druggable cytoplasmic target proteins
The essentiality of the cytoplasmic proteins, considered the most promising drug targets, was analyzed with DEG 10, and a total of 281 essential cytoplasmic proteins carrying out important functions in survival, adhesion, and infection were identified (Table 2 in Supplemental Data).
After retrieval of H. hepaticus and human metabolic pathways from the KEGG database, the two sets were compared manually to reveal pathogen-specific pathways, yielding 25 metabolic pathways. Analysis of the 281 essential cytoplasmic proteins revealed 36 proteins with roles in these specific pathways (Table 3 in Supplemental Data) (Figure 2). Of these, 14 proteins were assigned to the peptidoglycan and lipopolysaccharide (LPS) biosynthesis pathways, which play essential roles in building the cell envelope of Gram-negative bacteria (Table 1). To check the druggability of these fourteen proteins, they were analyzed by BLASTp search against the DrugBank database. Six of the fourteen proteins matched FDA-approved or experimental drug targets in DrugBank (Table 2) [26]. After the protein structures were modeled and validated, PockDrug analysis revealed that all six targets had druggable pockets to which a drug-like molecule can bind (Table 2). Interactome analyses were then performed for these six proteins to determine whether they are hub proteins, that is, nodes with a high degree of interaction with other proteins in the network.
Interactome analyses were performed using the STRING database, and all six proteins had a node degree greater than 5 […] (Table 3).

Prediction of potent T-cell epitopes of putative antigenic proteins
T-cell-dependent cellular immune responses are initiated following antigen presentation by major histocompatibility complex (MHC) class I and class II molecules. Therefore, predicting the epitope sites of the 11 prioritized antigenic proteins is essential for potential vaccine design. First, the selected proteins were screened with the IEDB server for epitopes with specific binding capacity to MHC class I.
High binding affinity was defined as IC50 < 50 nM, and matching epitopes were further subjected to immunogenicity prediction to check their ability to provoke an immune response. Epitopes passing this double screen (IC50 < 50 nM for MHC-I binding and a positive immunogenicity score) are listed in Table 5 in Supplemental Data. Second, the candidate proteins were screened with the IEDB server for specific binding capacity to MHC class II (IC50 < 50 nM) (Table 6 in Supplemental Data).

B-cell epitopes of putative antigenic proteins
In addition to T-cell epitope prediction, the 11 proteins were examined for B-cell epitopes (humoral immunity). The BepiPred server was used to find linear epitopes, respecting the score threshold given in the Methods […] (Table 7 in Supplemental Data).

Final epitope selection and topological analysis of common epitopes
The MHC class I/II and linear B-cell epitope predictions were compared to find common epitopes. […] Q7VJK1, Phage_base_V domain-containing protein) were finally selected based on their common T- and B-cell epitopes (Table 4). Because immune cells can recognize only accessible epitopes, the surface exposure of the predicted common epitopes was investigated. For homology modeling with SWISS-MODEL, 4AIP, 6JZR, 6K31, and 5FP1 were used as templates for Q7VF32, Q7VH86, Q7VJR0, and Q7VJ32, respectively. Templates are given as Protein Data Bank (PDB) IDs and target proteins as UniProt IDs. For Q7VJK1, which has two identified epitope regions between residues 10 and 27, Phyre2 was run in intensive mode with default settings to benefit from its ab initio modeling, because the templates identified for this protein poorly cover the region where the epitopes are located. The analysis showed that the epitopes are surface-exposed rather than buried within the protein fold (Figure 4).
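Selecting common epitopes can be framed as intersecting residue intervals on the same protein. The sketch below assumes that a T-cell epitope and a B-cell epitope count as common when they share at least one residue (the text does not state the exact overlap rule); the intervals used in the example are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive residue positions


def overlap(a: Interval, b: Interval) -> int:
    """Number of residues shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def common_epitopes(t_cell: List[Interval], b_cell: List[Interval],
                    min_shared: int = 1) -> List[Tuple[Interval, Interval, Interval]]:
    """Pairs of overlapping T- and B-cell epitopes together with their shared region."""
    hits = []
    for t in t_cell:
        for b in b_cell:
            if overlap(t, b) >= min_shared:
                shared = (max(t[0], b[0]), min(t[1], b[1]))
                hits.append((t, b, shared))
    return hits


if __name__ == "__main__":
    print(common_epitopes([(10, 24), (40, 48)], [(20, 27), (60, 70)]))
```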

Discussion
In recent decades, computational approaches coupled with bioinformatics tools have helped identify potential targets and antigens in different microorganisms. Bioinformatics-mediated analysis helps narrow down the number of proteins or genes to be investigated through a rational workflow.
This strategy provides advantages in terms of time and cost.
Besides discovering novel targets or antigens, in-silico approaches can also help design structure-based drugs, repurpose known drugs, construct multi-epitope vaccine proteins, and investigate host-pathogen interactions [27]. A computational approach termed reverse vaccinology has been used to mine the genome or proteome of microorganisms to identify novel vaccine candidates, and sequence or structure-based comparisons with known targets provide information to discover putative drug targets.
In this study, a proteome-mediated analysis was applied to uncover potential antigenic proteins and drug targets in H. hepaticus, which is a member of the enterohepatic group of […] database, non-homologs were analyzed to identify their subcellular location, an important aspect because it helps to understand a protein's function and to evaluate proteins as drug targets or vaccine candidates.
As antibiotic resistance has become a significant threat to human health, alternatives to antibiotics are being sought to fight bacterial pathogens. One popular approach is to design small-molecule inhibitors of enzymes in critical pathways of microorganisms, such as LPS and peptidoglycan biosynthesis and quorum sensing [28]. In addition to traditional methods, genomics- or proteomics-based in silico strategies enable a rapid search for druggable targets of such small-molecule inhibitors [29][30][31]. Therefore, the cytoplasmic proteins of H. hepaticus were analyzed to reveal enzymes with roles in vital metabolic pathways that can serve as promising drug targets.
DEG 10 contains more than 12,000 essential genes and their encoded proteins from 31 bacterial species [16]. In addition to searches by gene name or function, the database allows BLASTp searches of query sequences against essential genes.
After essential gene products among the cytoplasmic proteins were retrieved through the DEG server, KEGG pathway analysis was carried out to reveal metabolic pathways unique to the pathogen.
These analyses yielded 36 proteins involved in 25 metabolic pathways found in H. hepaticus but not in humans.
Among them, LPS and peptidoglycan biosynthesis pathways have primary importance due to their critical roles, especially in pathogen survival, antibiotic sensitivity, and pathogenesis.
Moreover, disrupting these constituents could help control pathogens by making them vulnerable to osmotic lysis [26]. Therefore, the fourteen proteins (out of thirty-six) assigned to these pathways were prioritized and analyzed, resulting in six hub proteins (UDP-N-acetylglucosamine acyltransferase […] into its contribution to fully mature cell wall structure, viability, and resistance to β-lactam antibiotics [32]. Considering the functions of all these key enzymes, they appear to be valuable targets for controlling the pathogen, because inhibiting them would disrupt its structural integrity and enable antibiotics to act.
The outer membrane and extracellular proteins usually act as virulence factors involved in the pathogen's adhesion, invasion, and survival in the host. Because they are generally the first points of contact with the host, they can elicit an immune response. They can therefore be considered potential vaccine candidates, and Vaxign was used to analyze the outer membrane and extracellular proteins of H. hepaticus. Vaxign performs topology analysis with HMMTOP, which is based on a hidden Markov model (HMM); proteins with ≥ 2 transmembrane helices are not considered ideal candidates because they are difficult to clone and express in vaccine studies [12,19]. Adhesins are generally critical for pathogen invasion, and proteins located outside the cytoplasm and cytoplasmic membrane tend to have a higher adhesin probability (AP) [43]. Vaxign predicts AP using SPAAN, which has 89% sensitivity and 100% specificity. Moreover, Vaxign assigns each protein a protegenicity score indicating how well a candidate induces protective immunity [44]. After selecting the candidates based on […] (Table 4). Moreover, the structural analysis demonstrated the surface-exposed topology of these epitopes, suggesting that they are accessible for recognition by the immune system.
Among these five proteins, FrpB, FlgG, and the flagellar basal-body protein have been characterized previously, whereas the other two remain uncharacterized. The immunogenic nature of bacterial flagella is well known, and flagellar proteins such as FlaA, FlaB, FlgE, and FlgG have been found to be immunogenic in different pathogens [45][46][47][48]. FrpB, another characterized protein, has also been identified as immunogenic in pathogens such as Neisseria meningitidis and Salmonella typhi [49,50]. Besides these known antigens, Q7VJ32 (uncharacterized protein) and Q7VJK1 (phage base V domain-containing protein) could be considered new antigenic determinants specific to H. hepaticus.

Conclusion
Although experimental validation is still needed, the current study uncovered six drug targets and eleven putative antigenic proteins. In further studies, structural analysis of the drug targets could help in designing specific small-molecule inhibitors to control the pathogen. Moreover, the putative antigenic proteins could be used to detect H. hepaticus. The identified epitope regions could also be used to design candidate multi-epitope vaccine constructs with different adjuvants in further studies.