Using Bioinformatics to Study Homology Between Herpes Simplex Virus Type 1 Glycoprotein C Gene and Herpes Simplex Virus Type 2 Glycoprotein F Gene

Bioinformatics conceptualizes biology in terms of molecules and applies informatics techniques (derived from disciplines such as applied mathematics, computer science and statistics) to understand and organize the information associated with these molecules on a large scale. In short, bioinformatics is a management information system for molecular biology with many practical applications: it applies techniques from computer science to problems in biology and links the two fields. It has three aims. The first is to organize data in a way that allows researchers to access existing information and to submit new entries as they are produced, e.g. the Protein Data Bank for 3D macromolecular structures. The second is to develop tools and resources that aid in the analysis of data; for example, having sequenced a protein, it is of interest to compare it with previously characterized sequences. The third is to use these tools to analyze the data and interpret the results in a biologically meaningful manner.


Introduction
Bioinformatics is a very broad field that encompasses mapping, sequencing, sequence comparison, gene identification, protein modeling, network databases, visualization and ethics. It is an interdisciplinary subject that on one hand requires building a biological information infrastructure and on the other requires computation-based biological research. All of this depends on large stores of experimental and derived data.
The foundation of bioinformatics rests on computational techniques, algorithms, artificial intelligence, database management, software engineering and related fields. These lead to the development of community data resources, from which bioinformatics applications for the analysis of genetic data are developed.
More recently, in-frame deletion and linker insertion mutants of gC-2 were used to identify regions important for C3b binding [26,27]. These studies showed that amino-terminal residues 26 through 73 of gC-2 are not involved in C3b receptor activity. In addition, three distinct regions (I, II, and III) in gC-2 are important for C3b binding. Region III has features like those of the short consensus repeat (SCR) [13,18,23,24,28] of the human C3-C4 receptor CR1 [26].
Several studies employed a transient transfection system using the cloned gC-2 gene. The genomes of herpes simplex virus type 1 (HSV-1) and HSV-2 encode at least four different glycoproteins, gA/B, gC, gD, and gE, which are found on the surface of the infected cell and the virion [39]. Three of the four glycoproteins, gA/B, gD, and gE, have been found to be structurally similar in the two virus types, based on immunological and biochemical criteria [33-37]. For example, a recent analysis of the DNA sequences of the gD genes from HSV-1 and HSV-2 revealed that the gD proteins had an overall sequence homology of 85% (L. Lasky and D. Dowbenko, DNA, in press). Thus, it may be concluded that the primary sequences of these three glycoproteins have been relatively well conserved since the two virus types diverged from each other. HSV-1 gC initially appeared to have no obvious homolog in HSV-2. HSV-1 gC was thought to be type specific, since antibodies against this glycoprotein were found to react almost exclusively with HSV-1 gC [38]. In addition, no detectable immunological reactions could be demonstrated between HSV-1 gC and antisera made against HSV-2 [32]. A protein having the same electrophoretic mobility as HSV-1 gC has been demonstrated in HSV-2; however, it did not map collinearly with HSV-1 gC [17].
In contrast to HSV-1, HSV-2 appears to encode yet another glycoprotein, termed gF [30,31,35,40]. Although HSV-2 gF had an electrophoretic mobility much faster than that of HSV-1 gC, initial mapping studies with recombinant viruses revealed that this protein was encoded by a region of the HSV-2 genome that was approximately collinear with the gene for HSV-1 gC [35,40]. Subsequent studies with finer structural mapping revealed a much closer collinearity between the HSV-1 gC and the HSV-2 gF coding regions [41]. In addition, it has recently been demonstrated that a monoclonal antibody against HSV-2 gF cross-reacts weakly with HSV-1 gC [42] and that a polyclonal antiserum made against HSV-1 virion envelope proteins precipitates gF [40], suggesting a possible structural homology between the two glycoproteins. Thus, it appeared that a possible homolog to HSV-1 gC was the HSV-2 gF protein. The most conclusive proof of relatedness between two proteins is to demonstrate homology at the amino acid level.
In this research we used bioinformatics to study the homology between the herpes simplex virus type 1 glycoprotein C gene and the herpes simplex virus type 2 glycoprotein F gene through dry-lab bioinformatic analysis of biological data, using tools and resources such as BLAST, FASTA, PDB and Phyre2.

GenBank
GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research, 2013 Jan;41(D1): D36-42). GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA Databank of Japan (DDBJ), the European Nucleotide Archive (ENA), and GenBank at NCBI. These three organizations exchange data daily.
A GenBank release occurs every two months and is available from the ftp site. The release notes for the current version of GenBank provide detailed information about the release and notifications of upcoming changes to GenBank. Release notes for previous GenBank releases are also available. GenBank growth statistics for both the traditional GenBank divisions and the WGS division are available from each release. We used GenBank to retrieve the DNA sequences of glycoprotein C of herpes simplex virus type 1 and glycoprotein F of herpes simplex virus type 2 (https://www.ncbi.nlm.nih.gov/genbank/).
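Sequences retrieved from GenBank are commonly exported in FASTA format before comparison. The following is a minimal sketch of reading such records, assuming FASTA text with `>` header lines; the function name `parse_fasta`, the record IDs and the bases shown are hypothetical placeholders and not the actual gC or gF entries.

```python
from typing import Dict, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {record_id: sequence} dictionary."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else f"record_{len(records) + 1}"
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped; join them and normalize the case.
            records[current_id] += line.upper()
    return records


# Toy input: placeholder IDs and bases, NOT real GenBank records.
example = """>toy_gC example record
atggcc
ccgggg
>toy_gF example record
ATGGCCCTT
"""

sequences = parse_fasta(example)
for record_id, seq in sequences.items():
    print(record_id, len(seq), seq)
```

In practice a maintained parser from an established library may be preferable; the sketch only shows what the retrieved data look like once loaded.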

BLAST
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as to help identify members of gene families. We used the BLAST program to assess the homology between herpes simplex virus type 1 glycoprotein C and herpes simplex virus type 2 glycoprotein F and to estimate the percentage similarity, as well as the homology with some other strains (https://blast.ncbi.nlm.nih.gov/Blast.cgi).
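To illustrate what "local similarity" means, the sketch below computes an exact local alignment score by Smith-Waterman dynamic programming. This is not the BLAST algorithm, which uses a faster seeded heuristic and adds statistical significance estimates; the scoring values (match 2, mismatch -1, gap -2) and the toy sequences are assumptions chosen only for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b (linear gap cost)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy sequences sharing the local region "ACGT" inside unrelated flanks.
score_shared = smith_waterman_score("TTACGTTT", "GGACGTGG")
score_none = smith_waterman_score("AAAA", "TTTT")
print(score_shared, score_none)
```

The shared core is found regardless of the flanking bases, which is the property that makes local alignment suitable for comparing genes that are conserved only in part.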

Protein Data Bank (PDB)
The Protein Data Bank was used to retrieve the amino acid sequences of glycoproteins C and F and to assess the homology between the two sequences (https://www.ncbi.nlm.nih.gov/pdb).

Phyre2 to analyze protein structure
Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre and uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. The Phyre2 protocol guides users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional tools is available to find a protein structure in a genome, to submit a large number of sequences at once and to automatically run weekly searches for proteins that are difficult to model. A typical structure prediction is returned between 30 min and 2 h after submission (http://www.sbg.bio.ic.ac.uk/phyre2).

Homology between Herpes simplex virus type 1 glycoprotein C gene and Herpes simplex virus type 2 glycoprotein F sequence (Table 2)
The result in Table 2 indicates 81% homology between the herpes simplex virus type 1 glycoprotein C gene and the herpes simplex virus type 2 glycoprotein F sequence.
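A percentage of this kind is typically derived from a pairwise alignment by counting identical columns. The sketch below shows one such calculation, assuming two already-aligned sequences of equal length with `-` as the gap symbol; the function name and the toy alignment are hypothetical, and the choice of denominator (all columns that are not gaps in both sequences) is one convention among several, so values may differ from those reported by BLAST.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # a column that is a gap in both rows carries no information
        compared += 1
        if x == y:
            matches += 1  # identical residues (never two gaps, skipped above)
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment: 10 columns, 8 identical, one gap and one substitution.
toy_identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
print(f"{toy_identity:.1f}% identity")
```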

Detection of Amino acid sequences of Glycoprotein C and F
The amino acid sequences of glycoproteins C and F were retrieved from the Protein Data Bank (PDB), as indicated below.

Alignment between glycoprotein C and other strains and isolates
The alignment of glycoprotein C of herpes simplex virus type 1 and glycoprotein F of herpes simplex virus type 2 with some other strains and isolates indicates around 99-100% homology.
Structure of glycoprotein C: The structure of glycoprotein C presented by PDB (Figure 1) indicates a tertiary (three-dimensional) structure that harbors both α helices and β pleated sheets.

Discussion
The recent flood of data from genome sequences and functional genomics has given rise to a new field, bioinformatics, which combines elements of biology and computer science.
Bioinformatics conceptualizes biology in terms of macromolecules (in the sense of physical chemistry) and then applies informatics techniques (derived from disciplines such as applied mathematics, computer science, and statistics) to understand and organize the information associated with these molecules on a large scale.
We used bioinformatics to study the homology between the herpes simplex virus type 1 glycoprotein C gene and the herpes simplex virus type 2 glycoprotein F gene and found 81% homology, whereas [1] found that the overall sequence homology between these two fragments was 68%. However, certain regions of the sequence showed either a much higher or a much lower degree of sequence homology than others. For example, the sequences between positions 0 and 570 of the HSV-1 and HSV-2 sequences showed only 51% homology, whereas the region between positions 570 and 1740 showed a much higher degree of sequence homology (80%). An additional highly homologous region (70%) was also found at the end of the two sequences, from position 1975 to position 2419. In addition to the nucleotide sequence changes, the two genomes showed various deletions or insertions when compared with each other. The most notable was an 81-base-pair region found at positions 346 to 426 of the HSV-1 gC sequence, which is missing from the HSV-2 genome. From this overall sequence comparison, it appeared that there was a high degree of sequence homology between the HSV-1 region and the HSV-2 gF region sequenced here.
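The region-by-region comparison described above can be sketched as a per-window identity calculation over an alignment. The example below assumes two aligned sequences of equal length and 0-based, end-exclusive column ranges; the toy sequences and window boundaries are hypothetical and do not reproduce the gC and gF data. The last lines check the length of the 81-base-pair region from its inclusive coordinates in the text (346 to 426).

```python
def region_identity(aligned_a: str, aligned_b: str, start: int, end: int) -> float:
    """Percent identity over alignment columns start..end (0-based, end exclusive)."""
    columns = list(zip(aligned_a[start:end], aligned_b[start:end]))
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# Toy alignment with a divergent first half and a conserved second half.
toy_a = "ACGTACGTAC" + "GGGGGGGGGG"
toy_b = "ACGTTTTTAC" + "GGGGGGGGGC"

divergent = region_identity(toy_a, toy_b, 0, 10)
conserved = region_identity(toy_a, toy_b, 10, 20)
overall = region_identity(toy_a, toy_b, 0, 20)
print(divergent, conserved, overall)

# Length of a region given by inclusive coordinates, as quoted in the text.
deletion_length = 426 - 346 + 1
print(deletion_length)
```

As the toy values show, an overall figure can hide a strongly divergent region next to a well conserved one, which is why the regional breakdown is informative.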
Our results help explain previous results which demonstrated that the HSV-2 gF and HSV-1 gC proteins were mainly type specific but did have type-common determinants [36,38,40,42]. Since several previous studies [32,36] demonstrated that these proteins induced predominantly type-specific antibodies, it is reasonable that the most antigenic regions of the proteins are found within the more divergent N-terminal sequences which follow the putative hydrophobic signal sequences. The hydrophilic nature of the divergent regions, along with their high content of potential N-linked glycosylation sites [45], suggests that these regions would be located on the surface of the protein. Exposure of these divergent sequences to the outside of the proteins may be responsible for the generation of type-specific antibodies directed against these divergent epitopes. However, type-common antibodies could likely also be generated by the more highly conserved carboxy-terminal three-fourths of the proteins, since hydrophilic regions conserved between gC and gF could be exposed to the outside of the proteins and may be, in one case, glycosylated (residues 363 to 366 of HSV-1 gC and 330 to 332 of HSV-2 gF). Thus, HSV-1 gC and HSV-2 gF share both type-specific and type-common determinants, but it appears that the type-specific determinants are more antigenic. Although an explanation of the type-specific and type-common determinants of gC and gF must at this point be speculative, it is possible that the proteins have at least two functions: one that is important for the viability of both viruses (the type-common domain) and one that is specific for each virus type (the type-specific domain).
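Potential N-linked glycosylation sites are conventionally recognized by the sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below scans a protein sequence for this motif; the function name and the toy peptide are hypothetical, and a matching sequon only marks a candidate site, not evidence that the site is glycosylated.

```python
import re
from typing import List

# Lookahead pattern so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> List[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon (X != P)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Toy peptide: NGS and NAT match; NPT is excluded because X is proline.
toy_sites = find_sequons("MNGSANPTNAT")
print(toy_sites)
```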
Although the function(s) of gC and gF is at present unknown and viable gC-minus mutants of HSV-1 have been isolated in vitro [44], it is not clear that either gC or gF is dispensable to the viruses during in vivo infection of the human host and the establishment of latency. It is possible that at least some of the biological differences between HSV-1 and HSV-2, including predilection for site of infection and virulence, may be due to the marked structural differences between the amino-terminal regions of gC and gF.

Conclusion
The results reported in this paper demonstrate that the HSV-1 gC and HSV-2 gF glycoproteins are highly homologous and that they encode type-common and type-specific domains.

Figure 1: Structure of glycoprotein C.
Structure of glycoprotein F: The structure of glycoprotein F presented by PDB indicates a tertiary (three-dimensional) structure that harbors both α helices and β pleated sheets (Figure 2).

Figure 2: Structure of glycoprotein F.
Phyre2 to analyze protein structure
Analysis of the structure of glycoprotein C and its functions (Figure 3): Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. It showed the following:

Figure 3: Analysis of the structure of glycoprotein C and its functions.
a) Heterodimer structure of glycoprotein F
b) The antigenic determinant site that binds with the Fc fragment region of specific immunoglobulin
c) Solution structure of human secretory IgA
d) Enzyme deglycosylase human IgG Fc fragment
e) The complex crystal structure of glycoprotein F

Figure 4: Analysis of the structure of glycoprotein F and its functions.

Table 1: Detection of DNA sequence of glycoprotein C and F genes.

Table 2: Homology between Herpes simplex virus type 1 glycoprotein C gene and Herpes simplex virus type 2 glycoprotein F sequence.

Table 3: Alignment between glycoprotein C and other strains and isolates.

Table 4: Alignment between glycoprotein F and other strains and isolates.