Vaccine Design, Adaptation, and Cloning Design for Multiple Epitope-Based Vaccine Derived From SARS-CoV-2 Surface Glycoprotein (S), Membrane Protein (M) and Envelope Protein (E): In Silico Approach
The SARS coronavirus-2 (SARS-CoV-2) pandemic has heightened the scientific community's concern about finding countermeasures against this lethal virus. So far, hundreds of thousands of people have been infected as the virus has spread. This research was therefore carried out to develop potential epitope-based vaccines against SARS-CoV-2 using reverse vaccinology and immunoinformatics approaches.
Material and Methods:
Sequences of the SARS-CoV-2 surface glycoprotein (S), membrane protein (M), and envelope protein (E) were downloaded from the NCBI protein database. MHC class I, MHC class II, and B-cell (antibody) epitopes were predicted for each protein. Epitopes selected by antigenicity score were tested for allergenicity and toxicity, and the filtered epitopes were used for vaccine construction. The vaccine constructs were docked against Toll-like receptor 3 and subjected to molecular dynamics simulation. The vaccine with the best scores was then subjected to immune simulation and cloning design.
Three vaccines were constructed: COVac-1, COVac-2, and COVac-3, and each was investigated in depth. Molecular dynamics simulation was used to determine the stability and physical movement of the protein atoms and molecules. After molecular dynamics simulation, COVac-1 had the best scores. COVac-1 was then subjected to immune simulation analysis to confirm the stimulation of innate and adaptive immunity. After passing the immune simulation, COVac-1 was integrated into the E. coli pET-30b plasmid using in silico cloning design.
Viral pandemics threaten humanity today. The best strategy to fight any pandemic is to use the full power of computational biology, especially immunoinformatics, to design and discover in silico new vaccines or molecules that may stimulate the immune system against invading pathogens or inhibit the pathogen life cycle.
Coronaviruses belong to the family Coronaviridae and the order Nidovirales. They are single-stranded, positive-sense RNA viruses with a genome size ranging from 26 to 32 kilobases. Coronaviruses are known to cause acute upper respiratory tract infections and significant respiratory infections in children and adults, and they infect humans as well as certain other species, such as murine, porcine, feline, bovine, and avian hosts [1-3]. So far, seven distinct human coronaviruses (HCoVs) have been identified. Four HCoV strains, i.e., HCoV-OC43, HCoV-229E, HCoV-NL63, and HCoV-HKU1, can cause the common cold in immunocompromised people, and two other strains, SARS-CoV and MERS-CoV, can cause severe acute respiratory syndrome [4-7]. The seventh human coronavirus is severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), which is responsible for the latest worldwide pandemic of the deadly coronavirus disease-2019 (COVID-19). COVID-19 was first detected in a group of pneumonia patients in Wuhan, China, in December 2019. The first COVID-19 death was identified in Wuhan on 11 January 2020, and the first case outside China was reported in Thailand on 13 January 2020. The most common symptoms of COVID-19 include fever, cough, tiredness, and diarrhea, and in severe cases patients have difficulty breathing. On 11 March 2020, the World Health Organization (WHO) declared COVID-19 a pandemic, as the number of cases outside China had risen 13-fold by the end of February 2020 and more than 4000 deaths had been registered globally. As of 13 March 2020, 108,879,471 cases, 2,397,946 deaths, and 80,912,198 recoveries had been recorded in 177 countries. Interferons show only limited efficacy in combination with ribavirin.
Many efforts have sought to use natural compounds and molecules to inhibit SARS-CoV-2 binding to ACE2, the receptor responsible for viral entry, and to classify coronavirus proteins or detect infection from CT scans using machine learning [12, 13]. However, the efficacy of these combined approaches must be further assessed. This study was conducted to design new epitope-based vaccines against three SARS-CoV-2 proteins: the surface glycoprotein (S), which is responsible for viral fusion with the human cell during viral entry [15, 16]; the envelope protein (E), which covers the virus components and is involved in many aspects of the viral life cycle; and the membrane glycoprotein (M), which mediates the interaction between virions and cellular receptors. Reverse vaccinology and immunoinformatics examine the genome and genetic material of a virus or other pathogen to identify novel antigens. Reverse vaccinology uses bioinformatics methods to classify and analyze these antigens, dissecting the genome and genetic structure of the pathogen to improve future vaccines. The approach also allows scientists to identify the antigenic segments of a virus or pathogen during vaccine development. These methods are fast, inexpensive, reliable, and simple, and they have been used successfully to develop vaccines against many viruses, such as Zika and Chikungunya [19, 20].
MATERIAL AND METHODS
The current study was performed to develop possible SARS-CoV-2 vaccines using reverse vaccinology, i.e., using the pathogen's proteins to design a vaccine. The materials and methods were adapted from previous work that constructed a vaccine by predicting B-cell and T-cell (MHC class I and II) epitopes. The predicted epitopes were screened to retain those that are non-allergenic, non-toxic, and potently antigenic, and the filtered epitopes were assembled into vaccine constructs. The constructs were analyzed by molecular docking and dynamics simulation to determine their binding strength and stability. Finally, the vaccine that passed the molecular docking and dynamics simulation analyses was chosen for the design of a cloning plasmid based on the E. coli expression system.
Viral protein sequence identification, selection, and retrieval
Proteins were downloaded from the NCBI database in FASTA format and selected for the potential vaccine design: membrane glycoprotein (accession no. YP_009724393.1), envelope protein (accession no. YP_009724392.1), and surface glycoprotein (accession no. YP_009724390.1). Table 1 lists the NCBI accession numbers of the protein sequences.
Selected proteins for candidate vaccine design
|NCBI accession no.||Protein||Location|
|YP_009724393.1||Membrane Glycoprotein||Virus Membrane|
|YP_009724392.1||Envelope Protein||Virus Membrane|
|YP_009724390.1||Surface Glycoprotein||Virus Membrane|
Antigenicity prediction and physicochemical property analysis of the protein sequences
The VaxiJen web server predicted all three proteins to be potent antigens that may stimulate the immune system (Table 2). Physicochemical property analysis was conducted to determine further characteristics of the three proteins. Surface glycoprotein has the highest molecular weight but the lowest pI, whereas membrane glycoprotein has the highest pI and a positive GRAVY score.
Surface glycoprotein's theoretical pI of 6.24 was the lowest predicted value. All three proteins had the same predicted half-life of 30 h. Membrane glycoprotein had the highest predicted instability index and a positive grand average of hydropathicity (GRAVY).
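GRAVY is the mean Kyte-Doolittle hydropathy value over all residues of a protein, so a positive score, as reported for membrane glycoprotein, indicates an overall hydrophobic protein. The minimal sketch below computes it directly; the test sequences are hypothetical and are not the viral proteins analyzed here.

```python
# Kyte-Doolittle hydropathy scale, the scale on which GRAVY is defined.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value of all residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KD)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(KD[aa] for aa in seq) / len(seq)
```

A positive result marks a hydrophobic sequence and a negative one a hydrophilic sequence; non-standard residue codes are rejected rather than silently skipped.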
T-cell and B-cell epitope prediction and their antigenicity, allergenicity, and topology determination
MHC class I and class II molecules are the two primary classes of major histocompatibility complex (MHC) molecules. MHC class I molecules are found on the surface of all nucleated cells in the human body, whereas class II molecules are expressed mainly on antigen-presenting cells. The function of class I MHC is to present intracellular proteins to cytotoxic T cells (CTLs). Class I and class II MHC epitopes were predicted for vaccine construction using the IEDB server (https://www.iedb.org/), which generated numerous epitopes (Tables 3, 4, 5). The server contains experimentally characterized antibody and T-cell epitope data for humans, non-human primates, and other animal species, related to allergy, infectious diseases, autoimmunity, and transplantation, and it predicts epitopes by analyzing these experimental data against the input protein. The top 12 MHC class I, 16 MHC class II, and 19 antibody epitopes (Tables 3, 4, 5) were filtered by antigenicity score (AS) and percentile rank, and eight MHC class I, six MHC class II, and four antibody epitopes were selected based on their antigenicity values (Table 6). The percentile rank reflects the predicted binding affinity: a lower percentile rank indicates a higher binding affinity. Highly antigenic, non-allergenic, and non-toxic epitopes were then selected for vaccine construction.
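The filtering step can be sketched as below, assuming each candidate already carries a percentile rank, an antigenicity score, and allergenicity and toxicity flags from separate predictors. The cut-offs (percentile rank at most 1.0, antigenicity at least 0.4) are assumptions: 0.4 is VaxiJen's default threshold for viral antigens, but the study does not state its exact cut-offs, and the `Epitope` type and candidate data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Epitope:
    peptide: str
    percentile_rank: float  # IEDB-style percentile rank: lower = stronger predicted binding
    antigenicity: float     # antigenicity score, e.g. from VaxiJen
    allergen: bool          # output of an allergenicity predictor
    toxic: bool             # output of a toxicity predictor


def select_epitopes(candidates: List[Epitope], max_rank: float = 1.0,
                    min_antigenicity: float = 0.4,
                    top_n: Optional[int] = None) -> List[Epitope]:
    """Keep strong binders that are antigenic, non-allergenic and non-toxic."""
    kept = [e for e in candidates
            if e.percentile_rank <= max_rank
            and e.antigenicity >= min_antigenicity
            and not e.allergen and not e.toxic]
    kept.sort(key=lambda e: e.percentile_rank)  # best predicted binders first
    return kept if top_n is None else kept[:top_n]


# Hypothetical candidates (placeholder names, made-up scores).
CANDIDATES = [
    Epitope("hyp_A", 0.20, 0.90, False, False),
    Epitope("hyp_B", 0.10, 0.30, False, False),  # fails the antigenicity cut-off
    Epitope("hyp_C", 0.50, 0.70, True, False),   # predicted allergen
    Epitope("hyp_D", 2.50, 1.10, False, False),  # weak predicted binder
    Epitope("hyp_E", 0.05, 0.60, False, False),
]
```

Calling `select_epitopes(CANDIDATES)` keeps only `hyp_E` and `hyp_A`, ordered by percentile rank.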
Three vaccines intended to fight SARS-CoV-2 were constructed from the selected epitopes. Three different adjuvants were used: beta-defensin, ribosomal protein L7/L12, and HABA protein, together with the EAAAK, GGGS, GPGPG, and KK linkers. The PADRE (pan HLA-DR epitope) sequence is an important component of the constructs: it can increase vaccine potency with minimal toxicity and also improves the CTL response, thus ensuring a potent immune response. The newly built vaccines, COVac-1, COVac-2, and COVac-3, were further analyzed for their physical properties (Table 7).
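A multi-epitope construct is essentially a concatenation of an adjuvant, PADRE, and epitope blocks separated by linkers. The sketch below shows one possible layout under stated assumptions: the order (adjuvant, EAAAK, PADRE, then epitope blocks) and the linker assigned to each epitope class (GGGS for MHC class I, GPGPG for MHC class II, KK for B-cell epitopes) are hypothetical, since the section does not specify them. `AKFVAAWTLKAAA` is the PADRE sequence commonly used in multi-epitope designs; which variant the study used is not stated, and the adjuvant and epitope strings are placeholders.

```python
# Commonly used PADRE (pan HLA-DR epitope) sequence; the study's variant is not stated.
PADRE = "AKFVAAWTLKAAA"


def assemble_construct(adjuvant, ctl, htl, bcell, padre=PADRE):
    """Join adjuvant, PADRE and epitope blocks with linkers (hypothetical layout)."""
    return (
        adjuvant + "EAAAK"            # rigid linker between adjuvant and the rest
        + padre + "GGGS"
        + "GGGS".join(ctl) + "GPGPG"  # MHC class I (CTL) epitopes
        + "GPGPG".join(htl) + "KK"    # MHC class II (HTL) epitopes
        + "KK".join(bcell)            # linear B-cell epitopes
    )


# Placeholder strings, not the study's adjuvants or epitopes.
construct = assemble_construct("MADJ", ["CTLA", "CTLB"], ["HTLA"], ["BA", "BB"])
```

Swapping the adjuvant argument is all it takes to produce the three variants that differ only in adjuvant.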
The physicochemical properties of viral proteins
The predicted epitopes for the Envelope protein
The predicted epitopes for the Membrane protein
The predicted epitopes for the Surface Glycoprotein
The selected and filtered epitopes for the three proteins
The vaccine constructs and their physicochemical properties
Antigenicity and allergenicity of the vaccine constructs
Table 8 lists the results of the antigenicity, allergenicity, and physicochemical property analyses. All three vaccine constructs are antigenic and non-allergenic.
Analysis of antigenicity, allergenicity, and physicochemical properties
|Vaccine name||Antigenicity score||Antigenicity||Allergenicity|
Secondary and tertiary structure prediction of the vaccine constructs
Secondary structure refers to recurring local arrangements of adjacent amino acid residues in a polypeptide chain. It is maintained by hydrogen bonds between the amide hydrogen and carbonyl oxygen atoms of the peptide backbone; the α-helix and β-structures are the main secondary structures. In the secondary structure analysis, COVac-2 had the highest proportion of coil (138 amino acids) and of α-helix (33 amino acids) (Fig 1, 2), whereas COVac-3 had the highest proportion of extended strand (91 amino acids) (Fig 3). COVac-2 had the lowest estimated root mean square deviation (RMSD), 9.9785 Å. The RMSD estimates the average deviation of the model from the experimental structure, so the lower the RMSD, the better the quality of the 3D model. COVac-3 had the largest RMSD (12.475 Å), indicating the poorest predicted 3D structure. A different template was used for the 3D structure of each of the three vaccines (Fig 4, 5, 6); the RaptorX server used the templates  for the 3D structures of the query vaccine constructs. The results of the 3D structural analysis are presented in Table 9.
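RMSD between two matched sets of N atom positions is RMSD = sqrt((1/N) * sum_i |x_i - y_i|^2), usually computed after optimally superimposing one structure on the other. The values reported above are server estimates, since no experimental structures exist for the constructs; the sketch below only illustrates the definition, using the Kabsch algorithm (a standard choice, assumed here rather than taken from the study) for the superposition.

```python
import numpy as np


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays with matched atom order, no fitting."""
    if a.shape != b.shape:
        raise ValueError("coordinate arrays must have the same shape")
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def kabsch_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD after optimal rigid superposition of `a` onto `b` (Kabsch algorithm)."""
    a_c = a - a.mean(axis=0)                 # remove translation
    b_c = b - b.mean(axis=0)
    h = a_c.T @ b_c                          # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # guard against an improper rotation (reflection)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return rmsd(a_c @ rot.T, b_c)
```

A rotated and translated copy of a structure gives a Kabsch RMSD of essentially zero, while shifting every atom by 1 Å without fitting gives a plain RMSD of exactly 1 Å.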
Calculated RMSD for the three constructs
|Name of the vaccine||RMSD(Å)|
3D structure refinement and validation
In the 3D structure refinement and validation step, the three vaccine constructs were refined and validated to ensure that each protein structure has the correct orientation and order. The PROCHECK server (https://servicesn.mbi.ucla.edu/PROCHECK/) divides the Ramachandran plot into four regions: most favored (red), additional allowed (yellow), generously allowed (light yellow), and disallowed (white). According to the server, a valid, best-quality protein model should have over 90% of its residues in the most favored regions; a few percent of residues may also fall in the additional and generously allowed regions, but no residue should lie in the disallowed region [25-27]. For further analysis and validation, the 3D protein structures created in the previous step were refined, and the refined structures were validated with Ramachandran plots. The analysis showed that COVac-2 had an outstanding 91.8% of its residues in the most favored regions, 7.3% in the additional allowed regions, and 0.9% in the remaining regions. The COVac-3 vaccine had 90.6% of its residues in the most favored regions, 8.9% in the additional allowed regions, 0.5% in the generously allowed regions, and 0.0% in the disallowed regions. With 89.2% of its residues in the most favored regions, 9.6% in the additional allowed regions, 0.8% in the generously allowed regions, and 0.4% in the disallowed regions, COVac-1 showed the worst result (Fig 7).
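Turning Ramachandran region counts into the percentages quoted here is simple arithmetic; the sketch below also applies a PROCHECK-style check (over 90% of residues in the most favored regions and none in disallowed regions). The residue counts are hypothetical. In PROCHECK's own summary, the percentages are taken over non-glycine, non-proline residues.

```python
REGIONS = ("most_favored", "additional_allowed", "generously_allowed", "disallowed")


def ramachandran_summary(counts: dict) -> tuple:
    """Convert per-region residue counts to percentages and apply the
    over-90%-most-favored / no-disallowed quality check."""
    total = sum(counts.get(region, 0) for region in REGIONS)
    if total == 0:
        raise ValueError("no residues counted")
    percent = {region: 100.0 * counts.get(region, 0) / total for region in REGIONS}
    good = percent["most_favored"] > 90.0 and counts.get("disallowed", 0) == 0
    return percent, good


# Hypothetical counts for a 200-residue model (non-Gly, non-Pro residues).
example_percent, example_good = ramachandran_summary(
    {"most_favored": 181, "additional_allowed": 18,
     "generously_allowed": 1, "disallowed": 0})
```

For these counts, 90.5% of residues fall in the most favored regions and the model passes the check.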
Vaccine protein disulfide engineering
A disulfide bond in a protein is a covalent bond between the sulfur atoms of the thiol (–SH) groups of two cysteine residues. It is formed by oxidation of the two thiols and is also called an S-S bond, disulfide bridge, or crosslink. The bond thus covalently links the two cysteines and their respective peptide chains; conversely, a reduction reaction (e.g., with dithiothreitol) can break it. In protein disulfide engineering, residue pairs with a bond energy below 2.2 kcal/mol have been used as candidates, and in this study pairs with a bond energy below 2.00 kcal/mol were selected. COVac-1 generated 43 residue pairs that could form disulfide bonds, of which only five were selected because their bond energy was below 2.00 kcal/mol: 47-CYS/61-CYS, 268-CYS/271-CYS, 287-PRO/331-CYS, 325-CYS/331-CYS, and 21-CYS/25-CYS. COVac-2 and COVac-3 generated 44 and 37 residue pairs, respectively, that might form disulfide bonds, and five pairs in each construct showed a bond energy below 2.00 kcal/mol. The selected residue pairs of COVac-1, COVac-2, and COVac-3 were used to create mutant versions of the original vaccines.
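The selection-and-mutation step can be sketched as follows, assuming a disulfide design tool has already returned candidate residue pairs with predicted bond energies. Pairs below the 2.00 kcal/mol cut-off are kept, and both residues of each kept pair are mutated to cysteine to produce the mutant construct. The sequence, positions, and energies are hypothetical.

```python
def select_disulfide_pairs(pairs, max_energy=2.00):
    """Keep residue pairs whose predicted bond energy (kcal/mol) is below max_energy."""
    return [(i, j, e) for i, j, e in pairs if e < max_energy]


def introduce_cysteines(seq, pairs):
    """Return a mutant sequence with both residues of each selected pair set to Cys.
    Positions are 1-based, matching the residue numbers reported by design tools."""
    residues = list(seq)
    for i, j, _ in pairs:
        for pos in (i, j):
            if not 1 <= pos <= len(residues):
                raise IndexError(f"position {pos} outside sequence")
            residues[pos - 1] = "C"
    return "".join(residues)


# Hypothetical sequence and candidate pairs: (position 1, position 2, energy in kcal/mol).
SEQ = "AKLMNPQRST"
CANDIDATES = [(2, 5, 1.2), (3, 9, 3.4), (7, 10, 0.8)]
selected = select_disulfide_pairs(CANDIDATES)
mutant = introduce_cysteines(SEQ, selected)
```

Here the (3, 9) pair is rejected for its high energy, and positions 2, 5, 7, and 10 become cysteines.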
Protein-protein docking study
Protein-protein docking predicts the structure of a complex from the structures of its individual proteins. The methodology rests on the notion of physical and steric complementarity at the protein-protein interface. A protein-protein docking study was conducted to find the best vaccine construct for COVID-19, and the construct with the best docking results was considered the best vaccine construct. According to ClusPro 2.0, COVac-2 showed the best weighted score (center: -863.0; lowest energy: -1069.3) and the largest number of cluster members (58). COVac-1 ranked second with 57 members and a lowest energy of -1090.6, and COVac-3 ranked last with 56 members and a lowest energy of -1241.1. When analyzed with the PatchDock and FireDock servers, COVac-2 showed the best global energy (-0.29), attractive VdW (-22.45), and repulsive VdW (15.40), whereas COVac-1 had the worst global energy (-7.11) and COVac-3 had -1.17. Because COVac-2 showed the best results in the protein-protein docking study with almost all targets on all servers, including TLR-8, it was considered the best of the three vaccine constructs (Fig 8). Subsequently, the in silico immune simulation, molecular dynamics simulation, and in silico codon adaptation studies were conducted only on COVac-2.
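ClusPro ranks its models by cluster size, which is why the number of members is reported for each construct. The sketch below illustrates greedy, radius-based clustering of docked poses of the kind ClusPro describes: the pose with the most neighbors within the radius becomes a cluster center, its neighbors are removed, and the process repeats. The 9 Å default radius and the toy distance matrix are assumptions; this is an illustration, not ClusPro's implementation.

```python
import numpy as np


def greedy_radius_clustering(dist: np.ndarray, radius: float = 9.0):
    """Greedily cluster poses from a symmetric pairwise distance matrix (Å).

    Returns (center, members) tuples, largest cluster first.
    """
    remaining = list(range(dist.shape[0]))
    clusters = []
    while remaining:
        within = dist[np.ix_(remaining, remaining)] <= radius
        center = remaining[int(np.argmax(within.sum(axis=1)))]  # most neighbors
        members = [j for j in remaining if dist[center, j] <= radius]
        clusters.append((center, members))
        remaining = [j for j in remaining if j not in members]
    clusters.sort(key=lambda c: len(c[1]), reverse=True)  # rank by cluster size
    return clusters


# Toy example: six poses placed on a line (hypothetical distances in Å).
positions = np.array([0.0, 1.0, 2.0, 20.0, 21.0, 50.0])
toy_dist = np.abs(positions[:, None] - positions[None, :])
toy_clusters = greedy_radius_clustering(toy_dist)
```

The toy poses fall into clusters of three, two, and one members, and the largest cluster would be ranked first.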
In silico immune simulation
C-ImmSim simulates successive immune responses, including the state of immune cells and the development of immune memory; as a result, a few cells substantially increase their half-life and live longer than other cells. The ImmSim immune simulation results were consistent with real immune responses. The response was characterized by high IgM levels. The B-cell population also increased, together with immunoglobulin levels (IgG1+IgG2, IgM, and IgG+IgM), leading to a decrease in antigen concentration (Fig 8A, C). There was also a clear increase in the helper (TH) and cytotoxic (TC) T-cell populations with memory development (Fig 8E, F). IFN-γ production was also stimulated after immunization (Fig 8D). The T-cell populations increased significantly with memory development, and a consistent pattern was seen on repeated exposure for all other immune cell populations.
A molecular dynamics simulation study
The results of the molecular dynamics simulation of the COVac-2-TLR-8 docked complex are illustrated in Fig 9. Protein dynamics simulation determines the stability and physical movement of protein atoms and molecules, and the simulation was therefore conducted to determine the relative stability of the vaccine protein. The peaks in the deformability graph represent protein regions with a moderate deformation rate (Fig 9b). The B-factor chart of the complex allows the B-factors derived from normal mode analysis (NMA) to be visualized and compared with those in the PDB file of the docked complex (Fig 9c). The eigenvalue of the COVac-2-TLR8 docked complex was 3.315510e-06 (Fig 9). The variance graph shows individual variance as red bars and cumulative variance as green bars (Fig 9e). In the covariance map of the complex, red represents correlated motion between a pair of residues, white uncorrelated motion, and blue anti-correlated motion. The elastic network map of the complex shows the connections between atoms, with darker gray regions indicating stiffer regions (Fig 9g).
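Covariance maps and eigenvalues like those described above come from normal mode analysis of an elastic network. The section does not name the tool or model used, so as a simplified stand-in the sketch below uses a Gaussian network model (GNM): residues within a cutoff are connected by springs, the connectivity (Kirchhoff) matrix is diagonalized, and the pseudo-inverse built from its non-zero modes gives residue cross-correlations between +1 (correlated) and -1 (anti-correlated). Lower eigenvalues correspond to softer, more easily excited collective motions. The 7 Å cutoff and the toy coordinates are assumptions.

```python
import numpy as np


def kirchhoff(coords: np.ndarray, cutoff: float = 7.0) -> np.ndarray:
    """GNM Kirchhoff (connectivity) matrix from (N, 3) C-alpha coordinates."""
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (dist <= cutoff) & ~np.eye(n, dtype=bool)
    gamma = -contact.astype(float)                 # -1 for every contacting pair
    np.fill_diagonal(gamma, contact.sum(axis=1))   # diagonal = number of contacts
    return gamma


def gnm_modes_and_correlation(coords: np.ndarray, cutoff: float = 7.0):
    """Return the non-zero eigenvalues and the normalized cross-correlation map."""
    vals, vecs = np.linalg.eigh(kirchhoff(coords, cutoff))  # ascending eigenvalues
    # For a connected network the first eigenvalue is ~0 (rigid-body mode); drop it.
    vals, vecs = vals[1:], vecs[:, 1:]
    cov = (vecs / vals) @ vecs.T                   # pseudo-inverse of the Kirchhoff matrix
    scale = np.sqrt(np.diag(cov))
    return vals, cov / np.outer(scale, scale)


# Toy example: a straight 10-residue chain with 3.8 Å spacing (hypothetical coordinates).
chain = np.array([[3.8 * i, 0.0, 0.0] for i in range(10)])
eigvals, corr_map = gnm_modes_and_correlation(chain)
```

In this toy chain the two ends move in an anti-correlated way, which would appear as a blue cell in a covariance map.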
Codon adaptation and in silico cloning study
Reverse translation of the 416-amino-acid COVac-2 protein would give a probable COVac-2 DNA sequence of 1255 nucleotides. The codon adaptation index (CAI) of 0.94 for COVac-2 indicated that the DNA sequence predominantly uses the codons preferred by the cellular machinery of the target organism, E. coli strain K12 (codon bias), suggesting that the COVac-2 vaccine should be expressed successfully. The GC content of the optimized sequence was 52.80% (Fig 10). The predicted DNA sequence of COVac-2 was inserted into the pET-30b(+) vector between the EcoRI and BamHI restriction sites; because the vaccine DNA sequence contained no EcoRI or BamHI sites, EcoRI and BamHI restriction sites were added at the N-terminal and C-terminal ends, respectively. The newly constructed vector is illustrated in Fig 11.
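The CAI is the geometric mean, over the codons of a gene, of each codon's relative adaptiveness w (its usage count divided by that of the most used synonymous codon in the host), so a value near 1 means the gene mostly uses the host's preferred codons. The sketch below computes CAI and GC content and adds EcoRI (5') and BamHI (3') sites after checking that the insert has no internal sites, mirroring the steps described above. The codon-usage table is a hypothetical, truncated stand-in for a real E. coli K12 reference set, and the study's actual optimization tool is not named here.

```python
import math

# Hypothetical, truncated codon-usage counts for three amino acids.
# A real analysis would use a complete E. coli K12 reference table.
CODON_USAGE = {
    "K": {"AAA": 90, "AAG": 30},
    "E": {"GAA": 80, "GAG": 40},
    "G": {"GGC": 60, "GGT": 50, "GGA": 10, "GGG": 15},
}
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence


def relative_adaptiveness(usage):
    """w(codon) = count(codon) / count(most used synonymous codon)."""
    weights = {}
    for family in usage.values():
        top = max(family.values())
        for codon, count in family.items():
            weights[codon] = count / top
    return weights


def cai(dna, usage=CODON_USAGE):
    """Codon adaptation index: geometric mean of w over codons found in the table."""
    weights = relative_adaptiveness(usage)
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


def gc_content(dna):
    """Percentage of G and C bases."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def add_cloning_sites(insert):
    """Flank the insert with EcoRI (5') and BamHI (3') sites, refusing internal sites."""
    insert = insert.upper()
    for name, site in (("EcoRI", ECORI_SITE), ("BamHI", BAMHI_SITE)):
        if site in insert:
            raise ValueError(f"insert already contains a {name} site")
    return ECORI_SITE + insert + BAMHI_SITE
```

A sequence built only from the most preferred codons, such as `AAAGAAGGC`, scores a CAI of 1.0; each less preferred codon pulls the geometric mean down.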
The current study was conceived to develop potential vaccines against SARS-CoV-2, the cause of the recent worldwide COVID-19 pandemic. Tens of thousands of people worldwide have already been killed by pneumonia. Therefore, potential vaccines to fight this lethal virus were predicted in this study. Three candidate virus proteins were identified and selected from the NCBI database for vaccine construction. For further analysis, only highly antigenic sequences were chosen, because highly antigenic proteins can produce better immunogenic responses.
We predicted linear B-cell and T-cell epitopes that may promote cellular and humoral immunity using immunoinformatics tools. These B-cell and T-cell epitopes could theoretically be used to produce vaccines targeting the viral proteins and may reliably stimulate both humoral and cell-mediated immunity. In the present research, T-cell and B-cell epitopes were predicted with the IEDB server. T-cell epitopes, which must bind MHC molecules to be presented, are necessary for adaptive immune stimulation. To build the epitope-based vaccine, we predicted B-cell and T-cell epitopes for the nominated antigens and, after antigenicity and allergenicity checks, joined them with EAAAK, GGGS, GPGPG, and KK linkers. The EAAAK linker was also fused between the adjuvant and the epitope sequences to improve the expression and bioactivity of the vaccine. The constructed multi-epitope vaccines showed high antigenicity scores on the VaxiJen v2.0 server. Because multi-epitope vaccines are weakly immunogenic on their own, they need adjuvants. Molecular docking and MD simulation were performed, and the RMSD plot indicated stable binding of the complex. The immune simulation results corresponded to typical immune responses: the generated immune responses generally increased after repeated exposure to the antigen, memory B cells and T cells clearly developed, with B-cell memory lasting several months, and helper T cells were particularly stimulated. Another interesting finding was the increase in IFN-γ and IL-2 levels after the initial injection, which peaked after repeated exposure to the antigen. This indicates high TH cell levels and, consequently, an efficient humoral response with immunoglobulin production. The vaccine must be expressed as a recombinant protein in a suitable host, and E. coli expression systems are the preferred option for recombinant protein expression.
Codon optimization was performed to ensure high expression levels of the recombinant vaccine in the E. coli (K12 strain) system. Both the GC content and the CAI score were favorable for high-level protein expression in bacteria. The next steps are expression of this peptide in a bacterial system and the numerous immunological analyses needed to confirm the results of the immunoinformatic analyses.
SARS-CoV-2 has recently caused one of the deadliest pandemics. Preventing new infections is both very difficult and essential. In silico methods can help find the needed solutions with less trial and error, saving scientists both time and cost. In this study, potential subunit vaccines against SARS-CoV-2 were designed using various reverse vaccinology and immunoinformatics techniques, employing highly antigenic viral proteins and epitopes. Various computational studies of the suggested vaccine constructs indicate the possibility of a good immunogenic response. Consequently, if satisfactory results are achieved in numerous in vitro and in vivo tests, these vaccine constructs could be used effectively to prevent SARS-CoV-2 infection and spread. Our study should therefore help scientists develop possible SARS-CoV-2 vaccines and therapeutics.
CONFLICTS OF INTEREST
The author declares no conflicts of interest regarding the publication of this study.
No financial interests related to the material of this manuscript have been declared.