Evidence-Based Mechanistic Integrative Modeling of Stroke Risk Factors: A Translational Research Study
According to global statistics, stroke is one of the main health problems in the world. Many clinical and molecular studies have been conducted in the stroke domain, and their results are stored in different repositories in various formats. The heterogeneity of these research data prevents a comprehensive view of the disease. Recently, translational research has been developed to fill the gap between these studies. In this study, we used the integrative disease modeling method to model the underlying mechanism of stroke risk factors.
Material and Methods:
This study was conducted in three steps: data gathering, model construction, and mechanism discovery. First, using semantic and information retrieval tools, we extracted cause-and-effect statements from the literature to create the mechanistic model, and collected validated molecular data to evaluate the constructed model. Then, the integrative model was created and evaluated. Finally, we used Gene Set Enrichment Analysis (GSEA) to identify the main biological processes and signaling pathways in the mechanism of the disease.
In the evidence-based information retrieval from the literature, 1837 causal statements were extracted. The initial network was created with 648 nodes (molecular, clinical, and environmental factors) and 1837 edges (interactions). In addition, 51 genes/proteins and nine single nucleotide polymorphisms were matched with data in the model. The inflammatory response, response to lipid, regulation of body fluid levels, and regulation of response to stress were the main biological processes, and complement and coagulation cascades and the PPAR signaling pathway were the main signaling pathways, enriched in the GSEA.
Stroke is the second leading cause of death worldwide. It has more than 150 known causes, and about 25%-30% of stroke cases are classified as heterogeneous. Genetic factors are also involved in strokes with both known and heterogeneous causes. Therefore, a variety of environmental and genetic factors are involved in brain stroke.
The complexity of stroke is not just about the gene-environment interaction. The lack of access to brain tissue in live patients, the complexity of the anatomical structure of the brain, and the lack of biomarkers to predict stroke all contribute to the complexity of the disease. In recent years, the pharmaceutical industry has invested more than $1 billion in discovering and producing new drugs for stroke, but no drug other than tissue plasminogen activator (TPA) has reached the market, and most of these investments have failed [4, 5]. TPA has a specific list of indications and contraindications, and its efficacy depends on the time at which it is administered. Therefore, strategies based on primary or secondary prevention are prioritized. These failures have led the pharmaceutical industry to reduce its activities in the production of neuroprotective drugs, that is, drugs that prevent or reduce the progress of the disease. Interestingly, neuroprotective drugs show a very good and repeatable therapeutic effect in pre-clinical and in-vitro investigations but do not perform satisfactorily when tested in humans. Therefore, transferring the therapeutic effects of new drugs from animal models to the human body requires a deep understanding of the molecular mechanisms involved in stroke in the human brain. With deep knowledge of the interaction of molecular and environmental factors, it is possible to identify the biological pathways that lead to stroke and to target them with new drugs or preventive strategies.
Since stroke is a complex disease of the nervous system, identifying its causes, treatment, and prevention requires a comprehensive investigation of scientific findings at the pre-clinical, clinical, and post-clinical levels. This disease not only has different phenotypes but also has very complex causative mechanisms and various risk factors. One modern and expanding strategy recently proposed to overcome the complexity of the disease mechanism is translational research, which integrates various data and models the mechanism of disease at all physio-pathological levels. Various research groups have conducted different studies in the stroke domain. Each study is published and stored in different databases, with different and often heterogeneous formats. This heterogeneity prevents a comprehensive view of the mechanism of the disease. In this study, we show that translational research techniques can explain the underlying mechanism of stroke risk factors at the molecular level and help translate it into clinical practice.
MATERIAL AND METHODS
This study was done in three steps, including data gathering, model construction, and mechanism discovery.
Step 1: Data gathering
Selection of the main stroke risk factors
In this study, we used integrative modeling of the mechanisms of the risk factors underlying stroke. Because stroke has many risk factors, we had to select one or two main risk factors to avoid an overly complex model. To do this, we conducted a hospital-based study. In the study population, hypertension, dyslipidemia, and diabetes significantly increased the risk of stroke. For model construction, we selected dyslipidemia and diabetes. The results of this hospital-based study have been published previously.
Information retrieval from biomedical literature
Since the mechanism of the disease is usually a cause-and-effect relationship, it was necessary to access this information to model the mechanisms of stroke. We used text mining and semantic tools to retrieve information and extract knowledge from the literature. First, we integrated the Stroke Ontology (STO) (https://bioportal.bioontology.org/ontologies/STO-DRAFT) with KNIME, a text-mining tool. Then, using STO root class terms and the two main stroke risk factors (dyslipidemia and diabetes), we ran a search strategy on PubMed abstracts (accessed 25.02.2017). We manually filtered the 545 retrieved abstracts based on the relevance of their content, and 157 abstracts were selected. We extracted the causal statements underlying stroke from the full text of these selected papers. The process of information retrieval and knowledge extraction is illustrated in Fig 1.
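The term-guided pre-filtering of abstracts can be illustrated with a minimal sketch. It assumes a simple rule (an abstract is a candidate if it mentions at least one stroke term and at least one risk-factor term); the term lists, the function names, and the sample abstracts are hypothetical and are not the STO class labels or the KNIME workflow used in this study, where the final relevance decision was made manually.

```python
import re

# Hypothetical term lists; the study used STO root class terms instead.
STROKE_TERMS = {"stroke", "cerebral infarction", "brain ischemia"}
RISK_TERMS = {"dyslipidemia", "diabetes", "hyperglycemia", "hdl"}


def mentions(text, terms):
    """Return True if any term occurs in text as a whole word or phrase."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(t) + r"\b", lowered) for t in terms)


def is_candidate(abstract):
    """An abstract is a candidate if it names both stroke and a risk factor."""
    return mentions(abstract, STROKE_TERMS) and mentions(abstract, RISK_TERMS)


# Invented example abstracts, for illustration only.
abstracts = [
    "Diabetes increases the risk of ischemic stroke in adults.",
    "Low HDL cholesterol is associated with cerebral infarction.",
    "A survey of hospital staffing levels.",
]
candidates = [a for a in abstracts if is_candidate(a)]
```

A filter of this kind only narrows the set of abstracts to read; it does not replace the manual relevance check described above.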
Molecular data gathering for model evaluation
Furthermore, we needed experimental omics data (gene expression, protein expression, and Single Nucleotide Polymorphism (SNP) data) to biologically evaluate the constructed model. Gene expression data were gathered from the Gene Expression Omnibus (GEO) database. The GEO accession numbers were GSE43618 (PMID: 23559260), GDS4521 (PMID: 22453632), GSE55937 (PMID: 24911610), and GSE37587 (PMID: 25124890). We also found some experimental data in (PMID: 15630028), (PMID: 16395289), (PMID: 17997827), and (PMID: 27407070). The Human Protein Atlas was used to gather protein expression data. We used the GWAS Catalog and the ArrayExpress database to gather SNPs related to stroke. The full lists of gene expression, protein expression, and SNP data can be found in supplementary files 1, 2, and 3, respectively.
Step 2: Model construction and validation
In this study, we created a computational cause-and-effect network of all molecular and non-molecular data on stroke pathophysiology using the Cytoscape tool. This model illustrates the interactions between the various factors extracted from the literature. The network contains nodes (clinical, molecular, and environmental data) and the relations between them, which represent the causal relations among these factors.
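To make the structure of such a network concrete, the sketch below builds a small directed cause-and-effect network from (cause, relation, effect) triples and counts its nodes and edges. It uses plain Python rather than Cytoscape, the function name `build_network` is hypothetical, and the four example triples are paraphrased from relations described in this paper rather than taken from the study's data file.

```python
from collections import defaultdict

# Example triples (cause, relation, effect), paraphrased from the text;
# they are a tiny illustration, not the 1837-statement data set.
statements = [
    ("hyperglycemia", "increases", "AGE"),
    ("AGE", "decreases", "NOS3"),
    ("NOS3", "increases", "nitric oxide"),
    ("oxidative stress", "increases", "LDL oxidation"),
]


def build_network(triples):
    """Build a directed adjacency map: cause -> list of (relation, effect)."""
    adjacency = defaultdict(list)
    nodes = set()
    for cause, relation, effect in triples:
        adjacency[cause].append((relation, effect))
        nodes.update((cause, effect))
    return nodes, dict(adjacency)


nodes, adjacency = build_network(statements)
# One edge per causal statement, so the edge count equals the statement count.
edge_count = sum(len(targets) for targets in adjacency.values())
```

Because each statement contributes exactly one edge, a network built this way has as many edges as there are extracted statements.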
Because the model was built from published literature, it may contain relations whose biological basis is elusive. To ground the constructed model in experimental evidence, we evaluated it against the curated molecular data described in Step 1, using the Cytoscape tool to match the model with these data.
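The matching step can be sketched as a set intersection between the model's node names and a curated list of gene/protein symbols. This is an illustrative assumption about how the matching could be done outside Cytoscape; the function name `match_nodes`, the case-insensitive comparison, and the curated list below are hypothetical.

```python
def match_nodes(model_nodes, curated_symbols):
    """Return model nodes whose upper-cased name appears in the curated set."""
    curated = {symbol.upper() for symbol in curated_symbols}
    return sorted(node for node in model_nodes if node.upper() in curated)


# Node names taken from the text; the curated list is made up for illustration.
model_nodes = {"PPARG", "ADIPOQ", "NOS3", "obesity", "hyperglycemia"}
curated_symbols = ["pparg", "NOS3", "IL6"]

matched = match_nodes(model_nodes, curated_symbols)
```

Exact symbol matching misses aliases and synonyms, so in practice identifiers would need to be normalized to a common nomenclature before comparison.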
Step 3: Mechanism discovery
In this step, we used Gene Set Enrichment Analysis (GSEA) to identify the main signaling pathways and biological processes. For the enrichment analysis, we used the Gene Ontology (GO) biological process gene set and the Kyoto Encyclopedia of Genes and Genomes (KEGG) gene set from the Molecular Signatures Database (MSigDB). After aligning the enriched processes and signaling pathways with our constructed model, we explained the underlying mechanism of dyslipidemia and diabetes in stroke patients.
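The idea of scoring the overlap between a gene list and a gene set can be sketched as follows. Standard GSEA uses a ranked gene list and a running-sum statistic; the sketch below deliberately swaps in the simpler hypergeometric over-representation test, which is one common way to score an unranked list against a gene set. The text does not state which statistic was used in this study, and the function names, the toy gene sets, and the universe size of 100 are assumptions made for illustration.

```python
from math import comb


def hypergeom_pvalue(overlap, list_size, set_size, universe_size):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from universe_size genes, of which set_size belong to the gene set."""
    max_overlap = min(list_size, set_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def enrich(gene_list, gene_sets, universe_size):
    """Score every gene set and return (name, overlap, p) sorted by p-value.
    Assumes all genes and set members belong to the same universe."""
    genes = set(gene_list)
    results = []
    for name, members in gene_sets.items():
        overlap = len(genes & set(members))
        p = hypergeom_pvalue(overlap, len(genes), len(members), universe_size)
        results.append((name, overlap, p))
    return sorted(results, key=lambda row: row[2])


# Toy inputs: symbols appear in the text, but the set memberships are invented.
gene_list = ["CRP", "TNF", "IL6", "PPARG"]
gene_sets = {
    "inflammatory response (toy)": ["CRP", "TNF", "IL6", "ICAM1", "VCAM1"],
    "unrelated (toy)": ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"],
}
results = enrich(gene_list, gene_sets, universe_size=100)
```

The sketch reports raw p-values only; a real analysis over many gene sets would also need a multiple-testing correction.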
In total, 1837 statements remained after the STO-supported manual retrieval of causal statements from the literature. From each statement, we extracted the cause, the effect, and their relation, and saved them in a separate file for use in the next step.
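A simple way to store such statements is a three-column table with one row per statement. The sketch below assumes a CSV layout with the columns `cause`, `relation`, and `effect`; the column names and helper functions are hypothetical, the actual file format used in the study is not specified, and the two example rows are paraphrased from relations mentioned in this paper.

```python
import csv
import io

FIELDS = ["cause", "relation", "effect"]


def write_statements(handle, statements):
    """Write causal statements as CSV rows with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(statements)


def read_statements(handle):
    """Read causal statements back as a list of plain dictionaries."""
    return [dict(row) for row in csv.DictReader(handle)]


statements = [
    {"cause": "PPARG", "relation": "decreases", "effect": "CRP"},
    {"cause": "ADIPOQ", "relation": "increases", "effect": "AMPK"},
]

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_statements(buffer, statements)
buffer.seek(0)
restored = read_statements(buffer)
```

Keeping one statement per row means each row can later be loaded as one directed edge of the network.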
Model construction and validation
The initial network of different factors underlying stroke comprises 648 nodes (molecular, clinical, and environmental factors) and 1837 edges (interactions). Using this causal model, we aimed to identify the underlying mechanism that increases the risk of stroke. Fig 2 illustrates the causal model of the selected risk factors of stroke.
To evaluate the constructed model, the curated molecular data (see Material and Methods) were matched with the model. As shown in Table 1, 51 expressed genes and proteins were matched within the model.
Table 1. The count and names of the expressed genes and proteins matched within the model.
Also, nine SNPs were found within the model. Table 2 shows the list of SNPs with their related gene/protein name.
The main model of stroke was filtered by these genes/proteins and their first neighbors. The resulting network of validated causal factors has 209 nodes and 819 edges.
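Filtering a network to a set of seed nodes and their first neighbors can be sketched as below. The sketch assumes that the filtered network keeps every edge whose two endpoints both survive the filter; the function name, the example edges (paraphrased from relations mentioned in this paper), and the single seed are illustrative, and the exact selection options used in Cytoscape are not described in the text.

```python
def first_neighbor_subnetwork(edges, seeds):
    """Keep the seed nodes, their direct neighbors, and every edge whose
    two endpoints both fall in that node set."""
    seeds = set(seeds)
    keep = set(seeds)
    for source, target in edges:
        if source in seeds:
            keep.add(target)
        if target in seeds:
            keep.add(source)
    kept_edges = [(s, t) for s, t in edges if s in keep and t in keep]
    return keep, kept_edges


# Illustrative directed edges and one validated seed node.
edges = [
    ("hyperglycemia", "AGE"),
    ("AGE", "NOS3"),
    ("NOS3", "nitric oxide"),
    ("obesity", "LEP"),
    ("LEP", "ROS"),
]
sub_nodes, sub_edges = first_neighbor_subnetwork(edges, seeds={"NOS3"})
```

In this example the obesity and leptin branch is dropped because none of its nodes is a seed or a direct neighbor of one.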
Table 2. The single nucleotide polymorphisms (SNPs) and related genes/proteins matched within the model.
Gene Set Enrichment Analysis
Using GSEA with the GO biological process gene set on our gene/protein list (retrieved from the literature), we identified a list of enriched biological processes involved in the mechanism of dyslipidemia and diabetes in stroke (Table 3). In accordance with the stroke model, the biological processes of inflammatory response, response to lipid, regulation of body fluid levels, and regulation of response to stress were enriched.
Table 3. The top biological processes from Gene Set Enrichment Analysis (GSEA) of the Gene Ontology (GO) biological process data set.
GSEA pathway analysis resulted in a list of significantly enriched pathways (Table 4). The complement and coagulation cascades and the PPAR signaling pathway corresponded with the stroke model.
Fig 3 illustrates the mechanisms by which oxidative stress leads to inflammatory responses and increases the risk of stroke.
Table 4. The top pathways from Gene Set Enrichment Analysis (GSEA) of the KEGG data set.
In this study, two common and important risk factors for stroke (dyslipidemia and diabetes) were mechanistically modeled using integrative disease modeling methods. GSEA showed that the biological processes and pathways of inflammatory response, response to lipid, regulation of body fluid levels, regulation of response to stress, complement and coagulation cascades, and PPAR signaling are involved in the stroke mechanism. Each of these biological pathways involves several factors whose inhibition or stimulation can affect the pathway. The model showed that various factors such as insulin, ADIPOQ, PPARG, NOS3, and HDL-C affect these pathways and many harmful processes, such as lipid oxidation and ROS biosynthesis.
Inflammation is common in cardiovascular diseases (CVD) and increases the risk of CVD and diabetes. The presence of inflammation, which indicates stimulation of the immune system, is followed by damage to the nerve cells and by the impact of various risk factors, such as diabetes and obesity [13, 14]. Some markers involved in the mechanism of inflammation, such as CRP, TNFA, and IL6, also increase in ischemic obstruction. Dyslipidemia may also be involved in stroke. For example, a reduction in High-Density Lipoprotein (HDL) cholesterol leads to an increase in inflammation, which increases the risk of CVD and ischemic stroke, especially in diabetic patients. As our model shows, Reactive Oxygen Species (ROS) play an important role in the production of CRP through NF-kB activation, nitric oxide reduction, increased oxidative stress, and TNF. The model also showed that adhesion molecules such as ICAM1 and VCAM1 contribute to inflammatory responses. According to the model findings, factors such as PPARG and ADIPOQ inhibit the underlying factors of inflammation. For example, PPARG reduces CRP production, and thereby inflammation, by restricting cytokines or inhibiting TNF.
The analysis of the cause-and-effect mechanisms involved in increasing the risk of stroke showed that oxidative stress plays an important role in various mechanisms of stroke, including endothelial dysfunction and lipid oxidation. As the model showed, among the free radical producing factors, VEGFA increases the production of ROS by activating NADPH oxidase, and obesity does so through the high level of leptin (LEP) in obese people. In addition, hyperglycemia increases the amount of superoxide (O2-) and the production of advanced glycation end products (AGEs), which inhibit the NOS3 enzyme. According to our findings, NOS3 inhibition reduces the production of nitric oxide, resulting in increased oxidative stress. Oxidative stress causes LDL oxidation, which itself activates inflammatory cells and increases factors such as NF-kB. The subsequent increase in some inflammatory markers, such as IL6, CRP, and TNFA, and in some adhesion molecules provides the basis for the inflammatory response. The model also demonstrates the role of some factors in increasing the biosynthesis of nitric oxide. ADIPOQ plays an important role in this process in two ways. First, by stimulating the insulin metabolic process, it increases the biosynthesis of nitric oxide. Second, by increasing the expression of AMPK and subsequently of NOS3, it facilitates the production of nitric oxide.
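The chain from hyperglycemia to the inflammatory response described above can be traced programmatically in a signed causal network. The sketch below is an illustration under stated assumptions: the edges are paraphrased from this paragraph, the final edge from NF-kB to the inflammatory response is a simplification of the marker and adhesion-molecule steps, and the breadth-first search with sign multiplication is a generic technique, not the analysis performed in the study.

```python
from collections import deque

# Signed edges paraphrased from the text: +1 = increases, -1 = decreases.
EDGES = {
    "hyperglycemia": [("AGE", +1)],
    "AGE": [("NOS3", -1)],
    "NOS3": [("nitric oxide", +1)],
    "nitric oxide": [("oxidative stress", -1)],
    "oxidative stress": [("LDL oxidation", +1)],
    "LDL oxidation": [("NF-kB", +1)],
    "NF-kB": [("inflammatory response", +1)],  # simplified final step
}


def causal_path(edges, start, goal):
    """Breadth-first search for a shortest directed path.
    Returns (path, net_sign), or None if goal is unreachable."""
    queue = deque([(start, [start], 1)])
    seen = {start}
    while queue:
        node, path, sign = queue.popleft()
        if node == goal:
            return path, sign
        for target, edge_sign in edges.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [target], sign * edge_sign))
    return None


result = causal_path(EDGES, "hyperglycemia", "inflammatory response")
```

The net sign is the product of the edge signs along the path, so two inhibitory steps in sequence yield an overall positive effect. The search reports only one shortest path; a network with several routes between two nodes may contain paths with conflicting signs.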
Integrative modeling showed that using a variety of molecular and clinical data stored in different databases can increase our understanding of disease mechanisms. Because this method is based on evidence, it provides a better understanding of the disease mechanism. Since a proper understanding of the role of the various factors in disease mechanisms contributes to the development and proposal of new therapies, we must look for ways to intervene in these biological pathways.
In this study, only dyslipidemia and diabetes were examined mechanistically, while other risk factors such as hypertension were not studied. Therefore, the proposed model covers only these two risk factors.
The authors agree on this final form of the manuscript and attest that all authors contributed to the final draft of the manuscript.
CONFLICTS OF INTEREST
The authors declare no conflicts of interest regarding the publication of this study.
No financial interests related to the material of this manuscript have been declared.