University Students' Knowledge About Big Data Analysis
Abstract
Introduction:
Big data analysis has become a widely debated topic and has attracted many students and academics because of its substantial advantages. The present study aims to investigate the extent to which students at different universities in Mashhad are familiar with this type of analysis.
Material and Methods:
The present cross-sectional study was conducted on university students from different fields of study in Mashhad, Iran. A questionnaire was developed based on a review of the related literature in PubMed, Google Scholar, Science Direct and EMBASE. The questionnaire assessed students' knowledge of big data analysis. A total of 142 students participated in the study and completed the questionnaire. Their responses were analyzed descriptively.
Results:
Most participants were aged between 21 and 28 years, and 59% were female. Only 27% had more than a year of work experience, and most participants were Master's or Ph.D. students. Overall, 42% had a desirable knowledge of big data analysis. Basic science students, and more specifically pharmacology students, reported the most hours of scientific and non-scientific study.
Conclusion:
Despite the significance and benefits of big data analysis, students' unfamiliarity with the importance of these analyses in industry and research is considerable. The field or grade of study does not appear to affect one's knowledge of big data analysis. Designing specialized educational courses on this topic may help to improve knowledge of big data analysis.
INTRODUCTION
Today, the world faces enormous volumes of data, known as big data, which have become a topic of global debate, especially among academics, and have attracted many researchers. Such data are characterized by large volume, variety, scalability and rapid production, among other features. Because of these features, they are hard to manage and analyze with ordinary software and hardware [1, 2]. The analyses commonly applied to such voluminous data are known as big data analysis. They offer benefits such as the discovery of useful data-driven patterns, extraction of key features, information abstraction and reduced cost [3]. These analyses are widely applied in industries such as banking, medicine, transportation and insurance [4-8]. Despite their many benefits, they also pose challenges which, if disregarded, can have wide-ranging consequences. These include lack of expertise, unfamiliarity with the required tools and methods, the type of data, security issues and budgeting [9, 10]. Big data analysis is highly significant in various industries, and in practice university students and their research link industry with applied research. Moreover, this research domain is still at an early stage in Iran, and its concepts are not yet well understood. Thus, the present study aims to investigate the extent to which university students from different fields of study are familiar with big data analysis.
MATERIAL AND METHODS
The present cross-sectional study was conducted on 142 students at Ferdowsi University of Mashhad and Mashhad University of Medical Sciences. The aim was to explore the extent to which students from different fields of study were familiar with big data analysis. Mashhad, the main metropolis in eastern Iran, has a population of 3 million people. It lies near the borders of Afghanistan and Turkmenistan, on the historic Silk Road. Mashhad has two major state universities: Ferdowsi University of Mashhad and Mashhad University of Medical Sciences. The former hosts students from fields such as engineering and basic sciences; the latter hosts students from medical fields such as medicine and biology. A questionnaire was developed to evaluate the knowledge and awareness of big data analysis among students of different fields of study at these universities.
The questionnaire was closed-ended. Its initial version was derived from a review of related content in Google Scholar, Science Direct and EMBASE and was refined using the Delphi method with the help of a panel of 10 experts from different fields (medical informatics, biostatistics, TIH and computer sciences).
The questionnaire contains 5 items concerning respondents' knowledge of how to analyze big data. The items are listed in Table 1.
Table 1
Questionnaire content
The validity of the questionnaire was confirmed by the panel of 10 experts, and its reliability was assessed with Cronbach's alpha, which was estimated at 0.73. The questionnaire was then distributed to 150 students. The study attempted to include students from a wide range of fields. Within medical sciences, these included medicine and dentistry, biotechnology, toxicology, nano-medicine, nutrition, medical imaging, radiology, microbiology, physiology, genetics, medical informatics, biochemistry, immunology, HIT, molecular/cellular sciences and medical physics. In engineering, the fields included mechanical engineering, natural resources, aquatic sciences, industrial engineering, aerospace, metallurgy, computer engineering and civil engineering. In basic sciences, the majors included mathematics, physics and chemistry. Of the 150 questionnaires distributed, 142 were completed and returned, and all returned questionnaires were checked for completeness. Data entry and analysis were performed using SPSS 21 and Excel 2007.
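Cronbach's alpha for k items is α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜ), where σ²ᵢ is the variance of item i and σ²ₜ is the variance of respondents' total scores. Item-level responses are not reported here, so the following is only a minimal Python sketch on a small hypothetical 0/1 response matrix invented for illustration. It shows the computation itself; it is not the authors' SPSS procedure and does not reproduce the reported value.

```python
from statistics import pvariance


def cronbach_alpha(responses: list[list[int]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    Population variances are used for both items and totals; using
    sample variances for both gives the same ratio.
    """
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of each respondent's total score.
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers (1 = correct, 0 = incorrect) of 6 respondents to 5 items.
# These are NOT the study's data.
hypothetical = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(round(cronbach_alpha(hypothetical), 3))  # → 0.833
```

Values around 0.7 or higher are conventionally read as acceptable internal consistency, which is the sense in which the reported alpha supports reliability.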
RESULTS
Of the 150 university students from different fields of study and grades who received the questionnaire, 142 completed it.
As can be observed in Table 2, most students belonged to the 21-24 and 25-58 age groups. Most were female, and only a minority had work experience. Most participants were studying medical sciences, in professional Ph.D. programs, or in Ph.D. programs in pure basic sciences. The majority were M.S. or professional Ph.D. students, and their previous academic field of study was mostly experimental sciences.
To account for the probability of answering correctly by chance, each participant's score was dichotomized. Those answering fewer than 3 of the 5 items correctly were classified as having low knowledge, and those answering 3 or more items correctly were classified as having desirable knowledge. Accordingly, 82 participants (58%) fell into the former category and 60 (42%) into the latter.
The mean hours of participants' scientific and non-scientific study are shown in Table 3.
As can be seen, students of basic sciences and pharmacological basic sciences reported the longest hours of scientific study, while those of pharmacological basic sciences and dentistry reported the longest hours of non-scientific study.
Table 2
Research participants’ demographic information (n=142)
Table 3
Mean and standard deviation of participants’ hours of scientific and non-scientific studies across fields of study
Table 4 shows participants' level of knowledge as the frequency of correct answers to the 5 questionnaire items.
Table 4
Research participants’ level of knowledge
Correct answers | Frequency | % |
---|---|---|
0 | 6 | 4.2 |
1 | 37 | 26.1 |
2 | 39 | 27.5 |
3 | 34 | 23.9 |
4 | 22 | 15.5 |
5 | 4 | 2.8 |
Total | 142 | 100 |
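The percentages in Table 4 and the two-category split (fewer than 3 correct answers versus 3 or more) can be rechecked directly from the reported frequencies. This Python sketch uses only the counts from Table 4; the function and variable names are illustrative.

```python
# Frequencies from Table 4: number of correct answers (0-5) -> number of students.
FREQUENCIES = {0: 6, 1: 37, 2: 39, 3: 34, 4: 22, 5: 4}
CUTOFF = 3  # 3 or more correct answers = desirable knowledge


def percentages(freqs: dict[int, int]) -> dict[int, float]:
    """Percentage of respondents at each score, rounded to one decimal as in Table 4."""
    n = sum(freqs.values())
    return {score: round(100 * f / n, 1) for score, f in freqs.items()}


def dichotomize(freqs: dict[int, int], cutoff: int) -> tuple[int, int]:
    """Return (low, desirable) counts: scores below the cutoff vs. at or above it."""
    low = sum(f for score, f in freqs.items() if score < cutoff)
    desirable = sum(f for score, f in freqs.items() if score >= cutoff)
    return low, desirable


total = sum(FREQUENCIES.values())
low, desirable = dichotomize(FREQUENCIES, CUTOFF)
print(percentages(FREQUENCIES))  # → {0: 4.2, 1: 26.1, 2: 27.5, 3: 23.9, 4: 15.5, 5: 2.8}
print(low, desirable, round(100 * low / total, 1), round(100 * desirable / total, 1))  # → 82 60 57.7 42.3
```

The 82/60 split matches the low and desirable knowledge counts reported in the text.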
Table 5
Distribution of participants’ age and level of knowledge (frequency and percentage in each age group)
As Table 5 shows, participants aged 25-44 obtained the highest scores, and those aged 18-24 and 45-54 obtained the lowest.
Table 6
Distribution of participants’ sex and level of knowledge (frequency and percentage in each sex)
Score | | Male | Female | Total |
---|---|---|---|---|
Poor | F (%) | 35 (60.3) | 47 (56.0) | 82 (57.7) |
Good | F (%) | 23 (39.7) | 37 (44.0) | 60 (42.3) |
Total | F (%) | 58 (100) | 84 (100) | 142 (100) |
As Table 6 shows, the majority of participants were female; knowledge levels in both sexes ranged from poor to good, with a similar share of good scores among females (44.0%) and males (39.7%).
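The percentages in Table 6, and likewise in Table 7, are within-column percentages: each count is divided by the total of its column (for example, all male participants). A short Python sketch, using only the counts from these two tables, reproduces them; the helper name is illustrative.

```python
def column_percentages(table: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Convert a {column: {row: count}} table into within-column percentages (1 decimal)."""
    result = {}
    for column, rows in table.items():
        total = sum(rows.values())
        result[column] = {row: round(100 * count / total, 1) for row, count in rows.items()}
    return result


# Counts from Table 6 (knowledge score by sex).
TABLE_6 = {
    "Male": {"poor": 35, "good": 23},
    "Female": {"poor": 47, "good": 37},
    "Total": {"poor": 82, "good": 60},
}
# Counts from Table 7 (knowledge score by work experience).
TABLE_7 = {
    "<1 year": {"poor": 59, "good": 44},
    ">1 year": {"poor": 23, "good": 16},
}

print(column_percentages(TABLE_6))
print(column_percentages(TABLE_7))
```

Reading the "good" row across columns (44.0% vs. 39.7% by sex; 42.7% vs. 41.0% by experience) is what supports describing these groups as similar.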
Table 7
Distribution of participants’ background experience and level of knowledge (frequency and percentage in each category)
Score | | <1 year | >1 year | Total |
---|---|---|---|---|
Poor | F (%) | 59 (57.3) | 23 (59.0) | 82 (57.7) |
Good | F (%) | 44 (42.7) | 16 (41.0) | 60 (42.3) |
Total | F (%) | 103 (100.0) | 39 (100.0) | 142 (100.0) |
Table 8
Distribution of participants’ field of study and knowledge score (frequency and percentage in each category)
As shown in Table 7, the majority of participants had less than a year of experience, and both experience groups included poor and good scores in similar proportions.
According to Table 8, the best scores belonged to students of medical sciences and professional Ph.D. programs, and the poorest to dentistry and pharmacology professional Ph.D. students.
As Table 9 shows, good knowledge scores belonged to M.S. and professional Ph.D. students, while low scores were obtained by B.S. and specialized Ph.D. students.
As indicated in Table 10, good scores were obtained by students whose previous field of study was either medical basic sciences or none, whereas poor scores were obtained by those whose previous field was dentistry or pharmacology.
According to Table 11, the mean hours of scientific and non-scientific study were roughly equal in the good and poor knowledge categories; there was only a slight tendency for those with longer hours of study to have good knowledge scores.
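Table 11 summarizes hours of study as a mean and standard deviation within each knowledge category. The individual hours are not reported, so this Python sketch uses invented values purely to show the computation, and it assumes the sample standard deviation (n − 1 denominator), the usual choice in descriptive tables.

```python
from statistics import mean, stdev


def describe_by_group(records: list[tuple[str, float]]) -> dict[str, tuple[float, float]]:
    """Group (category, hours) pairs and return {category: (mean, sample SD)}."""
    groups: dict[str, list[float]] = {}
    for category, hours in records:
        groups.setdefault(category, []).append(hours)
    # stdev() uses the n - 1 denominator and needs at least two values per group.
    return {category: (mean(values), stdev(values)) for category, values in groups.items()}


# Hypothetical study hours for six students; NOT the study's data.
hypothetical_hours = [
    ("poor", 10.0), ("poor", 12.0), ("poor", 14.0),
    ("good", 11.0), ("good", 13.0), ("good", 15.0),
]
for category, (m, sd) in describe_by_group(hypothetical_hours).items():
    print(f"{category}: mean = {m:.1f}, SD = {sd:.1f}")
```

When the group means differ by much less than the within-group SD, as in this invented example, the categories are better described as roughly equal than as clearly different.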
Table 9
Distribution of participants’ grade of education and knowledge score (frequency and percentage in each category)
Table 10
Distribution of participants’ previous field of study and knowledge (frequency and percentage in each category)
Table 11
Participants’ mean and standard deviation of scientific and non-scientific hours of studies and level of knowledge
DISCUSSION
Exploring university students' knowledge of big data analysis is essential. In the present study, students' knowledge was explored across different fields of study in Mashhad. Most participants were in the 21-58 age range, most were female, and only a minority had work experience. Most participants studied medical sciences, were in professional Ph.D. programs, or studied pure basic sciences.
Participants aged 25-44 received the best scores, whereas those aged 18-24 and 45-54 received the lowest. This may suggest that young adulthood is a favorable period for acquiring such knowledge.
CONCLUSION
Given that the participants were affiliated with some of the most reputable universities in Iran, they were expected to have better knowledge of big data analysis. Moreover, the lack of a correlation between hours of scientific or non-scientific study and knowledge of big data suggests that well-planned education in this area is severely lacking. Thus, introductory education on this topic is essential in many fields of study, and holding conferences and seminars may also be effective.
ACKNOWLEDGEMENTS
This study is the result of a research project approved by the Vice Chancellery for Research of Mashhad University of Medical Sciences (grant number 961731).
AUTHORS' CONTRIBUTION
All authors agreed on the final form of the manuscript and attested that all authors contributed to the final draft.
CONFLICTS OF INTEREST
The authors declare no conflicts of interest regarding the publication of this study.
FINANCIAL DISCLOSURE
No financial interests related to the material of this manuscript have been declared.