TY - JOUR
T1 - SVM and Naïve Bayes Stacking Approach for Improving Gene Expression Data Classification Using Logistic Regression.
AU - Musa, Abdallah Bashir
AU - Mohammed, Mohanad
AU - Mussallum, Fuad Abedalrazeq
AU - Elbashir, Murtada Khalafallah
N1 - Publisher Copyright: Copyright © Al-Zaytoonah University of Jordan (ZUJ)
PY - 2021/3
Y1 - 2021/3
AB - Logistic regression is a leading statistical classification technique with many uses in disciplines including machine learning, bioinformatics, and medical research. However, its classification accuracy is hindered by large data sets. When the number of features exceeds the number of instances, as in the classification of gene expression data, improving the accuracy of logistic regression is an important challenge that has drawn researchers’ attention. Ensemble learning techniques create a meta-classifier by combining several classifiers built on the same data in order to enhance the performance of machine learning algorithms. In this paper, a stacking approach is used to improve the accuracy of logistic regression for the classification of gene expression data. Stacking is a method in which one meta-classifier learns from the outputs of the combined base classifiers. For this purpose, support vector machines with linear and radial basis function kernels and naïve Bayes are used as base classifiers, while logistic regression is used as the meta-classifier. Dimension reduction is used to raise the classification accuracy of logistic regression: principal component analysis (PCA) reduces the dimension of the data before the stacking approach is applied. Several machine learning metrics are used to assess the method: accuracy, sensitivity, specificity, the area under the curve (AUC), kappa, and ROC analysis. The study demonstrates that applying the stacking approach with logistic regression improves its accuracy and makes it applicable to the classification of gene expression data.
KW - gene expression data
KW - Logistic regression (LR)
KW - naïve Bayes
KW - Principal component analysis (PCA)
KW - stacking approach
KW - support vector machines (SVM)
UR - https://www.scopus.com/pages/publications/85103522837
M3 - Article
AN - SCOPUS:85103522837
SN - 2074-8523
VL - 13
SP - 136
EP - 148
JO - International Journal of Advances in Soft Computing and its Applications
JF - International Journal of Advances in Soft Computing and its Applications
IS - 1
ER -