SVM and Naïve Bayes Stacking Approach for Improving Gene Expression Data Classification Using Logistic Regression.

  • Abdallah Bashir Musa*
  • Mohanad Mohammed*
  • Fuad Abedalrazeq Mussallum*
  • Murtada Khalafallah Elbashir*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Logistic regression is one of the foremost statistical classification techniques, with uses in numerous disciplines including machine learning, bioinformatics, and medical research. However, its classification accuracy is hindered by large data sets. When the number of features exceeds the number of instances, as in the classification of gene expression data, improving the accuracy of logistic regression is an important challenge that has drawn researchers' attention. Ensemble learning techniques create a meta-classifier by combining several classifiers built on the same data in order to enhance the performance of the machine learning algorithm. In this paper, a stacking approach is used to improve the accuracy of logistic regression for the classification of gene expression data. Stacking is a method in which one meta-classifier learns from the outputs of the combined base classifiers. For this purpose, support vector machines with linear and radial basis function (RBF) kernels and naïve Bayes are used as base classifiers, while logistic regression is used as the meta-classifier. Dimension reduction is used to raise the classification accuracy of logistic regression: principal component analysis (PCA) reduces the dimension of the data before the stacking approach is applied. Several machine learning metrics are used to assess the method: accuracy, sensitivity, specificity, the area under the curve (AUC), kappa, and ROC analysis. The study demonstrates that applying the stacking approach with logistic regression improves its accuracy and makes it applicable to the classification of gene expression data.
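The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the use of scikit-learn, the synthetic data from `make_classification`, the standardization step, the choice of 10 principal components, and the 5-fold internal cross-validation are all assumptions made here for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for gene expression data, with far more
# features (2000) than instances (120).
X, y = make_classification(n_samples=120, n_features=2000, n_informative=20,
                           n_redundant=0, class_sep=2.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Base classifiers named in the abstract: linear SVM, RBF SVM, naive Bayes.
base_classifiers = [
    ("svm_linear", SVC(kernel="linear")),
    ("svm_rbf", SVC(kernel="rbf")),
    ("naive_bayes", GaussianNB()),
]

# Logistic regression is the meta-classifier; it is trained on the
# cross-validated outputs of the base classifiers (cv=5 is an assumption).
stack = StackingClassifier(estimators=base_classifiers,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)

# PCA reduces the dimension before stacking; scaling and n_components=10
# are illustrative choices, not values taken from the paper.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=10, random_state=0),
                      stack)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]  # probability of class 1

# Metrics listed in the abstract, derived from the 2x2 confusion matrix.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "sensitivity": tp / (tp + fn),   # true positive rate
    "specificity": tn / (tn + fp),   # true negative rate
    "auc": roc_auc_score(y_test, y_score),
    "kappa": cohen_kappa_score(y_test, y_pred),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the SVMs are left at their defaults, the stacking layer feeds their decision-function values (rather than class probabilities) to the logistic regression; with real expression data, the number of components and the SVM hyperparameters would need tuning.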

Original language: English
Pages (from-to): 136-148
Number of pages: 13
Journal: International Journal of Advances in Soft Computing and its Applications
Volume: 13
Issue number: 1
State: Published - Mar 2021

Keywords

  • gene expression data
  • Logistic regression (LR)
  • naïve Bayes
  • Principal component analysis (PCA)
  • stacking approach
  • support vector machines (SVM)
