
SFM: A Sequential Fitting Method to Address the Overfitting Problem of Logistic Regression

Research output: Contribution to journal › Article › peer-review

Abstract

Applying logistic regression (LR) when the number of features exceeds the number of instances is a major challenge that has attracted researchers' attention. This paper proposes a sequential fitting method (SFM) to address the overfitting problem of logistic regression. The method rests on two requirements of logistic regression: the features should be uncorrelated, and the number of features should be small relative to the number of instances. Typically, only a few of the features are significant in building the model. In addition, the paper provides a comprehensive comparison of logistic regression (LR), naïve Bayes (NB), and random forest (RF) with respect to the amount of training data, the number of features, and balanced versus unbalanced data sets. Machine learning metrics such as accuracy, specificity, sensitivity, and area under the ROC curve are used to evaluate the algorithms' performance. The results of the three classifiers on these metrics are validated and compared using statistical analyses, including the area under the ROC curve and Wilcoxon signed-rank tests. The study concludes that SFM succeeds in applying logistic regression to data sets prone to overfitting and that it can compete with naïve Bayes and random forest.
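The abstract names its evaluation metrics without defining them. As a minimal sketch, assuming the standard textbook definitions for binary labels (1 = positive, 0 = negative), they could be computed as below. The function names `confusion_counts`, `classification_metrics`, and `roc_auc` are hypothetical and are not taken from the paper, and the sketch does not reproduce the SFM procedure itself.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve via its rank interpretation:
    the probability that a randomly chosen positive receives a higher
    score than a randomly chosen negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On unbalanced data sets, sensitivity and specificity are worth reading alongside accuracy, because a classifier that always predicts the majority class can score high accuracy while its sensitivity is zero.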

Original language: English
Pages (from-to): 224-238
Number of pages: 15
Journal: International Journal of Advances in Soft Computing and its Applications
Volume: 15
Issue number: 3
State: Published - 2023

Keywords

  • Logistic Regression (LR)
  • Machine Learning
  • Naïve Bayes (NB)
  • Random Forest (RF)
  • Sequential Fitting Method (SFM)
