Abstract
Applying logistic regression (LR) when the number of features exceeds the number of instances is a long-standing challenge that has attracted considerable research attention. This paper proposes a sequential fitting method (SFM) to address the overfitting problem of logistic regression. The method rests on two requirements of logistic regression: the features should be uncorrelated, and the number of features should be small relative to the number of instances. Typically, only a few of the available features are significant in building the model. In addition, the paper provides a comprehensive comparison of logistic regression (LR), naïve Bayes (NB), and random forest (RF) with respect to the amount of training data, the number of features, and balanced versus unbalanced data sets. Accuracy, specificity, sensitivity, and area under the ROC curve are used to evaluate the classifiers' performance. The results of the three classifiers on these metrics are validated and compared using statistical analysis, including the area under the ROC curve and Wilcoxon signed-rank tests. The study concludes that SFM makes logistic regression applicable to data sets that are prone to overfitting, and that it can compete with naïve Bayes and random forest.
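For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, sensitivity, specificity, and area under the ROC curve can be computed for a binary classifier. It is not the authors' evaluation code: the function names (`confusion_counts`, `roc_auc`, `evaluate`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the sketch assumes both classes are present in the data.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as one half). Assumes both classes occur."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Hypothetical helper: threshold the scores, then report the metrics."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": roc_auc(y_true, scores),
    }


# Toy data, invented purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
metrics = evaluate(y_true, scores)
print(metrics)
```

Sensitivity and specificity are reported separately because accuracy alone can be misleading on unbalanced data sets, which is one of the conditions the comparison covers.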
| Original language | English |
|---|---|
| Pages (from-to) | 224-238 |
| Number of pages | 15 |
| Journal | International Journal of Advances in Soft Computing and its Applications |
| Volume | 15 |
| Issue number | 3 |
| State | Published - 2023 |
Keywords
- Logistic Regression (LR)
- Machine Learning
- Naïve Bayes (NB)
- Random Forest (RF)
- Sequential Fitting Method (SFM)
Fingerprint
Dive into the research topics of 'SFM: A Sequential Fitting Method to Address the Overfitting Problem of Logistic Regression'. Together they form a unique fingerprint.