Comparing SVM and Logistic Regression
What Are SVM and Logistic Regression?
SVM
Determines the binary classification result by finding the optimal decision boundary (maximum-margin hyperplane) between the two classes.
Non-linear feature mapping via the kernel trick, e.g. polynomial or radial basis function (RBF) kernels.
Discriminative model: infers the output class y based on the evidence x.
The raw output of an SVM lies in the range (-∞, ∞); a positive sign represents the positive class and a negative sign the negative class.
Note: SVM can also output a probability after performing calibration (e.g. Platt scaling), as sketched below.
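A minimal scikit-learn sketch of the behaviour above (the dataset, kernel, and parameters are illustrative assumptions): `decision_function` returns the signed score in (-∞, ∞), and `probability=True` enables Platt-scaling calibration so `predict_proba` becomes available.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Signed distance to the decision boundary; the sign determines the predicted class.
svm = SVC(kernel="rbf").fit(X, y)
print(svm.decision_function(X[:3]))   # values in (-inf, inf)
print(svm.predict(X[:3]))             # signs mapped to class labels

# probability=True runs Platt-scaling calibration internally.
svm_prob = SVC(kernel="rbf", probability=True).fit(X, y)
print(svm_prob.predict_proba(X[:3]))  # calibrated probabilities in [0, 1]
```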
Logistic Regression
Determines the binary classification result by estimating the probability of the positive class.
Non-linear feature mapping via the kernel trick for Kernel Logistic Regression, e.g. polynomial or radial basis function (RBF) kernels.
Note: Kernel Logistic Regression is not supported by scikit-learn.
Discriminative model: infers the output class y based on the evidence x.
The output of Logistic Regression lies in the range [0, 1] and represents the probability of the positive class.
Note: A more precise probability can be obtained after performing calibration.
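A minimal scikit-learn sketch of the probabilistic output described above (the dataset, solver, and calibration method are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# predict_proba returns values in [0, 1] interpreted as P(y = 1 | x).
lr = LogisticRegression().fit(X, y)
print(lr.predict_proba(X[:3]))

# Optional calibration step (isotonic regression here) for better-calibrated probabilities.
lr_cal = CalibratedClassifierCV(LogisticRegression(), method="isotonic", cv=5).fit(X, y)
print(lr_cal.predict_proba(X[:3]))
```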
How to Choose Between SVM and Logistic Regression
Visualize the Data Density on the Decision Boundary
- Low data density on the decision boundary -> SVM
Note: Low data density means classification near the boundary is close to a black-or-white decision, so it is reasonable to optimize the margin around the decision boundary.
- High data density on the decision boundary -> Logistic Regression
Note: High density indicates ambiguity in whether the output should be positive or negative, so a probability estimate is more informative.
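One assumed way to quantify this density check (not a prescribed recipe): fit a quick linear SVM and count how many samples fall inside its margin band.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

# Samples whose |decision_function| < 1 sit inside the margin band.
margins = clf.decision_function(X)
near_boundary = np.abs(margins) < 1.0
print(f"{near_boundary.mean():.1%} of samples lie close to the decision boundary")
# Small ratio -> clean separation, SVM is a good fit.
# Large ratio -> ambiguity, Logistic Regression probabilities are more informative.
```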
Decide a Linear or Non-Linear Kernel
- Small feature size and large data -> non-linear kernel.
Note: Because the data set is large enough, we can introduce a more complex model to get better performance.
- Large feature size and small data -> linear kernel.
Note:
- Because the data size is small, it is better to choose a simple model to prevent over-fitting.
- If the feature size is too large, it is recommended to visualize the data distribution in the feature space to see whether the decision boundary will be smooth or not.
- Small feature size and extra large data -> linear kernel.
Note: Solving the quadratic programming problem of a kernel SVM on an extra-large data set would incur substantial computational cost, so a linear model is preferred; see the sketch below.
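The heuristics above map roughly to the following scikit-learn choices (the specific estimators are assumptions for illustration):

```python
from sklearn.svm import SVC, LinearSVC
from sklearn.linear_model import SGDClassifier

# Small feature size, large data -> non-linear kernel.
clf_nonlinear = SVC(kernel="rbf")

# Large feature size, small data -> linear kernel (simpler model, less over-fitting).
clf_linear = LinearSVC()

# Small feature size, extra-large data -> linear model trained with SGD,
# which avoids the cost of kernel-SVM quadratic programming.
clf_large_scale = SGDClassifier(loss="hinge")
```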
Ways of Implementation
SVM
Scikit-Learn
- Optimization: quadratic programming
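A minimal fit with scikit-learn's SVC, whose libsvm backend solves the dual quadratic programming problem (the toy data is an assumption):

```python
from sklearn.svm import SVC

X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
y = [0, 0, 1, 1]

clf = SVC(kernel="rbf")
clf.fit(X, y)
print(clf.support_vectors_)  # training points that define the margin
```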
Pytorch
- Model: linear layer
- Loss function: hinge loss
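A minimal PyTorch sketch of the linear-layer-plus-hinge-loss setup (layer size, random data, and training loop details are illustrative assumptions):

```python
import torch
import torch.nn as nn

model = nn.Linear(in_features=5, out_features=1)          # raw score f(x) in (-inf, inf)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

X = torch.randn(64, 5)
y = torch.randint(0, 2, (64, 1)).float() * 2 - 1          # labels in {-1, +1}

for _ in range(100):
    scores = model(X)
    # Hinge loss: mean(max(0, 1 - y * f(x))).
    loss = torch.clamp(1 - y * scores, min=0).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

pred = torch.sign(model(X))  # positive sign -> positive class
```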
Logistic Regression
Scikit-Learn
- Optimization: SGD (SGDClassifier with logistic loss) or batch solvers such as lbfgs (the LogisticRegression default)
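A sketch of both options (the data is assumed; the loss name "log_loss" assumes scikit-learn >= 1.1, older versions use "log"):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

lr = LogisticRegression(solver="lbfgs").fit(X, y)    # batch solver
sgd_lr = SGDClassifier(loss="log_loss").fit(X, y)    # SGD on the logistic loss
print(lr.predict_proba(X[:3]))
```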
Pytorch
- Model: linear layer + sigmoid layer
- Loss function: binary cross entropy
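A minimal PyTorch sketch of the linear-plus-sigmoid model trained with binary cross entropy (sizes, random data, and training loop are illustrative assumptions):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(5, 1), nn.Sigmoid())   # outputs P(y = 1 | x) in [0, 1]
loss_fn = nn.BCELoss()                                  # binary cross entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

X = torch.randn(64, 5)
y = torch.randint(0, 2, (64, 1)).float()                # labels in {0, 1}

for _ in range(100):
    prob = model(X)
    loss = loss_fn(prob, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# In practice, nn.BCEWithLogitsLoss on the raw linear output is more numerically stable.
```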