Comparing SVM and Logistic Regression

What Are SVM and Logistic Regression?

SVM

  • Determines the binary classification result by finding the optimal decision boundary (the maximum-margin hyperplane) between the two classes.

  • Supports non-linear feature mapping via the kernel trick, e.g., polynomial and radial basis function (RBF) kernels.

  • Discriminative model: infers the output class y based on the evidence x.

  • The output of SVM is in the range (-∞, ∞); a positive sign represents the positive class and a negative sign the negative class.

Note: SVM can also output a probability after performing calibration (e.g., Platt scaling).
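
As a minimal sketch (the toy dataset from make_classification and all hyper-parameters below are illustrative assumptions), this shows the signed decision_function output and the calibrated probabilities that probability=True enables in scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy two-class data set (hypothetical; any binary data works here).
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# RBF-kernel SVM; probability=True turns on Platt-scaling calibration.
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)

print(clf.decision_function(X[:3]))  # values in (-inf, inf); the sign is the class
print(clf.predict_proba(X[:3]))      # calibrated probabilities in [0, 1]
```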

Logistic Regression

  • Determines the binary classification result by estimating the probability of the positive class.

  • Supports non-linear feature mapping via the kernel trick in Kernel Logistic Regression, e.g., polynomial and radial basis function (RBF) kernels.

Note: Kernel Logistic Regression is not supported out of the box by scikit-learn, though it can be approximated with an explicit kernel map (see the sketch after this list).

  • Discriminative model: infers the output class y based on the evidence x.

  • The output of Logistic Regression is in the range [0, 1] and represents the probability of the positive class.

Note: A more precise probability can be obtained after performing calibration.
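
Since Kernel Logistic Regression is not built into scikit-learn, one hedged workaround is an explicit kernel approximation feeding a plain LogisticRegression; the dataset (make_moons) and the Nystroem parameters below are illustrative assumptions:

```python
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Approximate kernel logistic regression: an explicit RBF feature map
# (Nystroem) followed by an ordinary linear logistic regression.
klr = make_pipeline(
    Nystroem(kernel="rbf", n_components=50, random_state=0),
    LogisticRegression(max_iter=1000),
).fit(X, y)

print(klr.predict_proba(X[:3]))  # probabilities in [0, 1]
```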

How to Choose Between SVM and Logistic Regression

Visualize the Data Density on the Decision Boundary

  • Low data density on the decision boundary -> SVM

Note: Low data density on the boundary means the classification is nearly a black-or-white event there, so it is reasonable to optimize the margin around the decision boundary.

  • High data density on the decision boundary -> Logistic Regression

Note: High density near the boundary indicates genuine ambiguity about whether the output should be positive or negative, so the probabilistic output of Logistic Regression is more informative (see the sketch after this list).
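
One way to eyeball this (a sketch, not a prescribed procedure; the quick LinearSVC fit and the synthetic data are assumptions) is to histogram the signed distances of the training points to a candidate linear boundary. A tall bar near zero means high density on the boundary:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# Fit a quick linear model, then histogram the signed distances of all
# points to its decision boundary: mass near zero = high density there.
clf = LinearSVC(max_iter=10000).fit(X, y)
distances = clf.decision_function(X)

plt.hist(distances, bins=40)
plt.axvline(0.0, color="red")  # the decision boundary itself
plt.xlabel("signed distance to decision boundary")
plt.ylabel("number of points")
plt.show()
```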

Decide Between a Linear and a Non-Linear Kernel

  • Small feature size and large data -> non-linear kernel.

Note: Because the data is large enough, we can afford a more complex model to get better performance.

  • Large feature size and small data -> linear kernel.

Note:

  1. Because the data size is small, it’s better to choose a simple model to prevent over-fitting.
  2. If the feature size is too large, it’s recommended to visualize the data distribution in the feature space to see whether the decision boundary will be smooth or not.

  • Small feature size and extra-large data -> linear kernel.

Note: Solving the SVM’s quadratic program on an extra-large data set would cost a lot of computing latency, so a linear solver is preferable (a comparison sketch follows below).
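
As an illustrative sketch (the synthetic dataset, cross-validation setup, and hyper-parameters are assumptions), the two choices can be compared directly with cross_val_score:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC, LinearSVC

# Small feature size, plenty of data: a non-linear kernel is affordable.
X, y = make_classification(n_samples=2000, n_features=5, random_state=0)

rbf_score = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
# Linear kernel as the cheap baseline (also the safer choice when the
# feature size is large relative to the number of samples).
lin_score = cross_val_score(LinearSVC(max_iter=10000), X, y, cv=5).mean()

print(f"RBF: {rbf_score:.3f}  linear: {lin_score:.3f}")
```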

Implementation Options

SVM

  • Scikit-Learn

    • optimization: quadratic programming (SVC solves the dual QP via libsvm’s SMO-style solver)
  • PyTorch (see the sketch below)

    • Model: linear layer
    • Loss function: hinge loss
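
A minimal sketch of that recipe, assuming toy linearly separable data and illustrative hyper-parameters: a single nn.Linear layer trained with the hinge loss, plus a small L2 penalty to recover the soft-margin SVM objective:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy linearly separable data; the hinge loss expects labels in {-1, +1}.
X = torch.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).float() * 2 - 1

model = nn.Linear(2, 1)  # raw score f(x) in (-inf, inf)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(200):
    opt.zero_grad()
    scores = model(X).squeeze(1)
    # Soft-margin objective: mean hinge loss max(0, 1 - y * f(x))
    # plus an L2 penalty on the weights.
    loss = torch.clamp(1 - y * scores, min=0).mean() \
        + 1e-3 * model.weight.pow(2).sum()
    loss.backward()
    opt.step()

preds = torch.sign(model(X).squeeze(1))  # the sign gives the class
print((preds == y).float().mean())       # training accuracy
```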

Logistic Regression

  • Scikit-Learn

    • optimization: iterative solvers such as lbfgs or liblinear in LogisticRegression; plain SGD is available through SGDClassifier with a logistic loss
  • PyTorch (see the sketch below)

    • Model: linear layer + sigmoid layer
    • Loss function: cross entropy
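
A minimal sketch under the same toy-data assumptions as the SVM example above. In practice nn.BCEWithLogitsLoss fuses the sigmoid with the binary cross entropy for numerical stability, so the explicit sigmoid is only applied at inference time:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same kind of toy data as above, but with labels in {0, 1}.
X = torch.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).float()

model = nn.Linear(2, 1)           # outputs logits
loss_fn = nn.BCEWithLogitsLoss()  # sigmoid + binary cross entropy, fused
opt = torch.optim.SGD(model.parameters(), lr=0.5)

for _ in range(200):
    opt.zero_grad()
    logits = model(X).squeeze(1)
    loss = loss_fn(logits, y)
    loss.backward()
    opt.step()

probs = torch.sigmoid(model(X).squeeze(1))          # probabilities in [0, 1]
print(((probs > 0.5).float() == y).float().mean())  # training accuracy
```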
