3 Types of ML Ensemble Design Pattern

What’s an Ensemble?

An ensemble combines a set of weak models instead of relying on a single model with larger capacity (complexity).

When to Consider an Ensemble Model?

When we have a large amount of data, we can simply apply a modern machine learning architecture such as a neural network. But what if we only have small or medium scale data? If we apply a neural network directly, we can expect low bias, but large variance becomes the issue.

Overfitting (large variance) is caused by the excessive complexity of the model we use. This is where ensembles come in: by using weak models with lower complexity, the ensemble is less sensitive to the input and therefore has lower variance, while combining the weak models gives the ensemble lower bias than any individual weak model.

In conclusion, when we want to train a model on small or medium scale data and run into the bias-variance trade-off, an ensemble might be the approach worth considering.

Types of Ensembles

Homogeneous

Combine a set of models of the same type.

  • Bagging: Train each individual weak model on a dataset sampled by bootstrap, then average the outputs or vote with equal weights to decide the final prediction from all weak models (a code sketch for bagging and boosting follows after these notes).

Note: Bagging benefits the ensemble by averaging out the error caused by input noise; this also explains why bagging has less effect on stable models such as KNN, SVM, etc.

  • Boosting: Train each individual weak model on weighted sample data, where the weight of each sample reflects the error made by the previously trained model. Finally, the individual weak models are combined, each with a weight indicating how important its output is.

Bagging vs. Boosting: Boosting is more complex than bagging because information about each weak model's errors is carried into the ensemble weights, rather than combining the models with equal weights.
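
Below is a minimal sketch of both homogeneous approaches, assuming scikit-learn (the original text names no library) and shallow decision trees as the weak models: BaggingClassifier votes with equal weights over trees trained on bootstrap samples, while AdaBoostClassifier reweights samples and learns per-model weights.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy dataset standing in for "small or medium scale" data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: each weak tree sees a bootstrap sample; equal-weight voting at the end.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=3),  # called "base_estimator" in older scikit-learn
    n_estimators=50,
    bootstrap=True,
    random_state=0,
)

# Boosting: each new stump focuses on samples the previous models got wrong,
# and the stumps are combined with learned (unequal) weights.
boosting = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    random_state=0,
)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))
```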

Heterogeneous

  • Stacking: We can divide a stacking model into layer 1 and layer 2. Layer 1 is a set of different models, such as Random Forest, XGBoost, etc., each of which is good at inferring from certain features of the input data. Layer 2 is a meta-model that takes the outputs of the layer 1 models as input features and finds the best combination of them to produce the final prediction.
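
A minimal stacking sketch, again assuming scikit-learn for illustration (the layer-1 models here are Random Forest, gradient boosting, and an SVM rather than XGBoost, and the layer-2 meta-model is a logistic regression):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Layer 1: a set of different models, each good at different aspects of the data.
layer1 = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
]

# Layer 2: a meta-model that uses the layer-1 outputs as input features
# (out-of-fold predictions, controlled by cv=5) and learns how to combine them.
stack = StackingClassifier(estimators=layer1, final_estimator=LogisticRegression(), cv=5)
stack.fit(X, y)
print("training accuracy:", stack.score(X, y))
```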

Design Choice Between Bagging, Boosting and Stacking

  • Lower variance: Bagging

  • Lower bias: Boosting

  • Take advantage of the feature-recognition abilities of different models: Stacking

Trade-Offs and Alternatives

  • Increased training and design time

  • Decreased model interpretability

  • Dropout can serve as an alternative, although it is not exactly the same concept (see the sketch below)
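
As a rough illustration of that last point, here is a minimal PyTorch sketch (the framework and layer sizes are assumptions, not from the original text): dropout randomly zeroes activations during training, which can be viewed as implicitly training an ensemble of thinned sub-networks that share weights.

```python
import torch
from torch import nn

# A small feed-forward network with dropout between layers.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # randomly zero 50% of activations at training time
    nn.Linear(64, 2),
)

x = torch.randn(8, 20)

model.train()            # dropout active: each forward pass uses a different "thinned" sub-network
logits_train = model(x)

model.eval()             # dropout disabled: the full network behaves like an averaged ensemble
logits_eval = model(x)
```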
