The foundation of random forest is the aggregation of the forecasts generated by many decision trees.
· It results from combining the Bagging (Breiman, 1996) and Random Subspace (Ho, 1998) approaches.
· Observations for each tree are chosen by bootstrap random sampling, while variables are chosen by the random subspace method.
· At each node of the decision tree, the best branching variable (by information gain) is selected from a small subset of variables randomly chosen from among all the variables.
· Roughly 2/3 of the data set is used in building each tree. The excluded (out-of-bag) data is used to evaluate the performance of trees and to determine variable significance.
· At each node, a random subset of variables is chosen as split candidates.
The method is called "random" because the observation units are selected at random and a reduced number of variables is randomly chosen before each split in the trees that are built.
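As a concrete illustration, here is a minimal sketch of these mechanics using scikit-learn's RandomForestClassifier; the dataset and hyperparameter values are illustrative assumptions, not part of these notes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)  # example dataset (assumption)

rf = RandomForestClassifier(
    n_estimators=100,     # number of trees in the forest
    max_features="sqrt",  # random subspace: candidate variables per node
    bootstrap=True,       # bootstrap sample (~2/3 of observations) per tree
    oob_score=True,       # evaluate on the excluded out-of-bag data
    random_state=42,
)
rf.fit(X, y)

print("Out-of-bag accuracy:", rf.oob_score_)
# Variable significance as estimated by the ensemble
print("First five feature importances:", rf.feature_importances_[:5])
```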
Bagging: A homogeneous weak-learner ensemble that trains the individual learners in parallel on bootstrap samples and combines their predictions by averaging (or majority vote).
Boosting: Also a homogeneous weak-learner ensemble, but it functions differently from bagging: learners are trained sequentially and adaptively, each one focusing on the errors of its predecessors to enhance the overall prediction.
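The contrast can be made concrete with scikit-learn. The sketch below trains a bagged ensemble and a boosted ensemble of decision trees side by side; the synthetic dataset and settings are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: weak learners fit in parallel on bootstrap samples, then averaged
bagging = BaggingClassifier(
    DecisionTreeClassifier(max_depth=3), n_estimators=50, random_state=0
)

# Boosting: weak learners fit sequentially, each reweighting hard examples
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```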
What is the difference between bagging and boosting methods?
In the bagging method, trees have no dependencies on each other. In the boosting method, each tree is built on the residuals (the "leftovers") of the previous trees, so the trees are interdependent; see the sketch after the list below.
- Bagging helps to decrease the model’s variance.
- Boosting helps to decrease the model’s bias.
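To make the residual-fitting idea concrete, here is a minimal gradient-boosting-style sketch (the data, tree depth, and learning rate are illustrative assumptions): each new tree is fit to the residuals left by the current ensemble, which is exactly why the trees are interdependent and why the procedure drives bias down.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from the mean prediction
trees = []

for _ in range(50):
    residuals = y - prediction          # the "leftovers" of earlier trees
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)              # next tree depends on previous ones
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("Final training MSE:", np.mean((y - prediction) ** 2))
```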