A step-by-step derivation of the popular XGBoost algorithm including a detailed numerical illustration
XGBoost (short for eXtreme Gradient Boosting) is an open-source library that provides an optimized and scalable implementation of gradient boosted decision trees. It incorporates various software and hardware optimization techniques that allow it to deal with huge amounts of data.
Originally developed as a research project by Tianqi Chen and Carlos Guestrin in 2016, XGBoost has become the go-to solution for supervised learning tasks on structured (tabular) data. It provides state-of-the-art results on many standard regression and classification tasks, and many Kaggle competition winners have used XGBoost as part of their winning solutions.
Although significant progress has been made using deep neural networks for tabular data, they are still outperformed by XGBoost and other tree-based models on many standard benchmarks [2, 3]. In addition, XGBoost requires much less tuning than deep models.
The main innovations of XGBoost with respect to other gradient boosting algorithms include:
- Clever regularization of the decision trees.
- Using second-order approximation to optimize the objective (Newton boosting).
- A weighted quantile sketch procedure for efficiently finding candidate split points.
- A novel tree learning algorithm for handling sparse data.
- Support for parallel and distributed processing of the data.
- Cache-aware block structure for out-of-core tree learning.
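To make the second and first innovations above concrete, here is a minimal sketch of the regularized Newton-boosting leaf weight that the derivation later in the series arrives at: for a leaf, the optimal weight is w* = -Σg / (Σh + λ), where g and h are the first and second derivatives of the loss and λ is the L2 regularization strength. The squared-error loss, the toy targets, and the value λ = 1.0 are illustrative assumptions, not values from the text.

```python
# Illustrative sketch (not XGBoost's actual implementation).
# Assumes squared-error loss L = 0.5 * (pred - y)^2, so the
# gradient is g_i = pred_i - y_i and the hessian is h_i = 1.

def leaf_weight(grads, hess, lam):
    """Optimal regularized leaf weight: w* = -sum(g) / (sum(h) + lambda)."""
    return -sum(grads) / (sum(hess) + lam)

y = [1.0, 2.0, 3.0]        # toy targets (made up for illustration)
pred = [0.0, 0.0, 0.0]     # initial predictions

g = [p - t for p, t in zip(pred, y)]  # gradients: [-1, -2, -3]
h = [1.0] * len(y)                    # hessians are constant for squared error

w = leaf_weight(g, h, lam=1.0)
print(w)  # -(-6) / (3 + 1) = 1.5
```

Note how λ in the denominator shrinks the leaf weight toward zero; this is the "clever regularization" that distinguishes XGBoost's objective from plain gradient boosting.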
In this series of articles we will cover XGBoost in depth, including the mathematical details of the algorithm, an implementation of the algorithm in Python from scratch, and an overview of the XGBoost library and how to use it in practice.
In this first article of the series, we are going to derive the XGBoost algorithm step by step, provide an implementation of the algorithm in pseudocode, and then illustrate how it works on a toy data set.
The description of the algorithm given in this article is based on XGBoost’s original paper and the…