XGBoost And 2nd Order Gradient

The formula in the pink box is the objective (loss function) to be minimized in the [math]t[/math]-th round. Setting its derivative with respect to [math]f_t(x)[/math] to zero, we get

[math]g + h \times f_t(x) = 0 ,[/math]

which is

[math]f_t(x) = -\frac{g}{h}[/math]

which means that in the [math]t[/math]-th round of XGBoost, we should try to fit [math]-\frac{g}{h}[/math].

In other words, [math]-\frac{g}{h}[/math] represents the "residual" of the ([math]t-1[/math])-th round, and it is better than using just [math]-g[/math] in terms of descent speed (and thus training speed).

Note that [math]g[/math] is the 1st order derivative and [math]h[/math] is the 2nd order derivative of the loss, both evaluated at the prediction from the ([math]t-1[/math])-th round.
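To make the "residual" reading concrete, here is a minimal sketch for a single sample. It assumes a squared-error loss [math]\frac{1}{2}(y - \hat{y})^2[/math] and ignores any regularization term; the label and previous prediction are made-up values, and the function names are only illustrative. With this loss, [math]h = 1[/math], so [math]-\frac{g}{h}[/math] is exactly the ordinary residual.

```python
def grad_hess_squared_error(y, pred):
    """g and h of 0.5 * (y - pred)**2 with respect to pred."""
    g = pred - y   # 1st order derivative
    h = 1.0        # 2nd order derivative
    return g, h


def quadratic_objective(g, h, f):
    """Second-order approximation of the round-t objective (constants dropped)."""
    return g * f + 0.5 * h * f * f


# Hypothetical label and prediction from the (t-1)-th round.
y, prev_pred = 3.0, 2.5

g, h = grad_hess_squared_error(y, prev_pred)
target = -g / h   # the value the t-th round tries to fit

print("g =", g, "h =", h, "target =", target)
print("residual y - prev_pred =", y - prev_pred)
```

Here the derivative of `quadratic_objective` with respect to `f` is `g + h * f`, so `target` is the point where it vanishes.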

An equivalent illustration is Newton's method in optimization (Wikipedia), where the step [math]\frac{function'(variable)}{function''(variable)}[/math] (the pink arrow) is a shortcut compared with the step [math]function'(variable)[/math] (the green arrow).
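The difference between the two kinds of step can be sketched on a one-dimensional example. Everything here is an assumption chosen for illustration: the function [math]e^x - 2x[/math] (minimum at [math]\ln 2[/math]), the starting point 0, the tolerance, and the gradient-descent step size 0.1. A different step size would change the gradient-descent count, so read the result as an illustration rather than a benchmark.

```python
import math


def d1(x):
    """First derivative of exp(x) - 2x."""
    return math.exp(x) - 2.0


def d2(x):
    """Second derivative of exp(x) - 2x."""
    return math.exp(x)


def count_steps(step, x=0.0, tol=1e-8, max_iter=10_000):
    """Apply x += step(x) until |f'(x)| <= tol; return final x and step count."""
    n = 0
    while abs(d1(x)) > tol and n < max_iter:
        x += step(x)
        n += 1
    return x, n


# Newton step: first derivative divided by second derivative.
newton_x, newton_steps = count_steps(lambda x: -d1(x) / d2(x))

# Plain gradient step with an assumed fixed step size of 0.1.
gd_x, gd_steps = count_steps(lambda x: -0.1 * d1(x))

print("Newton:          ", newton_x, "in", newton_steps, "steps")
print("Gradient descent:", gd_x, "in", gd_steps, "steps")
```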

In short, XGBoost uses

[math]-\frac{L'(f)}{L''(f)}[/math]

as the fitting target, where [math]L'(f)[/math] is the 1st order derivative and [math]L''(f)[/math] is the 2nd order derivative of the loss.

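Recall that plain gradient boosting uses only the first-order term, [math]-L'(f)[/math], as the fitting target. Since the second-order target takes Newton-style steps, XGBoost should need fewer rounds to reach the same loss.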
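As a rough side-by-side of the two fitting targets, the sketch below computes both for a few samples. It assumes a log loss on a raw score passed through a sigmoid; the labels and scores are hypothetical, and no regularization or learning rate is applied. It only shows how the targets differ per sample, not how either library builds its trees.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


labels = [1.0, 0.0, 1.0]   # hypothetical labels
scores = [0.0, 0.0, 3.0]   # hypothetical raw scores from the (t-1)-th round

first_order_targets = []    # first-order target: -L'(f)
second_order_targets = []   # second-order target: -L'(f) / L''(f)

for y, s in zip(labels, scores):
    p = sigmoid(s)
    g = p - y            # 1st order derivative of the log loss w.r.t. the score
    h = p * (1.0 - p)    # 2nd order derivative of the log loss w.r.t. the score
    first_order_targets.append(-g)
    second_order_targets.append(-g / h)

for row in zip(labels, scores, first_order_targets, second_order_targets):
    print("y=%.0f score=%.1f  -g=%.4f  -g/h=%.4f" % row)
```

Both targets always share the same sign because [math]h > 0[/math] for this loss; the second-order target is simply rescaled by the local curvature.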

