In this article I will be discussing two of the core concepts used to describe how a function changes at a given point in space: Jacobians, which capture the direction and magnitude of the gradient, and Hessians, which capture its curvature. These tools form the underlying math behind optimization in machine learning and are very important to understand, so without wasting any more time, let's begin.
A Jacobian is best defined as the collection of all first partial derivatives of one or more functions, arranged so that each row consists of the first partial derivatives of the same function with respect to each of the variables. When there are as many functions as variables, this arrangement is square, and its determinant is also commonly called the Jacobian.
The concept of the Jacobian can be applied to a wide variety of problems, but in the context of machine learning there is one scenario in which it's used most often: working with vectors and matrices. This article builds on articles I have written previously, including ones on multivariate calculus and on vectors and matrices. If you're unfamiliar with these two concepts, I recommend reading those first.
The Jacobian of a single function of many variables:
can be thought of as a vector, where each entry is the partial derivative of f with respect to one of its variables:
By convention, we write this as a row vector rather than a column vector.
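In symbols, the definition above can be written as follows. The variable names x_1 through x_n are my own choice for illustration:

```latex
f(x_1, x_2, \dots, x_n), \qquad
J = \begin{bmatrix}
\dfrac{\partial f}{\partial x_1} &
\dfrac{\partial f}{\partial x_2} &
\cdots &
\dfrac{\partial f}{\partial x_n}
\end{bmatrix}
```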
Say we have a function f with multiple variables like so:
Our first step would be taking the partial derivative of this function with respect to each of its variables, which would come out as so:
Then by putting these partial derivatives into an array, we get our Jacobian vector of the function f:
Now let's say we wanted to evaluate the Jacobian vector at a certain point, say (0,0,0). We can do this by substituting the coordinates of the point for their respective variables in each partial derivative within the Jacobian vector and then evaluating. In our example every coordinate is 0, so by plugging in 0 wherever a variable appears, our Jacobian vector would look something like this:
and our final outcome would be the following:
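To make the steps above concrete, here is a minimal Python sketch. The function `f` below is a hypothetical stand-in that I picked for illustration, not necessarily the one used in this article, and `numerical_jacobian` is just a helper name I made up to sanity-check the hand-derived partial derivatives with finite differences.

```python
def f(x, y, z):
    # Hypothetical example function (an assumption for illustration).
    return x**2 * y + 3 * z

def jacobian_vector(x, y, z):
    # Hand-derived partial derivatives of f as a row vector:
    # [df/dx, df/dy, df/dz]
    return [2 * x * y, x**2, 3.0]

def numerical_jacobian(func, point, h=1e-6):
    # Central-difference estimate of each partial derivative at `point`.
    grads = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grads.append((func(*forward) - func(*backward)) / (2 * h))
    return grads

# Evaluate at the point (0, 0, 0) by plugging 0 in for every variable.
J_at_origin = jacobian_vector(0.0, 0.0, 0.0)
J_numeric = numerical_jacobian(f, (0.0, 0.0, 0.0))
print(J_at_origin)  # [0.0, 0.0, 3.0]
```

The finite-difference version is only an approximation, but it should agree closely with the analytic one, which makes it a handy check when deriving partial derivatives by hand.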
More on Jacobian vector points
Let's look at one more equation, shown below:
This equation is quite complex, but worry not, as we won't be working through it by hand. Instead, imagine that we have already calculated its Jacobian vectors. Let's now plot it out to get a better picture of what Jacobians are:
With the dark regions representing low values and the light regions representing high values, we get the idea that the Jacobian is simply a vector that we can calculate at each location on the plot, and that points in the direction of the steepest uphill slope. Furthermore, the steeper the slope, the greater the magnitude of the Jacobian at that point.
Now let's look at it using another plotting technique, the contour plot:
Here the lines are drawn to connect points of equal height. If we were to plot the Jacobian vector field on top of the contours, we would get something like this:
The vectors all point uphill, away from the low dark regions and toward the high bright regions. Also notice that we find our largest Jacobian vectors where the contour lines are tightly packed, and our smallest Jacobian vectors at the peaks of the mountains, the bottoms of the valleys, or on the wide flat areas.
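The behaviour described above can be checked numerically. Below is a small sketch using a hypothetical single hill centred on the origin (my own choice of function, much simpler than the one plotted above): the Jacobian points back toward the peak, is larger on the steep flank than far out on the flat, and vanishes at the peak itself.

```python
import math

def height(x, y):
    # Hypothetical single hill centred on the origin (an assumption for illustration).
    return math.exp(-(x**2 + y**2))

def jacobian(x, y):
    # Hand-derived row vector [dh/dx, dh/dy].
    h = height(x, y)
    return [-2 * x * h, -2 * y * h]

def magnitude(vec):
    # Euclidean length of a 2D vector.
    return math.hypot(vec[0], vec[1])

peak = jacobian(0.0, 0.0)    # at the top of the hill
flank = jacobian(0.5, 0.0)   # on the steep side of the hill
far = jacobian(3.0, 0.0)     # far away, where the surface is almost flat

print(magnitude(peak), magnitude(flank), magnitude(far))
```

At the point (0.5, 0) the x-component is negative, meaning the vector points back toward the origin, which is the uphill direction for this particular hill.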
A Jacobian matrix describes functions that take a vector as an input and give a vector as an output. If we consider the three functions below:
We can think of these functions as a mapping between two vector spaces, one containing vectors with coordinates u,v,w and another containing the coordinates x,y,z. Each point in x,y,z has a corresponding location in u,v,w, and as we move around x,y,z space, we would expect our corresponding path in u,v,w to be quite different. We can make separate Jacobian row vectors for u, v and w:
However, since we are considering u,v,w to be components of a single vector, it makes more sense to extend our Jacobian by stacking these vectors as rows of a matrix like so:
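Written out in symbols for u, v, w as functions of x, y, z, the stacked structure is:

```latex
J = \begin{bmatrix}
\dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} & \dfrac{\partial u}{\partial z} \\[2ex]
\dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} & \dfrac{\partial v}{\partial z} \\[2ex]
\dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} & \dfrac{\partial w}{\partial z}
\end{bmatrix}
```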
We now have the structure for our Jacobian matrix, and with this we can work through the functions and see what we get. As we know, the first step would be to find the partial derivatives of the functions:
Then place them into the Jacobian matrix:
Once again, let's say that we now wanted to evaluate the Jacobian matrix at the point (0,0,0). All we need to do is substitute the coordinates of the point for their respective variables in each partial derivative within the Jacobian matrix and evaluate:
And our final output would look like so:
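As a rough sketch of the same procedure in Python, the snippet below builds a Jacobian matrix with central differences and evaluates it at (0,0,0). The three functions inside `F` are hypothetical examples of my own, not the ones from this article, and `jacobian_matrix` is a helper name I introduced for illustration.

```python
import math

def F(x, y, z):
    # Hypothetical vector-valued function returning [u, v, w]
    # (an assumption for illustration).
    u = x + 2 * y * z
    v = x**2 + 3 * y
    w = math.sin(z) + x * y
    return [u, v, w]

def jacobian_matrix(func, point, h=1e-6):
    # Row i holds the partial derivatives of output i with respect to
    # each input variable, estimated with central differences.
    n_out = len(func(*point))
    J = [[0.0] * len(point) for _ in range(n_out)]
    for col in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[col] += h
        backward[col] -= h
        f_fwd = func(*forward)
        f_bwd = func(*backward)
        for row in range(n_out):
            J[row][col] = (f_fwd[row] - f_bwd[row]) / (2 * h)
    return J

J = jacobian_matrix(F, (0.0, 0.0, 0.0))
for row in J:
    print([round(entry, 6) for entry in row])
```

For these particular functions the hand-derived matrix is [[1, 2z, 2y], [2x, 3, 0], [y, x, cos z]], so at the origin the numerical result should be close to [[1, 0, 0], [0, 3, 0], [0, 0, 1]].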
There is one last concept which relates to multivariate systems: the Hessian matrix. As we saw above, for Jacobians we collected all of the first order derivatives of a function. The Hessian is similar, except that we now want to collect all of the second order derivatives of a function.
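For a function f of two variables x and y (I am assuming the two-variable case here to keep it short), the Hessian collects the four second order partial derivatives like this:

```latex
H = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \, \partial y} \\[2ex]
\dfrac{\partial^2 f}{\partial y \, \partial x} & \dfrac{\partial^2 f}{\partial y^2}
\end{bmatrix}
```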
Let's use an example problem to get a better understanding of this. Say we have the function:
The first thing we need to do is take its partial derivatives like so. This is the first differentiation:
This gives us our Jacobian vector:
We now differentiate a second time, taking the partial derivative of each entry of the Jacobian vector with respect to x and with respect to y. Arranging the results gives us our Hessian matrix:
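Here is a short Python sketch of the two differentiation steps. The function `f` is a hypothetical example of my own rather than the one used in this article, and `numerical_hessian` is a helper I added only to double-check the hand-derived second derivatives.

```python
def f(x, y):
    # Hypothetical example function (an assumption for illustration).
    return x**2 * y + y**3

def jacobian(x, y):
    # First differentiation: [df/dx, df/dy].
    return [2 * x * y, x**2 + 3 * y**2]

def hessian(x, y):
    # Second differentiation: each Jacobian entry differentiated
    # with respect to x (first column) and y (second column).
    return [[2 * y, 2 * x],
            [2 * x, 6 * y]]

def numerical_hessian(func, point, h=1e-4):
    # Central-difference estimate of every second partial derivative.
    n = len(point)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                p = list(point)
                p[i] += di * h
                p[j] += dj * h
                return func(*p)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H

H_exact = hessian(1.0, 2.0)
H_approx = numerical_hessian(f, (1.0, 2.0))
print(H_exact)  # [[4.0, 2.0], [2.0, 12.0]]
```

Notice that the two off-diagonal entries are equal. For smooth functions like this one the order of differentiation does not matter, so the Hessian is symmetric.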
More on Hessian Matrices
Hessians are of immense help in optimization, in particular for determining whether a point is a local maximum or minimum. Let's consider another function along with its Hessian matrix:
This function would look something like this when plotted out:
Now say you hadn't seen the function beforehand, and calculated the value of the Jacobian at the point (0,0). You would find that the gradient vector there is 0, but how would you know whether that point is a maximum or a minimum? We can simply look at the Hessian. If we were to calculate the determinant of the Hessian, we would get:
First, we can see that the determinant of the Hessian is positive, which signifies that we are dealing with either a maximum or a minimum. Second, looking back at the Hessian matrix, if its first (top-left) term is positive we know we have a minimum, and if it is negative we have a maximum.
If we were to now look at the same equation, only this time including a subtraction sign:
We would get something like this when we plot it out:
This time our Hessian determinant is negative, so we know we're not dealing with a maximum or minimum, even though the surface at the point (0,0) that we looked at earlier is still flat. What we have is a location with 0 gradient and slopes coming down towards it in one direction, and up towards it in the other direction. This is referred to as a saddle point.
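The test described above can be sketched in a few lines of Python. The Hessians below belong to the functions x² + y², x² − y² and −x² − y², which I am assuming as stand-ins for the examples in this article, and `classify` is a helper name of my own. It only handles the two-variable case.

```python
def determinant(hessian):
    # Determinant of a 2x2 matrix [[a, b], [c, d]].
    (a, b), (c, d) = hessian
    return a * d - b * c

def classify(hessian):
    # Second-derivative test at a point where the gradient is 0.
    det = determinant(hessian)
    top_left = hessian[0][0]
    if det > 0 and top_left > 0:
        return "minimum"
    if det > 0 and top_left < 0:
        return "maximum"
    if det < 0:
        return "saddle point"
    return "inconclusive"  # determinant of 0: the test cannot decide

bowl = [[2, 0], [0, 2]]      # Hessian of x**2 + y**2 (assumed example)
saddle = [[2, 0], [0, -2]]   # Hessian of x**2 - y**2 (assumed example)
dome = [[-2, 0], [0, -2]]    # Hessian of -x**2 - y**2 (assumed example)

print(classify(bowl))    # minimum
print(classify(saddle))  # saddle point
print(classify(dome))    # maximum
```

When the determinant is exactly 0 the test tells us nothing, which is why the sketch returns "inconclusive" in that case instead of guessing.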
In this article we looked into what Jacobians are, both at a vector and matrix level, and wrote out a few examples for better understanding. We also discussed a little about Jacobian vector points and how Jacobians look when plotted out. Lastly, we took a quick look at Hessians and what they are. If you notice any errors in my mathematics or explanations, feel free to leave a comment about it. I'd love to understand where I went wrong and fix it as soon as possible. I hope you've found this useful, thanks for reading!