To understand partial derivatives, we first need to understand derivatives, which are a special case of partial derivatives.
A derivative applies to a function of a single parameter, or variable, x.
Partial derivatives apply to functions of many parameters and variables, and this is what we ultimately need to understand, because a neural network has multiple inputs. Each input gets multiplied by its corresponding weight (a function of 2 parameters), and the products get summed with the bias (a function of as many parameters as there are inputs, plus one for the bias).
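As a quick sketch of that weighted-sum idea (a single neuron with three inputs; the specific numbers are just illustrative, in the spirit of the nnfs.io code):

```python
inputs = [1.0, 2.0, 3.0]     # outputs of the previous layer (or raw inputs)
weights = [0.2, 0.8, -0.5]   # one weight per input
bias = 2.0                   # a single bias for the neuron

# each input is multiplied by its corresponding weight,
# the products are summed, and the bias is added
output = sum(i * w for i, w in zip(inputs, weights)) + bias
print(output)  # 0.2 + 1.6 - 1.5 + 2.0 = 2.3
```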
Let’s start with a simple function and discover what is meant by “impact.”
A very simple function, y = 2x, which takes x as an input:
To calculate the slope, we measure the change in y divided by the change in x:
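$$\text{slope} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}$$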
The first points will be (0, 0) and (1, 2), which give a slope of (2 − 0) / (1 − 0) = 2.
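A minimal sketch of these points in Python (using NumPy, as the nnfs.io code does; the exact range of x values is an assumption):

```python
import numpy as np

def f(x):
    return 2 * x

x = np.arange(0, 5)   # x values: 0, 1, 2, 3, 4
y = f(x)              # y values: 0, 2, 4, 6, 8
print(x)
print(y)
```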
For a straight line, the slope is the same everywhere. But can we measure the slope of a curve, for example a nonlinear function such as y = 2x²? Depending on which 2 points we choose to use, we will measure varying slopes:
The Numerical Derivative:
It means calculating the slope of a tangent line using two points that are “sufficiently close” to each other.
Why do we want to use 2 points that are sufficiently close? Because a very small delta gives better accuracy: the closer these two points are to each other, the more accurately the resulting line approximates the true tangent.
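Here is a rough sketch of this (the point x1 = 2 and the delta values are just illustrative choices, following the style of the nnfs.io code):

```python
def f(x):
    return 2 * x**2

x1 = 2  # the point where we estimate the slope
for delta in (1, 0.1, 0.001, 0.0001):
    x2 = x1 + delta  # a second, nearby point
    approximate_derivative = (f(x2) - f(x1)) / (x2 - x1)
    print(f"delta={delta}: approximate derivative = {approximate_derivative}")

# the estimates approach 8, the analytical derivative 4x evaluated at x = 2
```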
Plotting the function together with this approximate tangent line makes the idea visually clear.
Now we will make this clearer with a few more derivatives, computed analytically.
To compute the derivative of a function using the analytical method, we can split it into simple, elemental functions, find the derivatives of those, and then combine them; in this way, complex functions can be broken down into simpler parts whose derivatives are combined using the so-called chain rule.
Some rules and solutions:
1- The derivative of a constant equals 0 (m is a constant in this case, as it’s not a parameter that we are deriving with respect to, which is x in this example):
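$$\frac{d}{dx}\, m = 0$$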
2- The derivative of x equals 1:
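$$\frac{d}{dx}\, x = 1$$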
3- The derivative of a linear function equals its slope:
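$$f(x) = mx + b \quad\Rightarrow\quad f'(x) = m$$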
4- The derivative of a constant multiple of the function equals the constant multiple of the function’s derivative:
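$$\frac{d}{dx}\,\big[k \cdot f(x)\big] = k \cdot \frac{d}{dx}\, f(x) \qquad (k\ \text{a constant})$$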
5- The derivative of a sum of functions equals the sum of their derivatives:
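$$\frac{d}{dx}\,\big[f(x) + g(x)\big] = \frac{d}{dx}\, f(x) + \frac{d}{dx}\, g(x)$$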
The same concept applies to subtraction:
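$$\frac{d}{dx}\,\big[f(x) - g(x)\big] = \frac{d}{dx}\, f(x) - \frac{d}{dx}\, g(x)$$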
6- The derivative of an exponentiation:
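$$\frac{d}{dx}\, x^n = n \cdot x^{\,n-1}$$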
In neural networks we have many layers, where the output of each layer is the input of the next, and so on. We therefore need to calculate the effect of a single input on the final output, and even if that effect is zero, we need to know it. This is where the partial derivative comes in.
I’ll show this example from Andrew Ng’s lecture:
We need to know these impacts; this means that we have to calculate the derivative with respect to each input separately to learn about each of them. That’s why we call these partial derivatives with respect to a given input: we are calculating a part of the derivative, related to a single input. Each partial derivative is a single equation, and the full multivariate function’s derivative consists of a set of equations called the gradient. In other words, the gradient is a vector whose size equals the number of inputs, containing the partial derivative solutions with respect to each of the inputs.
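As a small illustrative sketch (the function f below is my own example, not from the book), we can estimate each partial derivative numerically by nudging one input at a time and collecting the results into a gradient vector:

```python
def f(x, y, z):
    # example multivariate function: f(x, y, z) = 3x + xy + z^2
    return 3 * x + x * y + z ** 2

def numerical_gradient(func, inputs, delta=1e-6):
    """Estimate the gradient: one partial derivative per input."""
    grad = []
    for i in range(len(inputs)):
        nudged = list(inputs)
        nudged[i] += delta  # nudge only the i-th input
        grad.append((func(*nudged) - func(*inputs)) / delta)
    return grad

print(numerical_gradient(f, [1.0, 2.0, 3.0]))
# analytical gradient [3 + y, x, 2z] evaluated here is [5, 1, 6]
```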
So here are some rules for partial derivatives:
1- The partial derivative of the sum with respect to any input equals 1:
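$$\frac{\partial}{\partial x}(x + y) = 1, \qquad \frac{\partial}{\partial y}(x + y) = 1$$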
2- The partial derivative of the multiplication operation with 2 inputs, with respect to any input, equals the other input:
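$$\frac{\partial}{\partial x}(x \cdot y) = y, \qquad \frac{\partial}{\partial y}(x \cdot y) = x$$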
3- The partial derivative of the max function of 2 variables with respect to either of them is 1 if that variable is the bigger one, and 0 otherwise. Here it is with respect to x:
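$$\frac{\partial}{\partial x}\max(x, y) = \begin{cases} 1 & \text{if } x > y \\ 0 & \text{otherwise} \end{cases}$$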
4- The derivative of the max function of a single variable and 0 equals 1 if the variable is greater than 0 and 0 otherwise:
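$$\frac{d}{dx}\max(x, 0) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}$$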
5- The derivative of chained functions equals the product of the partial derivatives of the subsequent functions:
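$$\frac{d}{dx}\, f(g(x)) = f'(g(x)) \cdot g'(x)$$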
6- The same applies to the partial derivatives. For example:
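$$\frac{\partial}{\partial x}\, f\big(g(x, y)\big) = \frac{\partial f}{\partial g} \cdot \frac{\partial g}{\partial x}$$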
7- The gradient is a vector of all possible partial derivatives. An example of a triple-input function:
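$$\nabla f(x, y, z) = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\[6pt] \dfrac{\partial f}{\partial y} \\[6pt] \dfrac{\partial f}{\partial z} \end{bmatrix}$$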
Reference: Neural Networks from Scratch (nnfs.io)