A colleague showed me the training and validation loss of a model he was training:
I ran into a weird problem while training the model. At some point the training loss suddenly dropped, and the validation loss increased sharply at the same time.
So far so good, but then
This is not over-fitting in my opinion, because typical over-fitting shows a much smoother increase in val_loss. Any ideas about the possible causes? The learning rate is constant.
Yes, I have some advice
You might think this is not over-fitting because it does not fit the smooth textbook pattern, but it is.
You are over-fitting
Although the validation loss does not increase smoothly, as in the textbook picture, this still fits the definition of over-fitting.
During optimization, the model has specialized so much on the training set that the validation loss has started to increase. That is all there is to over-fitting.
Your training procedure is flawed
The problem is in your training procedure; it is not aggressive enough.
Apparently there are valleys in the loss landscape that are not explored early on. You miss out on faster convergence and (possibly) on a deeper minimum.
Fix your training strategy first
A one-cycle or multi-cycle learning-rate policy is generally a good idea.
You probably need only a fraction of the steps you now spend with a constant learning rate.
The phase with high learning rate gives your optimizer the chance to traverse the loss landscape and jump out of local minima quickly.
Another advantage of cyclic learning rates is that you will likely end up on a plateau that is not only deeper, but also wider and more stable than your current local minimum.
The phase with low learning rate allows the optimizer to cool down and dig further into any plateau that you find.
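As a concrete illustration, here is a minimal PyTorch sketch of a one-cycle policy using torch.optim.lr_scheduler.OneCycleLR. The model, data, and max_lr value are placeholders for illustration, not values from the colleague's setup.

```python
# Minimal sketch: replacing a constant learning rate with a one-cycle policy in PyTorch.
# The model, data, and max_lr below are placeholders; tune them for your own setup.
import torch
from torch import nn, optim

model = nn.Linear(10, 1)                                                   # placeholder model
loader = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(100)]   # placeholder data
criterion = nn.MSELoss()

epochs = 5
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# One cycle: ramp the learning rate up to max_lr, then anneal it back down
# well below the starting value over the whole run.
scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=0.1,                      # peak learning rate; find it with an LR range test
    epochs=epochs,
    steps_per_epoch=len(loader),
)
# For a multi-cycle policy, torch.optim.lr_scheduler.CyclicLR is the analogous scheduler.

for epoch in range(epochs):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        scheduler.step()             # step the schedule once per batch, not per epoch
```

The high-learning-rate phase early in the cycle does the aggressive exploration described above; the final annealing phase does the cooling down.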
After fixing your training strategy, you will likely still end up with a model that over-fits.
Now you will see the usual pattern of over-fitting, without jumps.
Use the usual strategies to fight over-fitting: more data, augmentations, and regularization.
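For reference, a minimal PyTorch/torchvision sketch of the last two levers, augmentations and regularization. The transforms, dropout rate, and weight-decay value are illustrative placeholders, not a recommendation for any specific model.

```python
# Minimal sketch of common anti-over-fitting levers in PyTorch/torchvision.
# More data is the first lever, but it cannot be shown in code; the values below are placeholders.
import torch
from torch import nn, optim
from torchvision import transforms

# 1. Augmentations: enlarge the effective training set with label-preserving transforms.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])

# 2. Regularization inside the model: dropout between layers.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),               # dropout rate is a hyperparameter
    nn.Linear(256, 10),
)

# 3. Weight decay: an L2 penalty applied by the optimizer.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
```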