3/22/2017

Survival of the flattest

Most of you are probably familiar with the problem of overfitting in machine learning. It occurs when a model learns the noise along with the signal in the training data, to the point that its performance suffers: the model becomes so specialized to the training set that it loses the ability to generalize. Overfitting is obviously a problem, but would you really care about it if all the data you could ever get on your problem were available to you from the very beginning? The model would already describe that data perfectly anyway. Of course this is not a practical scenario, but my point is that there is a reason to give up on fitting the training data perfectly: the new data that will arrive later. If you want your model to generalize, you have to settle for the trade-off of being imperfect but robust rather than perfect but fragile.
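
To see this trade-off in a few lines of code, here is a minimal sketch (my own toy setup, assuming only NumPy): we fit polynomials of increasing degree to a handful of noisy samples of a simple signal, then compare the error on the training points with the error on fresh samples drawn from the same source.

```python
import numpy as np

rng = np.random.default_rng(0)

# The underlying signal we are trying to learn.
def signal(x):
    return np.sin(2 * np.pi * x)

# A small, noisy training set and a larger held-out test set.
x_train = rng.uniform(0, 1, 20)
y_train = signal(x_train) + rng.normal(0, 0.2, x_train.size)
x_test = rng.uniform(0, 1, 200)
y_test = signal(x_test) + rng.normal(0, 0.2, x_test.size)

# Fit polynomials of increasing capacity and measure both errors.
for degree in (1, 3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The degree-15 polynomial is the "perfect but fragile" model: it threads nearly every training point, noise included, so its training error is close to zero, but it pays for that on data it has never seen. The lower-degree fits are imperfect on the training set yet hold up much better on the test set.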