Overfitting
When a model learns the training data too well, including its noise and quirks, so that it performs poorly on new, unseen data.
Memorizing Instead of Learning
Overfitting is like a student who memorizes practice-test answers without understanding the underlying concepts. They ace the practice tests but bomb the real exam because they can't generalize. An overfit model does the same thing: it captures every detail of the training data, including irrelevant noise, and fails when confronted with new examples.
You can spot overfitting when training accuracy keeps improving while validation accuracy plateaus or gets worse. The model is getting better at the training data while losing the ability to generalize.
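The gap described above is easy to see in code. This is a minimal sketch (the synthetic dataset and the choice of an unconstrained decision tree are illustrative, not from the text): the model scores perfectly on its own training set but noticeably worse on held-out data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# 200 samples, 20 features; only feature 0 carries signal, and labels are noisy.
X = rng.normal(size=(200, 20))
y = (X[:, 0] + rng.normal(scale=1.0, size=200) > 0).astype(int)

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# An unconstrained tree can grow until it memorizes every training sample.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_acc = tree.score(X_train, y_train)  # 1.0: every training point memorized
val_acc = tree.score(X_val, y_val)        # lower: the memorized noise doesn't transfer
print(f"train={train_acc:.2f}  val={val_acc:.2f}")
```

Plotting these two scores over training (or over model complexity) gives the classic diverging-curves picture: training accuracy climbs toward perfection while validation accuracy stalls or drops.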
Why It Happens and How to Fight It
Complex models with many parameters can essentially memorize training data. If a model has more parameters than there are data points, it can fit every example perfectly without learning any real pattern. This is especially risky with small datasets.
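A polynomial fit makes this concrete. In this sketch (the numbers are illustrative assumptions), eight noisy points drawn from a simple linear trend are fit with a degree-7 polynomial, which has eight coefficients, enough to pass through every point exactly, noise and all:

```python
import numpy as np

rng = np.random.default_rng(1)
# 8 noisy samples from a simple linear trend y = 2x + noise.
x = np.linspace(0.0, 1.0, 8)
y = 2 * x + rng.normal(scale=0.2, size=8)

# Degree 7 means 8 coefficients for 8 points: the fit can interpolate exactly.
overfit = np.polynomial.Polynomial.fit(x, y, deg=7)
# Degree 1 has only 2 coefficients: it must average out the noise instead.
simple = np.polynomial.Polynomial.fit(x, y, deg=1)

overfit_err = np.max(np.abs(overfit(x) - y))  # effectively zero on training points
simple_err = np.max(np.abs(simple(x) - y))    # small but nonzero
print(overfit_err, simple_err)
```

The degree-7 curve's "perfect" training error comes from contorting itself around the noise; between and beyond the sample points it can swing wildly, while the straight line stays close to the true trend.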
There are many defenses. Regularization penalizes model complexity. Dropout randomly disables neurons during training, preventing over-reliance on specific features. Early stopping halts training before the model starts memorizing. More training data also helps: there is more signal to learn, and the noise averages out.
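To illustrate one of these defenses, here is a sketch of L2 regularization (the dataset sizes and the penalty strength alpha=10.0 are illustrative choices). With 50 parameters but only 30 samples, plain least squares drives training error to zero by memorizing the noise; ridge regression's penalty on large weights forbids that:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(2)
# 30 samples, 50 features: more parameters than data points.
X = rng.normal(size=(30, 50))
y = X[:, 0] + rng.normal(scale=0.5, size=30)  # only feature 0 actually matters

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha controls the complexity penalty

plain_err = np.max(np.abs(plain.predict(X) - y))  # ~0: training set memorized
ridge_err = np.max(np.abs(ridge.predict(X) - y))  # nonzero: noise not memorized
print(plain_err, ridge_err)
# The penalty shrinks the learned weights compared to the unregularized fit.
print(np.linalg.norm(plain.coef_), np.linalg.norm(ridge.coef_))
```

Accepting some training error in exchange for smaller weights is the regularization trade-off; the same logic motivates dropout and early stopping, which restrict the model's capacity to memorize in other ways.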
The fundamental insight is that fitting the training data perfectly isn't the goal. We want models that capture real patterns that hold beyond the training set. Sometimes a simpler model that makes a few errors on the training data will actually perform better on new data.