Of course, overfitting is a real issue in unsupervised learning as well, where it typically shows up as choosing too many clusters; the remedy is more commonly referred to as "automatic cluster number determination" or "model selection." Standard cross-validation isn't directly applicable in this situation, because there are no labels against which to score held-out data, so model selection usually relies on complexity-penalising criteria instead.
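To make this concrete, here is a minimal sketch of cluster-number selection, assuming scikit-learn and synthetic data. The Bayesian Information Criterion penalises extra mixture components, which is what keeps the model from overfitting with too many clusters:

```python
# A minimal sketch of model selection for the cluster count,
# assuming scikit-learn is available. BIC penalises complexity,
# so it guards against fitting too many clusters.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

# Fit mixtures with increasing component counts and keep the one
# with the lowest Bayesian Information Criterion.
bics = []
for k in range(1, 10):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics.append(gmm.bic(X))

best_k = int(np.argmin(bics)) + 1
print(f"BIC selects {best_k} components")
```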
Overfitting: excellent performance on the training data, but poor generalisation to new data. Underfitting: poor performance on the training data and poor generalisation to new data.
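The contrast is easy to reproduce. The sketch below, assuming scikit-learn and a synthetic cubic dataset, fits polynomials of three degrees: the degree-1 model underfits (poor scores everywhere), while the degree-15 model overfits (high training score, much lower test score):

```python
# A small sketch contrasting underfitting and overfitting,
# assuming scikit-learn. The data follow a noisy cubic curve.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = X[:, 0] ** 3 - 2 * X[:, 0] + rng.normal(scale=2.0, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Degree 1 underfits, degree 15 overfits, degree 3 generalises well.
for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree {degree:2d}: "
          f"train R^2 = {model.score(X_train, y_train):.2f}, "
          f"test R^2 = {model.score(X_test, y_test):.2f}")
```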
When our machine learning model is unable to capture the underlying trend of the data, we call this underfitting. To prevent the model from overfitting, training can be halted at an early stage (early stopping), before the model begins to memorise the training data; stop too early, however, and the model will not have learned enough from the training data and will underfit instead.
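As an illustration, scikit-learn's MLPRegressor exposes early stopping directly; the parameter values below are illustrative, not prescriptive:

```python
# A hedged sketch of early stopping, assuming scikit-learn.
# MLPRegressor holds out a validation fraction and stops once
# the validation score stops improving.
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=1000, n_features=20, noise=5.0,
                       random_state=0)

model = MLPRegressor(hidden_layer_sizes=(64,),
                     early_stopping=True,      # stop on validation plateau
                     validation_fraction=0.1,  # held-out share of the data
                     n_iter_no_change=10,      # patience before stopping
                     max_iter=500,
                     random_state=0)
model.fit(X, y)
print(f"stopped after {model.n_iter_} iterations")
```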
Overfitting is a statistical modelling error that arises when a function is fitted too closely to a limited set of data points. The resulting model describes the noise in the data set it was created with rather than the underlying relationship, and so it fails to generalise to any other data set.
When a model is too simplistic, whether because it is informed by too few features or because it is overly regularised, it becomes too inflexible to learn from the dataset, resulting in underfitting. Simple learners produce predictions with lower variance, but they are more biased toward incorrect outcomes.
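Over-regularisation is easy to demonstrate. In this sketch, assuming scikit-learn and synthetic regression data, increasing the Ridge penalty alpha eventually drags down both the training and test scores, which is the signature of underfitting:

```python
# A minimal sketch, assuming scikit-learn: as the Ridge penalty
# alpha grows, the model becomes too inflexible and both scores
# fall, indicating underfitting.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=10, noise=10.0,
                       random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for alpha in (0.1, 10.0, 10000.0):
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:>8}: "
          f"train R^2 = {model.score(X_train, y_train):.2f}, "
          f"test R^2 = {model.score(X_test, y_test):.2f}")
```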