Encoding categorical data means converting categorical values into numbers, typically integers, so that the data can be fed into a machine learning model and the model can be fitted to it. It is one part of data preparation, which in data science has to happen before modelling.
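As a rough illustration of integer encoding, here is a minimal library-free sketch. The helper names `fit_label_encoder` and `transform` and the `colors` sample are made up for this example; libraries such as scikit-learn or pandas offer their own encoders, which may assign codes differently.

```python
def fit_label_encoder(values):
    """Assign each distinct category an integer code, in sorted order."""
    categories = sorted(set(values))
    return {category: code for code, category in enumerate(categories)}


def transform(values, mapping):
    """Replace every category with its integer code."""
    return [mapping[value] for value in values]


# Hypothetical sample column, invented for illustration.
colors = ["red", "green", "blue", "green", "red"]

mapping = fit_label_encoder(colors)
encoded = transform(colors, mapping)

print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}
print(encoded)  # [2, 1, 0, 1, 2]
```

Note that the integer codes here follow alphabetical order only, so a model may read an ordering into them that the categories do not actually have.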
Some algorithms can work with categorical data directly. A decision tree, for example, can be learned straight from categorical data without any transformation, although this depends on the specific implementation. Many other machine learning algorithms, however, cannot operate on label data directly and need the categories converted to numbers first.
In general, replacing missing values with the mean, median, or mode is a crude way to handle missing data. The approximation can still be acceptable, and may even give good results, in some circumstances, for example when the variable has low variance or little leverage over the response.
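To make the mean/median/mode replacement concrete, the sketch below fills missing entries using Python's standard `statistics` module. It assumes missing values are represented as `None`; the `impute` helper and the `ages` sample are invented for this example and are not taken from any particular library.

```python
from statistics import mean, median, mode


def impute(values, strategy="mean"):
    """Fill None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    # Pick the summary statistic by name; an unknown name raises KeyError.
    statistic = {"mean": mean, "median": median, "mode": mode}[strategy]
    fill = statistic(observed)
    return [fill if v is None else v for v in values]


# Hypothetical sample column with two missing entries.
ages = [22, None, 30, 22, None, 46]

print(impute(ages, "mean"))    # [22, 30, 30, 22, 30, 46]
print(impute(ages, "median"))  # [22, 26.0, 30, 22, 26.0, 46]
print(impute(ages, "mode"))    # [22, 22, 30, 22, 22, 46]
```

Every missing entry receives the same value, which is why this approach tends to work best when the variable varies little. For a categorical column only the mode is meaningful.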