Apr 22, 2022
Let's assume that you have a categorical column with these rows:
As you can see, almost every row has a different value. At this point, you should consider grouping values by similarity. If you know that A, B, C, D are similar and E, F, G, H are similar, you can change your rows like this:
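As a rough illustration of this kind of grouping, here is a minimal Python sketch. The sample values, the group labels `group_1` and `group_2`, and the helper `group_value` are all hypothetical; only the idea that A to D belong together and E to H belong together comes from the text above.

```python
# Hypothetical raw values standing in for the categorical column.
raw_values = ["A", "B", "C", "D", "E", "F", "G", "H"]

# Assumed grouping: A-D are treated as similar, E-H as similar.
# The labels "group_1" and "group_2" are illustrative names only.
GROUP_OF = {
    "A": "group_1", "B": "group_1", "C": "group_1", "D": "group_1",
    "E": "group_2", "F": "group_2", "G": "group_2", "H": "group_2",
}


def group_value(value, mapping=GROUP_OF, default="other"):
    """Return the group label for a value, or `default` if it is unmapped."""
    return mapping.get(value, default)


# Replace every raw value with its group label.
grouped_values = [group_value(v) for v in raw_values]

# Eight distinct values collapse into two.
print(len(set(raw_values)), "->", len(set(grouped_values)))
```

Sending unmapped values to a fallback label such as `other` is one possible design choice; depending on your data, you may prefer to keep rare values as they are.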
The fewer distinct categorical values there are in a model's output or in a column, the more consistent your model tends to be. When creating a model, there is a "One Hot Max Size" parameter. It applies to categorical features: one-hot encoding is used for features whose number of distinct values is less than or equal to the given parameter value. For example, if there are 7 unique categorical values in your prepared data, you can set this parameter to 7.
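To pick a value for this parameter, you can count the distinct values in your prepared column first. The sketch below does that in plain Python. The sample column and the helper `count_distinct` are hypothetical, and the key `one_hot_max_size` is an assumption: it is the name CatBoost uses for a parameter of this kind, but your tool may expose it under a different name.

```python
def count_distinct(values):
    """Return the number of distinct values in a column."""
    return len(set(values))


# Hypothetical prepared column with 7 unique categorical values.
column = ["red", "green", "blue", "red", "yellow",
          "pink", "black", "white", "green"]

distinct = count_distinct(column)

# Assumed parameter name; adjust it to whatever your modeling tool expects.
params = {"one_hot_max_size": distinct}

print(params)
```

With this setting, a feature that has exactly 7 distinct values still falls within the "less than or equal to" threshold.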
Experimenting with different parameter settings and column values will often lead to better results for your Machine Learning models.