WHY DO MACHINE LEARNING MODELS SOMETIMES FAIL?

Paul Bananzi
Dec 17, 2021

Machine learning techniques are now widely used to make predictions that help organizations make better-informed decisions. Even so, machine learning models can sometimes fail to achieve our business objectives. Sometimes you work on a project only to realize that your model's performance is not good enough, and you ask yourself whether it was really worth going through the trouble. Below are some reasons why machine learning models may not work as expected:

Data Quality Issues

There is not enough data.

Features present in the data are not useful predictors, or there is not enough domain knowledge.

Overlapping classes in the case of classification problems.

The problem at hand is not a machine learning problem but can be solved using simple statistics.

Data Quality Issues

Data quality is the measure of how well suited a data set is to serve its specific purpose. Measures of data quality are based on data quality characteristics such as accuracy, completeness, consistency, validity, uniqueness, and timeliness. (OmniSci)

Sometimes the data we have contains inconsistencies such as inaccurate values, which can occur when wrong data entries are made. Some of the data may also be missing or contain extreme values known as outliers. In addition, biased data can affect the quality of your data. This mostly occurs when you use the wrong sampling technique and the data you have is not representative of the problem you are trying to solve. If these problems are not addressed, they can hurt the performance of our machine learning model.
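The checks mentioned above (wrong or missing entries, duplicates, outliers) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not a full data validation pipeline. The function name `quality_report`, the `amount` field, and the sample rows are all hypothetical, and the 1.5 x IQR rule is just one common convention for flagging outliers.

```python
import statistics

def quality_report(rows, numeric_field):
    """Count rows with missing values, exact duplicate rows, and IQR outliers."""
    # A row counts as "missing" if any of its fields is None or an empty string.
    missing_rows = sum(
        1 for row in rows if any(v is None or v == "" for v in row.values())
    )

    # Exact duplicates: rows whose (field, value) pairs were already seen.
    seen, duplicate_rows = set(), 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicate_rows += 1
        seen.add(key)

    # Outliers: numeric values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    values = [
        row[numeric_field]
        for row in rows
        if isinstance(row.get(numeric_field), (int, float))
    ]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]

    return {
        "missing_rows": missing_rows,
        "duplicate_rows": duplicate_rows,
        "outliers": outliers,
    }

# Hypothetical transaction records, for illustration only.
rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 12.0},
    {"id": 3, "amount": 11.0},
    {"id": 4, "amount": 13.0},
    {"id": 4, "amount": 13.0},   # duplicate entry
    {"id": 5, "amount": None},   # missing value
    {"id": 6, "amount": 12.0},
    {"id": 7, "amount": 11.0},
    {"id": 8, "amount": 500.0},  # extreme value
]
report = quality_report(rows, "amount")
print(report)
```

A report like this will not tell you whether the data is biased or unrepresentative. That still requires looking at how the sample was collected.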

There is not enough data.

Most machine learning models need a sufficient amount of data to learn patterns, and deep learning models in particular need large amounts of data to train. If there is not enough data, it becomes very difficult for organizations to use machine learning models effectively.
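One way to get a feel for whether you have enough data is a learning curve: train on samples of increasing size and watch how accuracy on held-out data changes. The sketch below is an assumption-heavy toy, not a recipe. It uses synthetic one-dimensional data (class 0 drawn around 0, class 1 drawn around 2) and a very simple nearest-class-mean classifier, and the names `make_samples`, `nearest_mean_accuracy`, and `learning_curve` are made up for this example.

```python
import random

def make_samples(rng, n_per_class):
    """Draw n_per_class points per class: class 0 ~ N(0, 1), class 1 ~ N(2, 1)."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(2.0, 1.0), 1) for _ in range(n_per_class)]
    return data

def nearest_mean_accuracy(train, test):
    """Fit a nearest-class-mean classifier on train and return its test accuracy."""
    means = {}
    for label in (0, 1):
        xs = [x for x, y in train if y == label]
        means[label] = sum(xs) / len(xs)
    # Predict the class whose training mean is closest to the point.
    correct = sum(
        1 for x, y in test
        if min(means, key=lambda c: abs(x - means[c])) == y
    )
    return correct / len(test)

def learning_curve(sizes, seed=0):
    """Return (training size per class, test accuracy) pairs on one fixed test set."""
    rng = random.Random(seed)
    test = make_samples(rng, 1000)
    return [(n, nearest_mean_accuracy(make_samples(rng, n), test)) for n in sizes]

curve = learning_curve([2, 10, 100, 1000])
for n, acc in curve:
    print(f"{n:>5} training points per class -> test accuracy {acc:.3f}")
```

With very small training sets the accuracy tends to be unstable from one sample to the next, which is the practical symptom of not having enough data. If the curve has already flattened, collecting more of the same data is unlikely to help much.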

Features present in the data are not useful predictors, or there is not enough domain knowledge.

It is usually assumed that the features in your data can help predict your target value (in the case of regression) or class (in the case of classification). Sometimes, however, the features you have are not relevant for predicting your output, so machine learning models fail to learn useful patterns from your data. Domain knowledge is also needed to help extract new features. Without it, machine learning models are more likely to fail.
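A quick first check on whether a feature carries any signal is to measure how strongly it is related to the target. The sketch below ranks features by their Pearson correlation with the target. Treat it as a rough screen under stated assumptions: Pearson correlation only captures linear relationships, so a low score does not prove a feature is useless, and the feature names and toy values here are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def rank_features(features, target):
    """Sort (name, correlation) pairs by absolute correlation, strongest first."""
    scores = {name: pearson(column, target) for name, column in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical toy data: one feature tracks the target, the other does not.
target = [1, 2, 3, 4, 5]
features = {
    "useful": [2, 4, 6, 8, 10],
    "irrelevant": [3, 1, 3, 1, 3],
}
ranking = rank_features(features, target)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

If every feature scores close to zero on checks like this, that is a hint that you may need new features, which is where domain knowledge comes in.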

Overlapping classes in the case of classification problems.

In some cases it may be difficult to find a model that separates the negative and positive classes in a data set well, because the target classes tend to overlap each other. This makes it difficult for machine learning models to learn patterns in the data. An example is fraud detection, where fraudulent transactions may look similar to genuine transactions and there is no feature that can easily distinguish the two. Below is an image of how separated and overlapping classes may look.

On the left we have cleanly separated classes, whereas on the right we have overlapping classes.
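To see why overlap puts a ceiling on performance, here is a small sketch with a single numeric feature. It tries every possible cut-off of the form "predict positive if x >= t" and reports the best accuracy any such rule can reach. The function name `best_threshold_accuracy` and the numbers are hypothetical, and a single threshold on one feature is a deliberately simple stand-in for a real classifier.

```python
def best_threshold_accuracy(negatives, positives):
    """Best accuracy achievable by any rule of the form 'positive if x >= t'."""
    total = len(negatives) + len(positives)
    # Every distinct decision is reproduced by a threshold at a data value,
    # plus one threshold above everything (predict all negative).
    candidates = sorted(set(negatives) | set(positives)) + [float("inf")]
    best = 0.0
    for t in candidates:
        correct = sum(x < t for x in negatives) + sum(x >= t for x in positives)
        best = max(best, correct / total)
    return best

# Cleanly separated classes: some threshold classifies everything correctly.
separated = best_threshold_accuracy([1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16])

# Overlapping classes: the values 4, 5 and 6 appear in both classes.
overlapping = best_threshold_accuracy([1, 2, 3, 4, 5, 6], [4, 5, 6, 7, 8, 9])

print(separated, overlapping)  # 1.0 0.75
```

In the overlapping case no cut-off gets above 0.75, no matter how it is chosen. A more flexible model does not fix this on its own. What helps is a feature on which the two classes actually differ.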

The problem at hand is not a machine learning problem but can be solved using simple statistics

This is one of the most important reasons why machine learning models may not work, in my opinion. Many business problems can be solved using simple statistical models rather than machine learning models. Therefore, before you even think of applying a machine learning model, understand the nature of the problem and see whether machine learning is the way to go.
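A practical way to test this is to compare your model against the simplest statistic you can think of. The sketch below compares a model's mean absolute error with a baseline that always predicts the mean of the training targets. The function names and the numbers are hypothetical, and `model_predictions` stands in for whatever your real model outputs.

```python
import statistics

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def compare_to_mean_baseline(y_train, y_test, model_predictions):
    """Compare a model's error with a baseline that always predicts the training mean."""
    baseline_value = statistics.mean(y_train)
    baseline_mae = mean_absolute_error(y_test, [baseline_value] * len(y_test))
    model_mae = mean_absolute_error(y_test, model_predictions)
    return {
        "baseline_mae": baseline_mae,
        "model_mae": model_mae,
        "model_beats_baseline": model_mae < baseline_mae,
    }

# Hypothetical numbers: the "model" here is worse than simply predicting the mean.
result = compare_to_mean_baseline(
    y_train=[10, 12, 11, 13],
    y_test=[12, 11],
    model_predictions=[14, 9],
)
print(result)
```

If the model cannot clearly beat a baseline like this, the simple statistic is probably the better choice for the problem.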

Conclusion

All of the problems stated above can be addressed using other techniques. But in my opinion, if your machine learning model is not working as it should, it could be the result of one or more of the above. I would like to hear your opinions on why machine learning may sometimes not work.

References

https://valiancesolutions.com/data-science/challenges-of-data-quality/

https://www.omnisci.com/technical-glossary/data-quality

Das B., Krishnan N.C., Cook D.J. (2014) Handling Imbalanced and Overlapping Classes in Smart Environments Prompting Dataset. In: Yada K. (eds) Data Mining for Service. Studies in Big Data, vol 3. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45252-9_12
