In our previous article, "5 Challenges to be prepared for while scaling ML models", we discussed the top five challenges in productionizing scalable Machine Learning (ML) models. Our focus for this piece is to establish the best practices that make an ML project successful.
ML models today solve a wide variety of specific business challenges across industries. The choice of an ML model depends largely on the business use case we are trying to solve. But before proceeding any further, we should ensure that the chosen approach to building the models is productionizable.
Because of the complexities involved, the key risks have to be eliminated early in the process. Eliminating more risks at the earlier stages of model selection and development leads to less rework during the productionizing stage.
The main considerations in a machine learning ecosystem are the data sets, the technology stack, the implementation that integrates the two, and the teams who deploy the ML models. Then comes a resilient testing framework to ensure consistent business results.
Using the best practices given below, Yum! Brands were able to achieve an 8% sales uptick by productionizing their MAB models for personalised email marketing. Watch the two-minute video where Yum’s Scott Kasper explains the impact of the best practices in productionizing their MAB models.
1. Data Assessment
To start, data feasibility should be checked: Do we even have the right data sets to run machine learning models on? Do we get the data fast enough to make predictions?
For example, restaurant chains (QSRs) often have access to the data of millions of registered customers. Volume on this scale is generally enough to run ML models on.
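The two feasibility questions above (is there enough data, and does it arrive fast enough?) can be turned into a simple automated check. The following is only a minimal sketch: the function name, the 100,000-row minimum, and the 24-hour freshness limit are illustrative assumptions, not figures from this article, and should be replaced with thresholds that fit the use case.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class FeasibilityReport:
    """Result of a basic data feasibility check."""
    enough_rows: bool
    fresh_enough: bool

    @property
    def feasible(self) -> bool:
        # The data set passes only if both volume and freshness are acceptable.
        return self.enough_rows and self.fresh_enough


def assess_feasibility(
    row_count: int,
    newest_record_at: datetime,
    now: datetime,
    min_rows: int = 100_000,                  # assumed threshold, tune per use case
    max_lag: timedelta = timedelta(hours=24),  # assumed threshold, tune per use case
) -> FeasibilityReport:
    """Check whether a data set is large enough and recent enough."""
    lag = now - newest_record_at
    return FeasibilityReport(
        enough_rows=row_count >= min_rows,
        fresh_enough=lag <= max_lag,
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical numbers: five million rows, newest record two hours old.
    report = assess_feasibility(5_000_000, now - timedelta(hours=2), now)
    print(report, report.feasible)
```

In practice, the row count and the newest timestamp would come from a query against the actual data source; they are passed in here so the check stays independent of any particular storage system.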
When the above data risks are mitigated, a data lake environment with easy and powerful access to a variety of required data sources should be set up. A data lake (in place of traditional warehouses) would save the team a lot of bureaucratic and manual overhead.
It is crucial at this step to experiment with the data sets to ensure that the data carries enough information to bring about the desired business change. A scalable computing environment that can process the available data quickly is also a primary requirement.
Once the data scientists have cleaned, structured, and processed the different data sets, we strongly advise cataloging the data so that it can be reused in the future.
Finally, a strong and well-thought-out governance and security system should be put in place so that different teams in the organization can share the data freely.
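To make the cataloging advice concrete, here is a minimal sketch of an in-memory data catalog. The class names, fields, and the example location are hypothetical and chosen for illustration only; a production setup would more likely rely on a dedicated catalog service.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one processed data set."""
    name: str
    location: str   # where the data lives (hypothetical path or URI)
    owner: str      # team responsible for the data set
    columns: dict   # column name -> type description
    tags: list = field(default_factory=list)


class DataCatalog:
    """A tiny registry that lets teams find data sets by tag."""

    def __init__(self):
        self._entries = {}

    def register(self, entry: CatalogEntry) -> None:
        # Refuse duplicates so an existing entry is never silently overwritten.
        if entry.name in self._entries:
            raise ValueError(f"data set already cataloged: {entry.name}")
        self._entries[entry.name] = entry

    def search(self, tag: str) -> list:
        # Return the names of all data sets carrying the tag, sorted by name.
        return sorted(e.name for e in self._entries.values() if tag in e.tags)

    def to_json(self) -> str:
        # Serialize the catalog so it can be stored or shared.
        return json.dumps(
            {name: asdict(e) for name, e in self._entries.items()},
            indent=2,
            sort_keys=True,
        )


if __name__ == "__main__":
    catalog = DataCatalog()
    catalog.register(CatalogEntry(
        name="customer_orders_clean",
        location="lake://example/customer_orders_clean",  # hypothetical location
        owner="data-science",
        columns={"customer_id": "string", "order_total": "float"},
        tags=["customers", "orders"],
    ))
    print(catalog.search("orders"))
```

Recording an owner and tags for each data set is what makes later reuse practical: another team can search by tag and knows whom to ask about the data.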