ML Model Deployment Strategies
Deploying an ML model to production involves several challenges, including:
Concept Drift & Data Drift: Concept drift occurs when the relationship between the input features & the target output changes, whereas data drift is a change in the distribution of the input data over time. Both can lead to a decline in the model's performance.
Software Engineering Issues: When we deploy an ML model, there are several factors to think about, for example, the latency & throughput needed, whether to serve real-time or batch predictions, how to log the results for monitoring, and how to maintain the security & privacy of the data.
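To make data drift concrete, here is a minimal sketch that compares the distribution of one feature at training time with the same feature in production. It uses the Population Stability Index (PSI), which is one common choice among several and is not prescribed by anything above. The function name, the bucket count, and the small floor for empty buckets are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """PSI between a training-time sample and a production sample of one feature."""
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 if every training value is identical.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range production values into the edge buckets.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor (an assumption) avoids log(0) for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    e = bucket_shares(expected)
    a = bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical samples: production values have shifted upwards.
training_sample = [i / 100 for i in range(100)]
production_sample = [0.5 + i / 200 for i in range(100)]
drift_score = population_stability_index(training_sample, production_sample)
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the right threshold depends on the feature and the application, so it should be tuned rather than taken as given.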
Once we have trained a Machine Learning model, the best way to deploy it in production depends on a number of factors:
The acceptable downtime of our Machine Learning Solution.
The operational cost & the human involvement in deploying the model.
The ease with which we can roll back the model in case of drift.
Whether there is a need to test with production traffic or not.
Now that we have understood the challenges in deploying a model, let's take a look at the different deployment strategies.
Recreate Deployment
In this strategy, we scale down the prior model before scaling up the new model version. Because it takes time to scale down the current model and scale up the new one, the recreate technique is slow and causes downtime for the ML solution. Because only one version of the model runs at any time, this strategy is very straightforward to use. Recreate Deployment is not a scalable method and is best suited for small-scale applications.
We should use Recreate Deployment when we can afford downtime with the product or when the new version does not need to be backward compatible with the old one.
Example: Machine Learning applications where we run the predictions in batches.
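The recreate sequence can be sketched as a tiny in-memory service. This is only an illustration under stated assumptions: the `RecreateDeployment` class, its method names, and the use of plain Python callables as "models" are hypothetical stand-ins for a real serving stack. The point it shows is the gap between scaling down and scaling up, during which requests fail.

```python
from typing import Callable, Optional

Model = Callable[[float], float]


class RecreateDeployment:
    """Holds at most one model version; there is a gap while versions are swapped."""

    def __init__(self, model: Model) -> None:
        self.model: Optional[Model] = model

    def predict(self, x: float) -> float:
        if self.model is None:
            # This is the downtime window of the recreate strategy.
            raise RuntimeError("service unavailable: deployment in progress")
        return self.model(x)

    def scale_down(self) -> None:
        """Stop the old version first."""
        self.model = None

    def scale_up(self, new_model: Model) -> None:
        """Start the new version only after the old one is gone."""
        self.model = new_model


service = RecreateDeployment(lambda x: x * 2)  # hypothetical model v1
service.scale_down()
# Any predict() call here would raise RuntimeError.
service.scale_up(lambda x: x * 3)  # hypothetical model v2
```

For batch predictions this gap is usually harmless, because the swap can happen between two batch runs.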
Shadow Deployment
The Shadow Deployment technique is used when we already have an ML model running in production. We use this technique to run the new model alongside the existing one in production. The prediction from the existing model is returned to the application, while the response data from the new model is saved for testing and comparing the outcomes. We require sufficient monitoring to assess performance and must operate more servers for the new prediction service.
We should use Shadow Deployment when we want to test the new version on actual production data without disrupting the existing users.
Example: In Machine Learning applications where we want to forecast the business performance or growth, we can use shadow deployment to compare the predicted value from the model & actual growth.
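A shadow setup can be sketched as a router that always answers with the existing model and only records what the new model would have said. The `ShadowRouter` class, the log format, and the lambda models are assumptions for illustration. In a real system the shadow call would typically run asynchronously so it cannot add latency to the live response.

```python
from typing import Any, Callable, Dict, List

Model = Callable[[Dict[str, Any]], Any]


class ShadowRouter:
    """Serves the live model; runs the shadow model only to record its output."""

    def __init__(self, live_model: Model, shadow_model: Model) -> None:
        self.live_model = live_model
        self.shadow_model = shadow_model
        self.shadow_log: List[Dict[str, Any]] = []

    def predict(self, features: Dict[str, Any]) -> Any:
        live_pred = self.live_model(features)
        entry: Dict[str, Any] = {"features": features, "live": live_pred}
        try:
            entry["shadow"] = self.shadow_model(features)
        except Exception as exc:
            # A failing shadow model must never affect the user-facing response.
            entry["error"] = repr(exc)
        self.shadow_log.append(entry)
        return live_pred


router = ShadowRouter(
    live_model=lambda f: f["x"] * 2,    # hypothetical current model
    shadow_model=lambda f: f["x"] * 3,  # hypothetical new model
)
response = router.predict({"x": 2})  # the caller only ever sees the live prediction
```

The entries in `shadow_log` can later be joined with the actual outcomes to compare both models offline.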
Gradual Ramp-Up with Monitoring
The next 2 deployment strategies involve releasing the model for a certain % of users & then based on the performance monitoring, making it available to 100% of users.
Canary Deployment
In Canary Deployment, we have the old & new versions both running in production & serving the application. The major difference between canary & shadow deployment is that in shadow deployment the response data from the new model is only used for performance monitoring, whereas here it is used to serve the application. The new model version is first made available to a small subset of users and then gradually exposed to the entire set.
We should use Canary Deployment when we want to test the new version on actual production data & at the same time evaluate the existing users' response to the model. It allows us to spot problems early, before they have large consequences for the application, and it does so with no downtime.
Example: Machine Learning applications that serve as recommendation systems, such as content or product recommendation. We can compare the interactions of different users with the different models applied & then determine which one was more effective in providing recommendations.
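The gradual exposure in a canary release can be sketched with a deterministic traffic split. This is one possible approach, not the only one: hashing the user id into 100 buckets is an assumption chosen so that each user consistently sees the same model, and `CanaryRouter`, `ramp_up`, and `rollback` are hypothetical names.

```python
import hashlib
from typing import Any, Callable

Model = Callable[[Any], Any]


def bucket(user_id: str) -> int:
    """Map a user id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


class CanaryRouter:
    """Sends a configurable percentage of users to the new model version."""

    def __init__(self, stable_model: Model, canary_model: Model, canary_percent: int = 5) -> None:
        self.stable_model = stable_model
        self.canary_model = canary_model
        self.canary_percent = 0
        self.ramp_up(canary_percent)

    def route(self, user_id: str) -> str:
        return "canary" if bucket(user_id) < self.canary_percent else "stable"

    def predict(self, user_id: str, features: Any) -> Any:
        model = self.canary_model if self.route(user_id) == "canary" else self.stable_model
        return model(features)

    def ramp_up(self, percent: int) -> None:
        """Raise (or lower) the share of users served by the canary."""
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.canary_percent = percent

    def rollback(self) -> None:
        """Send everyone back to the stable model."""
        self.canary_percent = 0
```

Because the bucket of a user never changes, anyone who saw the new model at 5% still sees it at 50%, which keeps the user experience consistent while monitoring decides whether to continue the ramp-up.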
A/B Testing Deployment
As the name suggests, in A/B Testing Deployment we have two or more versions of the model running at once. We divide the users into groups based on the number of models we have & then decide the best model based on the performance & the users' interaction. With A/B testing we can discard the low-performing models fast with no downtime.
We can use A/B testing deployment when we have several models that produce similar results. With this technique, we can determine the best model using production data & user response.
Example: Similar to Canary Deployment, A/B Testing can also be used for recommendation systems like content or product recommendation.
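The grouping and comparison can be sketched as follows. Everything named here is hypothetical: the `ABTest` class, the hash-based group assignment, and the use of a mean reward (for example 1 for a click, 0 otherwise) as the performance measure. A real experiment would also apply a statistical significance test before declaring a winner, which this sketch leaves out.

```python
import hashlib
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List, Tuple

Model = Callable[[Any], Any]


class ABTest:
    """Splits users evenly across model variants and tracks an outcome per variant."""

    def __init__(self, models: Dict[str, Model]) -> None:
        self.models = models
        self.names = sorted(models)  # fixed order keeps assignment stable
        self.outcomes: DefaultDict[str, List[float]] = defaultdict(list)

    def assign(self, user_id: str) -> str:
        """Deterministically place a user in one group."""
        digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
        return self.names[int(digest, 16) % len(self.names)]

    def predict(self, user_id: str, features: Any) -> Tuple[str, Any]:
        name = self.assign(user_id)
        return name, self.models[name](features)

    def record(self, name: str, reward: float) -> None:
        """Store the observed user response for the variant that served it."""
        self.outcomes[name].append(reward)

    def best(self) -> str:
        """Variant with the highest mean reward so far (no significance test)."""
        means = {n: sum(v) / len(v) for n, v in self.outcomes.items() if v}
        if not means:
            raise ValueError("no outcomes recorded yet")
        return max(means, key=means.get)
```

In use, each call to `predict` returns the variant name along with the prediction, so the application can pass that name back to `record` once the user's reaction is known.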
Blue-Green Deployment
In a blue-green deployment, the existing prediction service is the blue version. We then build a new prediction service, the green version, as the staging environment. Once the performance and functionality testing in the green environment is completed, we have the router switch traffic from the old version to the new one. This incurs additional costs because two environments have to be maintained. The benefit of a blue-green deployment is that it enables simple rollback. If something goes wrong, we can simply point the router back to the blue version to divert traffic.
We can use Blue-Green deployment when the application cannot afford any downtime and backward compatibility is required.
Example: Real-time prediction systems such as fraud or anomaly detection.
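The router switch and the rollback can be sketched in a few lines. The `BlueGreenRouter` class and its method names are assumptions for illustration; in practice the "router" would be a load balancer or service mesh rule rather than a Python object.

```python
from typing import Any, Callable, Dict

Model = Callable[[Any], Any]


class BlueGreenRouter:
    """Keeps both environments running and points all traffic at one of them."""

    def __init__(self, blue_model: Model) -> None:
        self.environments: Dict[str, Model] = {"blue": blue_model}
        self.live = "blue"

    def deploy_green(self, green_model: Model) -> None:
        """Bring up the new version next to blue without sending it traffic."""
        self.environments["green"] = green_model

    def switch_to(self, name: str) -> None:
        """Flip all traffic in one step; also used for rollback."""
        if name not in self.environments:
            raise KeyError(f"unknown environment: {name}")
        self.live = name

    def predict(self, features: Any) -> Any:
        return self.environments[self.live](features)


router = BlueGreenRouter(lambda f: "blue-prediction")  # hypothetical current model
router.deploy_green(lambda f: "green-prediction")      # hypothetical new model
router.switch_to("green")  # cut over after testing green
router.switch_to("blue")   # rollback is the same single step
```

Rollback is fast because the blue environment is never torn down during the switch, which is also why this strategy costs more to run.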
Below is a brief table of differences based on the explanation of each method and the four main considerations we mentioned above for picking a deployment strategy. I hope this article helps you select the right deployment strategy for your Machine Learning applications.
|Consideration|Recreate|Shadow|Canary|A/B Testing|Blue-Green|
|---|---|---|---|---|---|
|Leads to downtime|Yes|No|No|No|No|
|Possibility of rollback|Yes, but with downtime|No need for a rollback|Yes, fast|Yes, fast|Yes, very fast|
|Testing with production traffic|No|Yes|Yes|Yes|No|
|Extra costs of deployment|No|Yes, for testing the new model with production data|No|No|Yes, need to maintain two separate environments|
Support Shloka Shah by becoming a sponsor. Any amount is appreciated!