Decide on real-time or batch deployment
When you deploy a model to an endpoint to integrate with an application, you can choose to design it for real-time or batch predictions.
The type of deployment you need depends on how you want to use the model's predictions.
To decide whether to design a real-time or batch deployment solution, you need to consider the following questions:
- How often should predictions be generated?
- How soon are the results needed?
- Should predictions be generated individually or in batches?
- How much compute power is needed to execute the model?
Identify the necessary frequency of scoring
A common scenario is that you're using a model to score new data. Before you can get predictions in real time or in batches, you must first collect the new data.
There are various ways to generate or collect data, and new data can arrive at different time intervals.
For example, you can collect temperature data from an Internet of Things (IoT) device every minute. You can get transactional data every time a customer buys a product from your web shop. Or you can extract financial data from a database every three months.
Generally, there are two types of use cases:
- You need the model to score the new data as soon as it comes in.
- You can schedule or trigger the model to score the new data that you’ve collected over time.
Whether you want real-time or batch predictions doesn’t necessarily depend on how often new data is collected. Instead, it depends on how often and how quickly you need the predictions to be generated.
If you need the model’s predictions immediately when new data is collected, you need real-time predictions. If the model’s predictions are only consumed at certain times, you need batch predictions.
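The difference between the two use cases can be sketched in a few lines of Python. This is a minimal illustration, not a deployment: `predict`, `on_new_reading`, and `BatchScorer` are hypothetical names, and the "model" is a hard-coded temperature threshold standing in for a trained model. In both patterns the readings arrive at the same rate. What changes is when the predictions are produced: immediately for each reading, or all at once when a scheduled job runs.

```python
from typing import Callable, List

def predict(reading: float) -> str:
    # Hypothetical stand-in for a trained model: flags readings above 30 degrees.
    return "alert" if reading > 30.0 else "normal"

def on_new_reading(reading: float) -> str:
    # Real-time pattern: score each reading as soon as it arrives.
    return predict(reading)

class BatchScorer:
    # Batch pattern: readings are stored as they arrive and scored together
    # only when a scheduled job (or another trigger) runs.
    def __init__(self, model: Callable[[float], str]) -> None:
        self.model = model
        self.buffer: List[float] = []

    def collect(self, reading: float) -> None:
        # Store the reading without scoring it yet.
        self.buffer.append(reading)

    def run_scheduled_job(self) -> List[str]:
        # Score everything collected since the last run, then empty the buffer.
        results = [self.model(r) for r in self.buffer]
        self.buffer.clear()
        return results

# Same incoming data, two different points at which predictions are produced.
print(on_new_reading(31.5))  # alert
scorer = BatchScorer(predict)
for reading in [22.0, 35.0, 28.5]:
    scorer.collect(reading)
print(scorer.run_scheduled_job())  # ['normal', 'alert', 'normal']
```

In a real solution the batch job would typically be triggered by a scheduler rather than called directly, but the trade-off is the same: the real-time path gives an answer per reading immediately, while the batch path defers all scoring until the results are needed.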
Decide on the number of predictions
Another important question to ask yourself is whether you need the predictions to be generated individually or in batches.
A simple way to illustrate the difference between individual and batch predictions is to imagine a table. Suppose you have a table of customer data where each row represents a customer. For each customer, you have some demographic data and behavioral data, such as how many products they’ve purchased from your web shop and when their last purchase was.
Based on this data, you can predict customer churn: whether a customer will buy from your web shop again or not.
Once you’ve trained the model, you can decide if you want to generate predictions:
- Individually: The model receives a single row of data and returns whether or not that individual customer will buy again.
- Batch: The model receives multiple rows of data in one table and returns whether or not each customer will buy again. The results are collated in a table that contains all predictions.
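To make the table analogy concrete, here is a minimal Python sketch. The `Customer` fields, the `will_buy_again` rule, and the thresholds are all hypothetical: a hand-written rule stands in for the trained churn model so the example runs without any ML library. The individual function takes one row and returns one prediction; the batch function takes a whole table and returns a table of predictions.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Customer:
    customer_id: int
    age: int                       # demographic feature (unused by this toy rule)
    products_purchased: int        # behavioral feature
    days_since_last_purchase: int  # behavioral feature

def will_buy_again(c: Customer) -> bool:
    # Hypothetical stand-in for a trained churn model, NOT a real model:
    # predicts a repeat purchase for frequent, recent buyers.
    return c.products_purchased >= 3 and c.days_since_last_purchase <= 90

def predict_individual(c: Customer) -> bool:
    # Individual prediction: one row in, one prediction out.
    return will_buy_again(c)

def predict_batch(table: List[Customer]) -> List[Dict[str, object]]:
    # Batch prediction: many rows in, one table of predictions out.
    return [
        {"customer_id": c.customer_id, "will_buy_again": will_buy_again(c)}
        for c in table
    ]

customers = [
    Customer(1, 34, 5, 20),
    Customer(2, 51, 1, 200),
    Customer(3, 27, 4, 120),
]

print(predict_individual(customers[0]))  # True
for row in predict_batch(customers):
    print(row)
```

Both functions call the same underlying model; the only difference is the shape of the input and output, which is exactly the distinction between individual and batch predictions.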
You can also generate individual or batch predictions when working with files. For example, when working with a computer vision model, you may need to score a single image individually or a collection of images in one batch.
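The same idea applies to files. The sketch below assumes a hypothetical model interface: any callable that takes the raw bytes of an image file and returns a label. A real computer vision model would decode and preprocess the image first. `score_image` scores one file; `score_image_folder` scores every matching file in a folder and collects the results. The demo uses placeholder files and a dummy "model" that labels files by size, purely so the example runs.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict

# Hypothetical model interface: raw image bytes in, label out.
ImageModel = Callable[[bytes], str]

def score_image(path: Path, model: ImageModel) -> str:
    # Individual prediction: read one file and score it.
    return model(path.read_bytes())

def score_image_folder(folder: Path, model: ImageModel, pattern: str = "*.jpg") -> Dict[str, str]:
    # Batch prediction: score every matching file, keyed by file name.
    return {p.name: score_image(p, model) for p in sorted(folder.glob(pattern))}

def dummy_model(data: bytes) -> str:
    # Placeholder for a real image classifier: labels files by byte count.
    return "large" if len(data) > 5 else "small"

# Demo with placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "cat.jpg").write_bytes(b"\xff" * 10)
    (folder / "dog.jpg").write_bytes(b"\xff" * 3)
    single = score_image(folder / "cat.jpg", dummy_model)
    batch = score_image_folder(folder, dummy_model)

print(single)  # large
print(batch)   # {'cat.jpg': 'large', 'dog.jpg': 'small'}
```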