Run your first InferenceService
The InferenceService custom resource is the primary interface for deploying models on KServe. Within an InferenceService, you can specify multiple components that handle inference requests: the predictor, the transformer, and the explainer. Learn more in the KServe documentation.
In this tutorial, you will deploy an InferenceService with a predictor that loads a scikit-learn model trained on the iris dataset. This dataset has three output classes: Iris Setosa, Iris Versicolour, and Iris Virginica.
You will then send an inference request to your deployed model in order to get a prediction for the class of iris plant your request corresponds to.
Before you begin
First, install the KServe SDK using the following command. If you run this command in a Jupyter notebook, restart the kernel after installing the SDK.
$ pip install kserve==0.7.0
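If you want to confirm which version of the SDK is installed, you can inspect the package metadata with a standard pip command:
$ pip show kserve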
Import kubernetes.client and kserve packages
from kubernetes import client
from kserve import KServeClient
from kserve import constants
from kserve import utils
from kserve import V1beta1InferenceService
from kserve import V1beta1InferenceServiceSpec
from kserve import V1beta1PredictorSpec
from kserve import V1beta1SKLearnSpec
Declare Namespace
This will retrieve the current namespace of your Kubernetes context. The InferenceService will be deployed in this namespace.
namespace = utils.get_default_target_namespace()
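If you would rather deploy into a specific namespace than rely on the current context, you can also set it explicitly; the value below is only an example and should be replaced with your own profile namespace:
# Optional: set the target namespace explicitly instead of auto-detecting it.
# 'kubeflow-user-example-com' is only an example value.
namespace = 'kubeflow-user-example-com'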
Define InferenceService
Next, define the InferenceService based on several key parameters. In the predictor parameter, a V1beta1PredictorSpec object with an embedded V1beta1SKLearnSpec object is created. Inside the V1beta1SKLearnSpec object, a storage URI is provided, pointing to the location of the trained iris model in cloud storage.
name = 'sklearn-iris'
kserve_version = 'v1beta1'
api_version = constants.KSERVE_GROUP + '/' + kserve_version

isvc = V1beta1InferenceService(
    api_version=api_version,
    kind=constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(
        name=name,
        namespace=namespace,
        annotations={'sidecar.istio.io/inject': 'false'}),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            sklearn=V1beta1SKLearnSpec(
                storage_uri="gs://kfserving-samples/models/sklearn/iris"))))
Create InferenceService
Now that the InferenceService is defined, you can create it by calling the create method of the KServeClient.
KServe = KServeClient()
KServe.create(isvc)
Check the InferenceService
Run the following command to watch the InferenceService until it is ready (or times out).
KServe.get(name, namespace=namespace, watch=True, timeout_seconds=120)
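If you prefer a one-off check instead of watching, the sketch below fetches the resource once and inspects its status conditions; it assumes the standard status.conditions layout of the v1beta1 InferenceService:
# A minimal sketch, assuming the standard v1beta1 status.conditions layout.
isvc_status = KServe.get(name, namespace=namespace)
conditions = isvc_status.get('status', {}).get('conditions', [])
is_ready = any(c.get('type') == 'Ready' and c.get('status') == 'True' for c in conditions)
print('InferenceService ready:', is_ready)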
Perform Inference
Next, you can try sending an inference request to the deployed model in order to get predictions. This notebook assumes that you are running it in your Kubeflow cluster, so it uses the internal URL of the InferenceService.
The Python requests library will be used to send a POST request containing your payload.
import requests
isvc_resp = KServe.get(name, namespace=namespace)
isvc_url = isvc_resp['status']['address']['url']
print(isvc_url)
inference_input = {
    'instances': [
        [6.8, 2.8, 4.8, 1.4],
        [6.0, 3.4, 4.5, 1.6]
    ]
}
response = requests.post(isvc_url, json=inference_input)
print(response.text)
You should see two predictions returned (i.e. {"predictions": [1, 1]}). Both sets of data points sent for inference correspond to the flower with index 1. In this case, the model predicts that both flowers are “Iris Versicolour”.
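As a quick illustration, you can map the returned class indices back to the class names listed at the top of this tutorial; this snippet assumes the standard scikit-learn iris label ordering (0, 1, 2):
# Map the predicted class indices back to iris class names.
# Assumes the standard scikit-learn iris label order.
class_names = ['Iris Setosa', 'Iris Versicolour', 'Iris Virginica']
predictions = response.json()['predictions']
print([class_names[i] for i in predictions])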
To learn more about sending inference requests, please check out the KServe guide.
Run Performance Test (Optional)
If you want to load test the deployed model, try deploying the Kubernetes Job below to drive load to the InferenceService.
$ kubectl create -f https://raw.githubusercontent.com/kserve/kserve/release-0.7/docs/samples/v1beta1/sklearn/v1/perf.yaml -n kubeflow-user-example-com
Get Job Name
$ kubectl get pods --namespace=kubeflow-user-example-com | grep load
Check the Job Logs
$ kubectl logs <job-name> -n kubeflow-user-example-com
The output should look similar to the following:
Requests [total, rate, throughput] 30000, 500.02, 499.99
Duration [total, attack, wait] 1m0s, 59.998s, 3.336ms
Latencies [min, mean, 50, 90, 95, 99, max] 1.743ms, 2.748ms, 2.494ms, 3.363ms, 4.091ms, 7.749ms, 46.354ms
Bytes In [total, mean] 690000, 23.00
Bytes Out [total, mean] 2460000, 82.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
Delete InferenceService
When you are done with your InferenceService, you can delete it by running the following.
KServe.delete(name, namespace=namespace)
Next Steps
Kubeflow Pipelines E2E MNIST Tutorial - provides an end-to-end test sequence (i.e. start a notebook, run a pipeline, execute training, hyperparameter tuning, and model serving with KServe).