KServe provides a Kubernetes Custom Resource Definition for serving predictive and generative machine learning (ML) models. It aims to solve production model serving use cases by providing high abstraction interfaces for Tensorflow, XGBoost, ScikitLearn, PyTorch, Huggingface Transformer/LLM models using standardized data plane protocols.
Control Plane
Responsible for reconciling the InferenceService
custom resources. It creates the Knative serverless deployment for predictor, transformer, explainer to enable autoscaling based on incoming request workload including scaling down to zero when no traffic is received.
When raw deployment mode is enabled, control plane creates Kubernetes deployment, service, ingress, HPA.
Control Plane Components
KServe Controller: Responsible for creating service, ingress resources, model server container and model agent container for request/response logging, batching and model pulling.
Ingress Gateway: Gateway for routing external or internal requests.