How to use Katib Config
This guide describes the Katib Config — the main configuration file for every Katib component. We use Kubernetes ConfigMap to fetch that config into the Katib control plane components.
The ConfigMap must be named katib-config and deployed in the KATIB_CORE_NAMESPACE namespace.
Katib Config has two groups of parameters: initialization parameters under init and runtime parameters under runtime. You can modify these parameters by editing the katib-config ConfigMap:
kubectl edit configMap katib-config -n kubeflow
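As a rough sketch of what that ConfigMap might look like, the manifest below embeds a KatibConfig document with both the init and runtime sections. The data key katib-config.yaml is an assumption and is not confirmed by this guide, so check the key used in your own deployment. The namespace kubeflow is taken from the command above.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: katib-config        # the name required by Katib
  namespace: kubeflow       # should match KATIB_CORE_NAMESPACE
data:
  # Assumption: the key name below is illustrative, verify it in your manifests.
  katib-config.yaml: |-
    apiVersion: config.kubeflow.org/v1beta1
    kind: KatibConfig
    init:
      controller:
        trialResources:
          - Job.v1.batch
    runtime:
      suggestions:
        - algorithmName: random
          image: docker.io/kubeflowkatib/suggestion-hyperopt:latest
```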
Initialization Parameters
Katib Config parameters set in init represent initialization settings for the Katib control plane. These parameters can be modified before the Katib control plane is deployed.
apiVersion: config.kubeflow.org/v1beta1
kind: KatibConfig
init:
  certGenerator:
    enable: true
    ...
  controller:
    trialResources:
      - Job.v1.batch
      - TFJob.v1.kubeflow.org
    ...
The init section has settings for the following Katib components:
- Katib certificate generator: certGenerator
- Katib controller: controller
Katib Certificate Generator Parameters
The following parameters set in .init.certGenerator configure the Katib certificate generator:
- enable - whether to enable the Katib certificate generator. The default value is false.
- webhookServiceName - a service name for the Katib webhooks. If it is set, the Katib certificate generator is forcefully enabled. The default value is katib-controller.
- webhookSecretName - a secret name to store the Katib webhook certificates. If it is set, the Katib certificate generator is forcefully enabled. The default value is katib-webhook-cert.
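A minimal sketch of a certificate generator configuration, using only the parameters described above. The service and secret names shown are the documented defaults, written out explicitly for illustration.

```yaml
apiVersion: config.kubeflow.org/v1beta1
kind: KatibConfig
init:
  certGenerator:
    # Explicitly enable the certificate generator (the default is false).
    enable: true
    # Documented default values, spelled out for clarity.
    webhookServiceName: katib-controller
    webhookSecretName: katib-webhook-cert
```

Since setting either name forcefully enables the generator, enable: true is redundant here and is kept only to make the intent obvious.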
Katib Controller Parameters
The following parameters set in .init.controller configure the Katib controller:
- experimentSuggestionName - the implementation of the Suggestion interface for the Experiment controller. The default value is default.
- metricsAddr - a TCP address that the Katib controller should bind to for serving Prometheus metrics. The default value is 8080.
- healthzAddr - a TCP address that the Katib controller should bind to for health probes. The default value is 18080.
- injectSecurityContext - whether to inject the security context from the Katib Trial training container into the Katib metrics collector sidecar container. The default value is false.
- trialResources - a list of resources that can be used as a Trial template. The Trial resources must be in this format: Kind.version.group (e.g. TFJob.v1.kubeflow.org). Follow this guide to understand how to make Katib Trial work with your Kubernetes CRDs. The default value is [Job.v1.batch].
- webhookPort - a port number for the Katib admission webhooks. The default value is 8443.
- enableLeaderElection - whether to enable leader election for the Katib controller. If this value is true, only a single Katib controller Pod is active. The default value is false.
- leaderElectionID - an ID for the Katib controller leader election. The default value is 3fbc96e9.katib.kubeflow.org.
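The controller parameters above can be combined as in the following sketch. It adds TFJob as a second Trial resource in the Kind.version.group format and turns on leader election. All other values are the documented defaults, and parameters that are left out are assumed to keep their defaults.

```yaml
apiVersion: config.kubeflow.org/v1beta1
kind: KatibConfig
init:
  controller:
    experimentSuggestionName: default
    webhookPort: 8443
    # Copy the security context of the training container to the sidecar.
    injectSecurityContext: true
    # Only one controller Pod is active when leader election is enabled.
    enableLeaderElection: true
    leaderElectionID: 3fbc96e9.katib.kubeflow.org
    # Each entry uses the Kind.version.group format.
    trialResources:
      - Job.v1.batch
      - TFJob.v1.kubeflow.org
```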
Runtime Parameters
Katib Config parameters set in runtime represent runtime settings for Katib Experiments. These parameters can be modified before a Katib Experiment is created. When a Katib Experiment is created, the Katib controller fetches the latest configuration from the katib-config ConfigMap.
apiVersion: config.kubeflow.org/v1beta1
kind: KatibConfig
runtime:
  metricsCollectors:
    - kind: StdOut
      image: docker.io/kubeflowkatib/file-metrics-collector:latest
      ...
  suggestions:
    - algorithmName: random
      image: docker.io/kubeflowkatib/suggestion-hyperopt:latest
      ...
  earlyStoppings:
    - algorithmName: medianstop
      image: docker.io/kubeflowkatib/earlystopping-medianstop:latest
      ...
Metrics Collectors Parameters
Parameters set in .runtime.metricsCollectors configure the container for the Katib metrics collector.
The following settings are required for each Katib metrics collector that you want to use in your Katib Experiments:
- kind - one of the Katib metrics collector types.
- image - a Docker image for the metrics collector’s container.
The following settings are optional:
- imagePullPolicy - an image pull policy for the metrics collector’s container. The default value is IfNotPresent.
- resources - resources for the metrics collector’s container. The default values for the resources are:

  metricsCollectors:
    - kind: StdOut
      image: docker.io/kubeflowkatib/file-metrics-collector:latest
      resources:
        requests:
          cpu: 50m
          memory: 10Mi
          ephemeral-storage: 500Mi
        limits:
          cpu: 500m
          memory: 100Mi
          ephemeral-storage: 5Gi

  You can run your metrics collector’s container without requesting the cpu, memory, or ephemeral-storage resource from the Kubernetes cluster. For instance, you have to remove ephemeral-storage from the container resources to use the Google Kubernetes Engine cluster autoscaler.

  To remove specific resources from the metrics collector’s container, set negative values in requests and limits in your Katib config as follows:

  resources:
    requests:
      cpu: -1
      memory: -1
      ephemeral-storage: -1
    limits:
      cpu: -1
      memory: -1
      ephemeral-storage: -1

- waitAllProcesses - a flag that defines whether the metrics collector should wait until all processes in the Trial’s training container are finished before it starts to collect metrics. The default value is false.
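To illustrate the Google Kubernetes Engine case mentioned above, the following sketch removes only ephemeral-storage from the StdOut metrics collector. It assumes that resources not listed here (cpu and memory) keep their default values, which this guide does not state explicitly.

```yaml
apiVersion: config.kubeflow.org/v1beta1
kind: KatibConfig
runtime:
  metricsCollectors:
    - kind: StdOut
      image: docker.io/kubeflowkatib/file-metrics-collector:latest
      imagePullPolicy: IfNotPresent
      # Wait for every process in the training container to finish first.
      waitAllProcesses: true
      resources:
        requests:
          # A negative value removes the resource from the container.
          ephemeral-storage: -1
        limits:
          ephemeral-storage: -1
```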
Suggestions Parameters
Parameters set in .runtime.suggestions configure the Deployment for the Katib Suggestions. Every Suggestion represents one of the AutoML algorithms that you can use in Katib Experiments.
The following settings are required for the Suggestion Deployment:
- algorithmName - one of the Katib algorithm names. For example: tpe
- image - a Docker image for the Suggestion Deployment’s container. Image example: docker.io/kubeflowkatib/<suggestion-name>
For each algorithm you can specify one of the following Suggestion names in the Docker image:
Suggestion name | List of supported algorithms | Description
suggestion-hyperopt | random, tpe | Hyperopt optimization framework
suggestion-skopt | bayesianoptimization | Scikit-optimize optimization framework
suggestion-goptuna | cmaes, random, tpe, sobol | Goptuna optimization framework
suggestion-optuna | multivariate-tpe, tpe, cmaes, random, grid | Optuna optimization framework
suggestion-hyperband | hyperband | Katib Hyperband implementation
suggestion-pbt | pbt | Katib PBT implementation
suggestion-enas | enas | Katib ENAS implementation
suggestion-darts | darts | Katib DARTS implementation
The following settings are optional:
- <ContainerV1> - you can specify all container parameters inline for your Suggestion Deployment. For example, resources for container resources or env for container environment variables. Configuration for resources works the same as for the Katib metrics collector’s container resources.
- serviceAccountName - a ServiceAccount for the Suggestion Deployment. By default, the Suggestion Pod doesn’t have any specific ServiceAccount, in which case the Pod uses the default service account.
Note: If you want to run your Experiments with early stopping, the Suggestion’s Deployment must have permission to update the Experiment’s Trial status. If you don’t specify a ServiceAccount in the Katib config, the Katib controller creates the required Kubernetes role-based access control (RBAC) resources for the Suggestion.
If you need your own ServiceAccount for the Experiment’s Suggestion with early stopping, you have to follow these rules:
- The ServiceAccount name can’t be equal to <experiment-name>-<experiment-algorithm>.
- The ServiceAccount must have sufficient permissions to update the Experiment’s Trial status.
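The optional Suggestion settings might be combined as in the sketch below. The resource amounts, the environment variable EXAMPLE_LOG_LEVEL, and the ServiceAccount name example-suggestion-sa are hypothetical placeholders chosen for illustration, and the ServiceAccount is assumed to already exist with the permissions described above.

```yaml
apiVersion: config.kubeflow.org/v1beta1
kind: KatibConfig
runtime:
  suggestions:
    - algorithmName: tpe
      image: docker.io/kubeflowkatib/suggestion-hyperopt:latest
      # Inline container parameters (ContainerV1 fields).
      resources:
        requests:
          cpu: 100m        # hypothetical value
          memory: 64Mi     # hypothetical value
        limits:
          cpu: 500m        # hypothetical value
          memory: 256Mi    # hypothetical value
      env:
        - name: EXAMPLE_LOG_LEVEL   # hypothetical variable
          value: "info"
      # Hypothetical ServiceAccount; its name must differ from
      # <experiment-name>-<experiment-algorithm>.
      serviceAccountName: example-suggestion-sa
```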
Suggestion Volume Parameters
When you create an Experiment with the FromVolume resume policy, you can specify PersistentVolume (PV) and PersistentVolumeClaim (PVC) settings for the Experiment’s Suggestion to restore the state of the AutoML algorithm.
If the PV settings are empty, the Katib controller creates only the PVC. If you want to use the default volume specification, you can omit these parameters.
For example, the following is a Suggestion volume config for the random algorithm:
suggestions:
  - algorithmName: random
    image: docker.io/kubeflowkatib/suggestion-hyperopt:latest
    volumeMountPath: /opt/suggestion/data
    persistentVolumeClaimSpec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 3Gi
      storageClassName: katib-suggestion
    persistentVolumeSpec:
      accessModes:
        - ReadWriteMany
      capacity:
        storage: 3Gi
      hostPath:
        path: /tmp/suggestion/unique/path
      storageClassName: katib-suggestion
    persistentVolumeLabels:
      type: local
- volumeMountPath - a mount path for the Suggestion Deployment’s container. The default value is /opt/katib/data.
- persistentVolumeClaimSpec - a PVC specification for the Suggestion Deployment’s PVC. The default value is:

  persistentVolumeClaimSpec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 1Gi

- persistentVolumeSpec - a PV specification for the Suggestion Deployment’s PV. The Suggestion Deployment’s PV always has persistentVolumeReclaimPolicy: Delete to properly remove all resources once the Katib Experiment is deleted. To learn more about PV reclaim policies, check the Kubernetes documentation.
- persistentVolumeLabels - PV labels for the Suggestion Deployment’s PV.
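For comparison, here is a sketch of a PVC-only setup: persistentVolumeSpec is left out, so the Katib controller creates only the PVC and the volume itself is expected to come from the cluster. The storage class name example-storage-class and the 2Gi size are hypothetical values chosen for illustration.

```yaml
apiVersion: config.kubeflow.org/v1beta1
kind: KatibConfig
runtime:
  suggestions:
    - algorithmName: random
      image: docker.io/kubeflowkatib/suggestion-hyperopt:latest
      # Documented default mount path, spelled out for clarity.
      volumeMountPath: /opt/katib/data
      persistentVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 2Gi                           # hypothetical size
        storageClassName: example-storage-class    # hypothetical class
      # No persistentVolumeSpec: only the PVC is created.
```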
Early Stoppings Parameters
Parameters set in .runtime.earlyStoppings configure the container for the Katib early stopping algorithms.
The following settings are required for each early stopping algorithm that you want to use in your Katib Experiments:
- algorithmName - one of the early stopping algorithm names (e.g. medianstop).
- image - a Docker image for the early stopping container.
The following settings are optional:
- imagePullPolicy - an image pull policy for the early stopping container. The default value is IfNotPresent.
- resources - resources for the early stopping container. Configuration for resources works the same as for the Katib metrics collector’s container resources.
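A short sketch of an early stopping entry that sets both optional parameters. The resource amounts are hypothetical. Since resources is documented to work the same way as for the metrics collector, negative values should remove a resource here as well.

```yaml
apiVersion: config.kubeflow.org/v1beta1
kind: KatibConfig
runtime:
  earlyStoppings:
    - algorithmName: medianstop
      image: docker.io/kubeflowkatib/earlystopping-medianstop:latest
      imagePullPolicy: IfNotPresent
      resources:
        requests:
          cpu: 50m         # hypothetical value
          memory: 32Mi     # hypothetical value
        limits:
          cpu: 250m        # hypothetical value
          memory: 128Mi    # hypothetical value
```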
Next steps
- How to set up environment variables for various Katib components.