Lightweight Python Components
The easiest way to get started authoring components is by creating a Lightweight Python Component. We saw an example of a Lightweight Python Component with say_hello in the Hello World pipeline example. Here is another Lightweight Python Component that adds two integers together:
from kfp import dsl

@dsl.component
def add(a: int, b: int) -> int:
    return a + b
Lightweight Python Components are constructed by decorating Python functions with the @dsl.component decorator. The @dsl.component decorator transforms your function into a KFP component that can be executed as a remote function by a KFP-conformant backend, either independently or as a single step in a larger pipeline.
Python function requirements
To be decorated with the @dsl.component decorator, a function must meet two requirements:
Type annotations: The function inputs and outputs must have valid KFP type annotations.
There are two categories of inputs and outputs in KFP: parameters and artifacts. There are specific types of parameters and artifacts within each category. Every input and output will have a specific type indicated by its type annotation.
In the preceding add component, both inputs a and b are parameters typed int. There is one output, also typed int.
Valid parameter annotations include Python’s built-in int, float, str, bool, typing.Dict, and typing.List. Artifact annotations are discussed in detail in Data Types: Artifacts.
Hermetic: The Python function may not reference any symbols defined outside of its body.
For example, if you wish to use a constant, the constant must be defined inside the function:
@dsl.component
def double(a: int) -> int:
    """Succeeds at runtime."""
    VALID_CONSTANT = 2
    return VALID_CONSTANT * a
By comparison, the following is invalid and will fail at runtime:
# non-example!
INVALID_CONSTANT = 2

@dsl.component
def errored_double(a: int) -> int:
    """Fails at runtime."""
    return INVALID_CONSTANT * a
Imports must also be included in the function body:
@dsl.component
def print_env():
    import os
    print(os.environ)
For many realistic components, hermeticism can be a fairly constraining requirement. Containerized Python Components is a more flexible authoring approach that drops this requirement.
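The two requirements above can be sketched locally with the standard library alone. This is a loose analogy under stated assumptions, not KFP's actual validation or execution mechanism: has_valid_parameter_annotations and run_in_isolation are hypothetical helpers, the annotation check covers only the parameter types listed above (not artifacts or subscripted generics), and running source in an empty namespace merely imitates a function executing without access to the module it was written in.

```python
import inspect
import typing

# Parameter annotations named in the text above. Illustrative only:
# artifact annotations and subscripted generics are not handled here.
VALID_PARAMETER_TYPES = (int, float, str, bool, typing.Dict, typing.List)


def has_valid_parameter_annotations(func) -> bool:
    """Hypothetical check: every input is annotated with a listed type,
    and the return value is either a listed type or unannotated."""
    sig = inspect.signature(func)
    inputs_ok = all(
        p.annotation in VALID_PARAMETER_TYPES for p in sig.parameters.values())
    output_ok = (sig.return_annotation is inspect.Signature.empty or
                 sig.return_annotation in VALID_PARAMETER_TYPES)
    return inputs_ok and output_ok


def run_in_isolation(source: str, func_name: str, *args):
    """Hypothetical helper: define a function from its source text in an
    empty namespace, so names from this module are not visible to it."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](*args)


def add(a: int, b: int) -> int:
    return a + b


INVALID_CONSTANT = 2  # defined here, but invisible to the isolated functions

hermetic_source = '''
def double(a: int) -> int:
    VALID_CONSTANT = 2
    return VALID_CONSTANT * a
'''

non_hermetic_source = '''
def errored_double(a: int) -> int:
    return INVALID_CONSTANT * a
'''

print(has_valid_parameter_annotations(add))  # prints True
print(run_in_isolation(hermetic_source, 'double', 3))  # prints 6

try:
    run_in_isolation(non_hermetic_source, 'errored_double', 3)
except NameError as err:
    # The constant lives outside the function body, so the lookup fails.
    print(f'NameError: {err}')
```

The non-hermetic function fails only when it is called, not when it is defined, which mirrors the runtime failure described above.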
dsl.component decorator arguments
In the above examples, we used the @dsl.component decorator with only one argument: the Python function. The decorator accepts some additional arguments.
packages_to_install
Most realistic Lightweight Python Components will depend on other Python libraries. You can pass a list of requirements to packages_to_install and the component will install these packages at runtime before executing the component function.
This is similar to including requirements in a requirements.txt file.
@dsl.component(packages_to_install=['numpy==1.21.6'])
def sin(val: float = 3.14) -> float:
    # Imports must live inside the function body (see the hermetic requirement).
    import numpy as np
    return np.sin(val).item()
Note: As a production software best practice, prefer using Containerized Python Components when your component specifies packages_to_install to eliminate installation of your dependencies at runtime.
pip_index_urls
pip_index_urls lets you pip install the packages listed in packages_to_install from package indices other than the default PyPI.org.
When you set pip_index_urls, KFP passes these indices to pip install’s --index-url and --extra-index-url options. It also sets each index as a --trusted-host.
Take the following component:
@dsl.component(
    packages_to_install=['custom-ml-package==0.0.1', 'numpy==1.21.6'],
    pip_index_urls=['http://myprivaterepo.com/simple', 'http://pypi.org/simple'],
)
def comp():
    from custom_ml_package import model_trainer
    import numpy as np
    ...
These arguments approximately translate to the following pip install command:
pip install custom-ml-package==0.0.1 numpy==1.21.6 kfp==2 --index-url http://myprivaterepo.com/simple --trusted-host http://myprivaterepo.com/simple --extra-index-url http://pypi.org/simple --trusted-host http://pypi.org/simple
Note that when you set pip_index_urls, KFP does not include 'https://pypi.org/simple' automatically. If you wish to pip install packages from a private repository and the default public repository, you should include both the private and default URLs as shown in the preceding component comp.
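The translation described above can be approximated in a few lines. The following is a minimal sketch, not KFP's implementation: build_pip_install_command is a hypothetical helper, it assumes the first URL becomes --index-url and every later URL becomes --extra-index-url (as the example command suggests), and it omits the kfp requirement that KFP adds on its own.

```python
from typing import List, Optional


def build_pip_install_command(
    packages_to_install: List[str],
    pip_index_urls: Optional[List[str]] = None,
) -> str:
    """Hypothetical helper approximating how the two decorator arguments
    map onto a pip install command line."""
    parts = ['pip install', *packages_to_install]
    if pip_index_urls:
        # Assumption: the first URL is the primary index, the rest are extras.
        first, *rest = pip_index_urls
        parts.append(f'--index-url {first} --trusted-host {first}')
        for url in rest:
            parts.append(f'--extra-index-url {url} --trusted-host {url}')
    return ' '.join(parts)


command = build_pip_install_command(
    packages_to_install=['custom-ml-package==0.0.1', 'numpy==1.21.6'],
    pip_index_urls=['http://myprivaterepo.com/simple', 'http://pypi.org/simple'],
)
print(command)
```

With no pip_index_urls, the sketch adds no index options at all, which corresponds to pip falling back to its default index.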
base_image
When you create a Lightweight Python Component, your Python function code is extracted by the KFP SDK to be executed inside a container at pipeline runtime. By default, the container image used is python:3.7. You can override this image by providing an argument to base_image. This can be useful if your code requires a specific Python version or other dependencies not included in the default image.
@dsl.component(base_image='python:3.8')
def print_py_version():
    import sys
    print(sys.version)
install_kfp_package
install_kfp_package can be used together with pip_index_urls to provide granular control over installation of the kfp package at component runtime.
By default, Python Components install kfp at runtime. This is required to define symbols used by your component (such as artifact annotations) and to access additional KFP library code required to execute your component remotely. If install_kfp_package is False, kfp will not be installed via the normal automatic mechanism. Instead, you can use packages_to_install and pip_index_urls to install a different version of kfp, possibly from a non-default pip index URL.
Note that setting install_kfp_package to False is rarely necessary and is discouraged for the majority of use cases.