Lightweight Python Components

Create a component from a self-contained Python function

The easiest way to get started authoring components is to create a Lightweight Python Component. We saw an example of a Lightweight Python Component, say_hello, in the Hello World pipeline example. Here is another Lightweight Python Component that adds two integers together:

from kfp import dsl

@dsl.component
def add(a: int, b: int) -> int:
    return a + b

Lightweight Python Components are constructed by decorating Python functions with the @dsl.component decorator. The @dsl.component decorator transforms your function into a KFP component that a KFP-conformant backend can execute as a remote function, either independently or as a single step in a larger pipeline.
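Conceptually, part of what the decorator does is read the function's signature to build the component's interface (its typed inputs and output). The toy decorator below, toy_component, is a hypothetical illustration of that one idea only, not KFP's implementation: it returns a plain dict and does not compile, containerize, or run anything remotely.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints


def toy_component(func: Callable) -> Dict[str, Any]:
    """Hypothetical, heavily simplified stand-in for @dsl.component.

    It only records an interface derived from the function's annotations;
    it does not compile, containerize, or run anything remotely.
    """
    hints = get_type_hints(func)
    inputs = {}
    for name in inspect.signature(func).parameters:
        if name not in hints:
            raise TypeError(f"Input '{name}' has no type annotation.")
        inputs[name] = hints[name]
    return {
        "name": func.__name__,
        "inputs": inputs,
        "output": hints.get("return"),  # None if there is no return annotation
        "func": func,  # the original Python function, kept for illustration
    }


@toy_component
def add(a: int, b: int) -> int:
    return a + b


print(add["inputs"])  # {'a': <class 'int'>, 'b': <class 'int'>}
print(add["output"])  # <class 'int'>
```

The interface comes entirely from annotations, which is why the type-annotation requirement described next matters.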

Python function requirements

To decorate a function with the @dsl.component decorator, the function must meet two requirements:

  1. Type annotations: The function inputs and outputs must have valid KFP type annotations.

    There are two categories of inputs and outputs in KFP: parameters and artifacts. There are specific types of parameters and artifacts within each category. Every input and output will have a specific type indicated by its type annotation.

    In the preceding add component, both inputs a and b are parameters typed int. There is one output, also typed int.

    Valid parameter annotations include Python’s built-in int, float, str, and bool, as well as typing.Dict and typing.List. Artifact annotations are discussed in detail in Data Types: Artifacts.

  2. Hermetic: The Python function must not reference any symbols defined outside of its body.

    For example, if you wish to use a constant, the constant must be defined inside the function:

    @dsl.component
    def double(a: int) -> int:
        """Succeeds at runtime."""
        VALID_CONSTANT = 2
        return VALID_CONSTANT * a
    

    By comparison, the following is invalid and will fail at runtime:

    # non-example!
    INVALID_CONSTANT = 2
    
    @dsl.component
    def errored_double(a: int) -> int:
        """Fails at runtime."""
        return INVALID_CONSTANT * a
    

    Imports must also be included in the function body:

    @dsl.component
    def print_env():
        import os
        print(os.environ)
    

    For many realistic components, hermeticism can be a fairly constraining requirement. Containerized Python Components are a more flexible authoring approach that drops this requirement.
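The reason for the hermetic requirement is that only the function's own code is shipped to the container, so module-level names from your authoring environment do not exist at runtime. The sketch below roughly simulates this by executing function source text in a fresh, empty namespace; run_extracted_source is a hypothetical helper, not part of KFP.

```python
import textwrap


def run_extracted_source(source: str, func_name: str, *args):
    """Execute a function's source text in a fresh, empty namespace.

    Rough simulation of remote execution: only the function's own source
    travels to the container, so module-level names are absent.
    """
    namespace: dict = {}
    exec(textwrap.dedent(source), namespace)
    return namespace[func_name](*args)


DOUBLE_SRC = '''
def double(a: int) -> int:
    VALID_CONSTANT = 2
    return VALID_CONSTANT * a
'''

ERRORED_DOUBLE_SRC = '''
def errored_double(a: int) -> int:
    return INVALID_CONSTANT * a
'''

INVALID_CONSTANT = 2  # defined here, but never copied into the fresh namespace

result = run_extracted_source(DOUBLE_SRC, "double", 21)
print(result)  # 42

try:
    run_extracted_source(ERRORED_DOUBLE_SRC, "errored_double", 21)
    errored_failed = False
except NameError:
    errored_failed = True  # INVALID_CONSTANT is undefined in the fresh namespace
print(errored_failed)  # True
```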

dsl.component decorator arguments

In the preceding examples, we used the @dsl.component decorator with only one argument: the Python function. The decorator also accepts the optional keyword arguments described below.

packages_to_install

Most realistic Lightweight Python Components will depend on other Python libraries. You can pass a list of requirements to packages_to_install and the component will install these packages at runtime before executing the component function.

This is similar to including requirements in a requirements.txt file.

@dsl.component(packages_to_install=['numpy==1.21.6'])
def sin(val: float = 3.14) -> float:
    # Imports must live inside the function body to keep it hermetic.
    import numpy as np
    return np.sin(val).item()

Note: As a production software best practice, prefer using Containerized Python Components when your component specifies packages_to_install to eliminate installation of your dependencies at runtime.

pip_index_urls

pip_index_urls lets you install the packages listed in packages_to_install from package indices other than the default, PyPI.org.

When you set pip_index_urls, KFP passes these indices to pip install’s --index-url and --extra-index-url options. It also sets each index as a --trusted-host.

Take the following component:

@dsl.component(
    packages_to_install=['custom-ml-package==0.0.1', 'numpy==1.21.6'],
    pip_index_urls=['http://myprivaterepo.com/simple', 'http://pypi.org/simple'],
)
def comp():
    from custom_ml_package import model_trainer
    import numpy as np
    ...

These arguments approximately translate to the following pip install command:

pip install custom-ml-package==0.0.1 numpy==1.21.6 kfp==2 --index-url http://myprivaterepo.com/simple --trusted-host http://myprivaterepo.com/simple --extra-index-url http://pypi.org/simple --trusted-host http://pypi.org/simple

Note that when you set pip_index_urls, KFP does not automatically include the default PyPI index ('https://pypi.org/simple'). If you want to install packages from both a private repository and the default public repository, include both the private and default URLs, as shown in the preceding component comp.
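To make the translation from pip_index_urls to pip options concrete, the hypothetical helper below reproduces the approximate pip install command shown earlier: the first URL becomes --index-url, any others become --extra-index-url, and every index is also passed as --trusted-host. It is a sketch of the described behavior, not KFP's code; kfp==2 is passed explicitly here only to match the approximate command.

```python
from typing import List, Optional


def build_pip_install_command(
    packages: List[str],
    pip_index_urls: Optional[List[str]] = None,
) -> str:
    """Hypothetical helper approximating how KFP turns pip_index_urls into pip options."""
    parts = ["pip install", *packages]
    if pip_index_urls:
        first, *extra = pip_index_urls
        # The first URL replaces the default index; each index is also trusted.
        parts += ["--index-url", first, "--trusted-host", first]
        for url in extra:
            parts += ["--extra-index-url", url, "--trusted-host", url]
    return " ".join(parts)


cmd = build_pip_install_command(
    ["custom-ml-package==0.0.1", "numpy==1.21.6", "kfp==2"],
    ["http://myprivaterepo.com/simple", "http://pypi.org/simple"],
)
print(cmd)

# With only a private index, PyPI is not consulted at all.
private_only = build_pip_install_command(
    ["custom-ml-package==0.0.1"], ["http://myprivaterepo.com/simple"]
)
print(private_only)
```

The second call illustrates the note above: nothing adds PyPI back unless you list it yourself.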

base_image

When you create a Lightweight Python Component, the KFP SDK extracts your Python function’s code so that it can be executed inside a container at pipeline runtime. By default, the container image used is python:3.7. You can override this image by passing an argument to base_image. This is useful if your code requires a specific Python version or other dependencies not included in the default image.

@dsl.component(base_image='python:3.8')
def print_py_version():
    import sys
    print(sys.version)
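If a component relies on a newer Python than the default image provides, one illustrative pattern (not a KFP feature) is to fail fast with a clear message when the image is wrong. The function below is shown undecorated so it runs as plain Python; in a pipeline you would decorate it with @dsl.component(base_image='python:3.8'). The name print_py_version_checked and the (3, 8) minimum are assumptions for illustration.

```python
def print_py_version_checked() -> str:
    """Hermetic body: the import and the constant both live inside the function."""
    import sys

    REQUIRED = (3, 8)  # assumed minimum version for this illustration
    if sys.version_info[:2] < REQUIRED:
        raise RuntimeError(
            f"Requires Python {REQUIRED[0]}.{REQUIRED[1]}+, but this image "
            f"provides {sys.version.split()[0]}; check base_image."
        )
    print(sys.version)
    return sys.version


version = print_py_version_checked()
```

Returning a str keeps the output within the valid parameter types listed earlier.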

install_kfp_package

install_kfp_package can be used together with pip_index_urls to provide granular control over installation of the kfp package at component runtime.

By default, Python Components install kfp at runtime. This is required to define symbols used by your component (such as artifact annotations) and to access additional KFP library code required to execute your component remotely. If install_kfp_package is False, kfp will not be installed via the normal automatic mechanism. Instead, you can use packages_to_install and pip_index_urls to install a different version of kfp, possibly from a non-default pip index URL.
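The effect of install_kfp_package on what gets installed can be modeled with the hypothetical helper below. It is not KFP's code, and 'kfp==2.7.0' is an assumed placeholder for whatever kfp requirement KFP adds automatically; with install_kfp_package=False, the only kfp that gets installed is the one you list yourself.

```python
from typing import List


def resolve_runtime_packages(
    packages_to_install: List[str],
    install_kfp_package: bool = True,
    kfp_requirement: str = "kfp==2.7.0",  # assumed placeholder pin
) -> List[str]:
    """Hypothetical model of which requirements pip receives at runtime."""
    packages = list(packages_to_install)  # copy so the caller's list is untouched
    if install_kfp_package:
        # Normal path: a kfp requirement is added automatically.
        packages.append(kfp_requirement)
    return packages


# Default behavior: kfp is added for you.
default_pkgs = resolve_runtime_packages(["numpy==1.21.6"])
print(default_pkgs)  # ['numpy==1.21.6', 'kfp==2.7.0']

# Opting out: you choose the kfp version (possibly served via pip_index_urls).
custom_pkgs = resolve_runtime_packages(
    ["numpy==1.21.6", "kfp==2.6.0"],
    install_kfp_package=False,
)
print(custom_pkgs)  # ['numpy==1.21.6', 'kfp==2.6.0']
```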

Note that setting install_kfp_package to False is rarely necessary and is discouraged for the majority of use cases.

Last modified May 31, 2024: Regrouped user guides (350dde8a)