Pass small amounts of data between components

Parameters are useful for passing small amounts of data between components, particularly when the data created by a component does not represent a machine learning artifact such as a model, dataset, or more complex data type.

Specify parameter inputs and outputs using built-in Python type annotations:

from kfp import dsl

@dsl.component
def join_words(word: str, count: int = 10) -> str:
    return ' '.join(word for _ in range(count))

KFP maps Python type annotations to the types stored in ML Metadata according to the following table:

Python object        KFP type
str                  string
int                  number
float                number
bool                 boolean
typing.List / list   object
typing.Dict / dict   object
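
The mapping in the table can also be written as a small lookup, which may help when checking which KFP type a given annotation corresponds to. This is an illustrative sketch only: `KFP_TYPE_FOR` and `kfp_type_for` are hypothetical names, not part of the KFP SDK, and the mapping is copied from the table rather than taken from KFP's source.

```python
from typing import Dict, List, get_origin

# Mapping copied from the table; hypothetical helper, not a KFP API.
KFP_TYPE_FOR = {
    str: 'string',
    int: 'number',
    float: 'number',
    bool: 'boolean',
    list: 'object',
    dict: 'object',
}


def kfp_type_for(annotation) -> str:
    """Return the KFP type name the table lists for a Python annotation."""
    # typing.List / typing.Dict (with or without type arguments) resolve
    # to the built-in list / dict; plain types have no origin.
    origin = get_origin(annotation) or annotation
    try:
        return KFP_TYPE_FOR[origin]
    except KeyError:
        raise TypeError(
            f'{annotation!r} is not a parameter type in the table') from None


print(kfp_type_for(int))
print(kfp_type_for(List))
print(kfp_type_for(Dict[str, int]))
```

Annotations outside the table raise a `TypeError` in this sketch; how KFP itself treats them is not covered here.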

As with a normal Python function, input parameters can have default values, indicated in the standard way: def func(my_string: str = 'default'):

Under the hood, KFP passes all parameters to and from components by serializing them as JSON.

For all Python Components (Lightweight Python Components and Containerized Python Components), parameter serialization and deserialization is invisible to the user; KFP handles this automatically.
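
To make the JSON round trip concrete, the sketch below serializes one value of each parameter type with Python's standard `json` module and parses it back. It only illustrates the general idea; the exact encoding KFP applies internally is not shown in this document, so treat the details as an assumption.

```python
import json

# One sample value per parameter type.
params = {
    'string': 'hello',
    'integer': 1,
    'floating_pt': 0.1,
    'boolean': True,
    'dictionary': {'key': 'value'},
    'array': [1, 2, 3],
}

# Serialize each value to a JSON string, then parse it back.
serialized = {name: json.dumps(value) for name, value in params.items()}
restored = {name: json.loads(text) for name, text in serialized.items()}

for name, text in serialized.items():
    print(f'{name}: {text}')
```

Each value survives the round trip unchanged. Note that a Python `True` is written as the JSON literal `true`, which matters for any non-Python code that reads the serialized value.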

For Container Components, input parameter deserialization is invisible to the user; KFP passes inputs to the component automatically. For Container Component outputs, the user code in the Container Component must handle serializing the output parameters as described in Container Components: Create component outputs.

Input parameters

To use input parameters, annotate your component function with the parameter types and, optionally, default values. The following example demonstrates this with a Python Component, a Container Component, and a pipeline, each of which takes all parameter types as inputs:

from typing import Dict, List
from kfp import dsl

@dsl.component
def python_comp(
    string: str = 'hello',
    integer: int = 1,
    floating_pt: float = 0.1,
    boolean: bool = True,
    dictionary: Dict = {'key': 'value'},
    array: List = [1, 2, 3],
):
    print(string)
    print(integer)
    print(floating_pt)
    print(boolean)
    print(dictionary)
    print(array)


@dsl.container_component
def container_comp(
    string: str = 'hello',
    integer: int = 1,
    floating_pt: float = 0.1,
    boolean: bool = True,
    dictionary: Dict = {'key': 'value'},
    array: List = [1, 2, 3],
):
    return dsl.ContainerSpec(
        image='alpine',
        command=['sh', '-c', """echo $0 $1 $2 $3 $4 $5"""],
        args=[
            string,
            integer,
            floating_pt,
            boolean,
            dictionary,
            array,
        ])

@dsl.pipeline
def my_pipeline(
    string: str = 'Hey!',
    integer: int = 100,
    floating_pt: float = 0.1,
    boolean: bool = False,
    dictionary: Dict = {'key': 'value'},
    array: List = [1, 2, 3],
):
    python_comp(
        string='howdy',
        integer=integer,
        array=[4, 5, 6],
    )
    container_comp(
        string=string,
        integer=20,
        dictionary={'other key': 'other val'},
        boolean=boolean,
    )
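
The `$0`, `$1`, ... references in the Container Component's command work because `sh -c` binds the arguments that follow the command string to positional parameters, starting at `$0`. The sketch below reproduces that behavior locally with `subprocess`, assuming a POSIX `sh` is available on the machine. It does not involve KFP, and the argument values are stand-ins for whatever KFP substitutes at runtime.

```python
import subprocess

# Stand-in values; in a pipeline, KFP supplies the real parameter values.
args = ['hello', '1', '0.1']

# With `sh -c <command string> <args...>`, the first argument after the
# command string is bound to $0, the next to $1, and so on.
result = subprocess.run(
    ['sh', '-c', 'echo $0 $1 $2', *args],
    capture_output=True,
    text=True,
    check=True,
)
echoed = result.stdout.strip()
print(echoed)
```

Because numbering starts at `$0`, a component with six arguments reads them as `$0` through `$5`.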

Output parameters

For Python Components and pipelines, output parameters are indicated via return annotations:

from kfp import dsl

@dsl.component
def my_comp() -> int:
    return 1

@dsl.pipeline
def my_pipeline() -> int:
    task = my_comp()
    return task.output

For Container Components, output parameters are indicated using a dsl.OutputPath annotation:

from kfp import dsl

@dsl.container_component
def my_comp(int_path: dsl.OutputPath(int)):
    return dsl.ContainerSpec(
        image='alpine',
        command=[
            'sh', '-c', f"""mkdir -p $(dirname {int_path})\
                            && echo 1 > {int_path}"""
        ])

@dsl.pipeline
def my_pipeline() -> int:
    task = my_comp()
    return task.outputs['int_path']

See Container Components: Create component outputs for more information on how to use dsl.OutputPath.
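
The shell command in the Container Component above creates the parent directory of the output path and then writes the value to it. For a container whose entry point is a Python script, the same two steps might look like the sketch below. `write_output_parameter` is a hypothetical helper, not a KFP API, and encoding the value with `json.dumps` is an assumption based on the statement that KFP passes parameters as JSON.

```python
import json
import os
import tempfile


def write_output_parameter(path: str, value) -> None:
    """Write a JSON-serialized parameter value to the given output path."""
    # Equivalent of `mkdir -p $(dirname path)`: the directory may not exist.
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, 'w') as f:
        f.write(json.dumps(value))


# Local demonstration: a temporary directory stands in for the path that
# KFP would supply at runtime.
with tempfile.TemporaryDirectory() as tmp:
    int_path = os.path.join(tmp, 'outputs', 'int_path')
    write_output_parameter(int_path, 1)
    with open(int_path) as f:
        written = f.read()

print(written)
```

Creating the parent directory first mirrors the `mkdir -p` step in the shell version; without it, the write fails if the directory is missing.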

Multiple output parameters

You can specify multiple named output parameters using a typing.NamedTuple. You can access a named output using .outputs['<output-key>'] on PipelineTask:

from kfp import dsl
from typing import NamedTuple

@dsl.component
def my_comp() -> NamedTuple('outputs', a=int, b=str):
    outputs = NamedTuple('outputs', a=int, b=str)
    return outputs(1, 'hello')

@dsl.pipeline
def my_pipeline() -> NamedTuple('pipeline_outputs', c=int, d=str):
    task = my_comp()
    pipeline_outputs = NamedTuple('pipeline_outputs', c=int, d=str)
    return pipeline_outputs(task.outputs['a'], task.outputs['b'])
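
The keys used with `.outputs[...]` in the example above match the field names of the `NamedTuple`. The standalone sketch below does not use KFP; it only shows how those names and types can be read from a `NamedTuple` class with the standard library. It declares the fields as a list of (name, type) pairs, which is a stylistic choice for this sketch rather than something the example above requires.

```python
from typing import NamedTuple, get_type_hints

# Same fields as the component above, declared as (name, type) pairs.
Outputs = NamedTuple('Outputs', [('a', int), ('b', str)])

result = Outputs(1, 'hello')

# Field names, in declaration order.
field_names = Outputs._fields
# Field name -> annotated type.
field_types = get_type_hints(Outputs)
# Field name -> value for this instance.
as_dict = result._asdict()

print(field_names)
print(as_dict)
```

Values are matched to fields by position when the tuple is constructed, so the order of the returned values must follow the order in which the fields are declared.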

Last modified May 31, 2024: Regrouped user guides (350dde8a)