Kubeflow - Using the Kubernetes Python client to mount a volume on GKE

Good evening,

I’d like to request some guidance on using the Python k8s client to mount a directory and use it to pass results between containerized Kubeflow components.

In my pipeline code, a file declared in file_outputs and written by the first component does not persist long enough for the following components to read it. I tried mounting the /tmp directory as a k8s volume, but files written to /tmp during execution are not persisted across containers.

import kfp.dsl as dsl
import kfp.gcp as gcp
from kubernetes import client as k8s_client

@dsl.pipeline(name='kubeflow demo')
def pipeline(project_id='kubeflow-demo-254012',
    bucket='gs://kubeflow-demo-254012-kubeflow-bucket',
    collector_output='/tmp/collected_dataset.csv',
    preprocessor_output='/tmp/preprocessed_dataset.csv'):

    data_collector = dsl.ContainerOp(
        name='data collector',
        image='eu.gcr.io/kubeflow-demo-254012/data-collector',
        arguments=[
            "--project_id", project_id,
            "--bucket", bucket,
            "--collector_output", collector_output
        ],
        file_outputs={
            "output": '/tmp/collected_dataset.csv'
        }
    ).add_volume(
        k8s_client.V1Volume(
            name='tmp',
            host_path=k8s_client.V1HostPathVolumeSource(path='/tmp')
        )
    ).add_volume_mount(
        k8s_client.V1VolumeMount(mount_path='/tmp', name='tmp')
    )

    data_preprocessor = dsl.ContainerOp(
        name='data preprocessor',
        image='eu.gcr.io/kubeflow-demo-254012/data-preprocessor',
        arguments=[
            "--project_id", project_id,
            "--bucket", bucket,
            "--collector_output", collector_output,
            "--preprocessor_output", preprocessor_output
        ]
    )
    data_preprocessor.after(data_collector)

if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(pipeline, __file__ + '.tar.gz')

I’ve tried using the Google Cloud Storage bucket, but that gave me all sorts of authentication issues. My hope is to use the k8s volume feature instead.

Thank you.

This should help.

GKE for Kubeflow

PersistentVolumes and PersistentVolumeClaims in Kubernetes, and their use with Google Kubernetes Engine.
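
As a rough sketch of what the PersistentVolumeClaim approach might look like for the pipeline above: the snippet below builds the claim, the volume, and the volume mount as plain dicts using only the standard library. The names pipeline-shared-pvc, shared-data, and /mnt/shared, as well as the 1Gi size and the ReadWriteOnce access mode, are assumptions made for illustration and are not from the original post. The volume and mount dicts mirror the fields that k8s_client.V1Volume (persistent_volume_claim, claim_name) and k8s_client.V1VolumeMount (name, mount_path) take in the Python client.

```python
import json

# Hypothetical names chosen for illustration; none appear in the original post.
CLAIM_NAME = "pipeline-shared-pvc"
VOLUME_NAME = "shared-data"
MOUNT_PATH = "/mnt/shared"


def build_pvc_manifest(claim_name, size="1Gi"):
    """Return a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": claim_name},
        "spec": {
            # Assumed access mode: one node mounts the volume read-write.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


def build_volume(volume_name, claim_name):
    """Pod-level volume entry that refers to the claim by name."""
    return {
        "name": volume_name,
        "persistentVolumeClaim": {"claimName": claim_name},
    }


def build_volume_mount(volume_name, mount_path):
    """Container-level mount entry; its name must match the volume's name."""
    return {"name": volume_name, "mountPath": mount_path}


pvc = build_pvc_manifest(CLAIM_NAME)
volume = build_volume(VOLUME_NAME, CLAIM_NAME)
mount = build_volume_mount(VOLUME_NAME, MOUNT_PATH)

# Both components would read and write under the shared mount, not /tmp.
collector_output = MOUNT_PATH + "/collected_dataset.csv"
preprocessor_output = MOUNT_PATH + "/preprocessed_dataset.csv"

print(json.dumps(pvc, indent=2))
```

Two things differ from the code in the question. The volume is backed by a claim rather than a hostPath, so it does not depend on both pods landing on the same node's /tmp. And the same volume and mount would need to be added to both data_collector and data_preprocessor; in the question only the first component mounts anything, so the second container never sees the files.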


I know this is not part of the thread, but could you please point me to any resources where I can learn how to create a run for a deployed Kubeflow pipeline using Python or other languages?
