Kubeflow - Using the Kubernetes Python client to mount a volume on GKE

Good evening,

I’d like some guidance on using the Python k8s client to mount a directory and use it to pass results between Kubeflow containerized components.

In my pipeline code, declaring a file_outputs entry in the first component and writing a file to that path does not persist the file long enough for the following components to access it. I attempted to mount the /tmp directory as a k8s volume, but during execution the files written to /tmp are not persisted across containers.

import kfp.dsl as dsl
import kfp.gcp as gcp
from kubernetes import client as k8s_client

# Note: the image names and the bucket default below are placeholders; the
# originals were cut off in this post.
@dsl.pipeline(name='kubeflow demo')
def pipeline(project_id='kubeflow-demo-254012', bucket='kubeflow-demo-bucket'):
    collector_output = '/tmp/collected_dataset.csv'
    preprocessor_output = '/tmp/preprocessed_dataset.csv'

    data_collector = dsl.ContainerOp(
        name='data collector',
        image='gcr.io/kubeflow-demo-254012/data-collector:latest',
        arguments=[
            "--project_id", project_id,
            "--bucket", bucket,
            "--collector_output", collector_output,
        ],
        file_outputs={
            "output": '/tmp/collected_dataset.csv',
        },
    ).add_volume(
        k8s_client.V1Volume(
            name='tmp',
            host_path=k8s_client.V1HostPathVolumeSource(path='/tmp'))
    ).add_volume_mount(
        k8s_client.V1VolumeMount(mount_path='/tmp', name='tmp'))

    data_preprocessor = dsl.ContainerOp(
        name='data preprocessor',
        image='gcr.io/kubeflow-demo-254012/data-preprocessor:latest',
        arguments=[
            "--project_id", project_id,
            "--bucket", bucket,
            "--collector_output", collector_output,
            "--preprocessor_output", preprocessor_output,
        ],
    ).add_volume(
        k8s_client.V1Volume(
            name='tmp',
            host_path=k8s_client.V1HostPathVolumeSource(path='/tmp'))
    ).add_volume_mount(
        k8s_client.V1VolumeMount(mount_path='/tmp', name='tmp')
    ).after(data_collector)  # explicit ordering; no artifact is passed between the ops

if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(pipeline, __file__ + '.tar.gz')

I’ve tried using a Google Cloud Storage bucket, but that gave me all sorts of authentication issues. My hope is to use the k8s volume feature instead.

Thank you.

This should help.

GKE for Kubeflow

PersistentVolumes and PersistentVolumeClaims in Kubernetes, and their use with Google Kubernetes Engine.
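
To spell out why the hostPath approach above falls short: a hostPath volume is local to whichever node each pod happens to be scheduled on, so on a multi-node GKE cluster a file written to /tmp in one step is generally not visible to the next step. A PersistentVolumeClaim gives you storage that follows the pods instead. Below is a minimal sketch using the kfp SDK's dsl.VolumeOp to create a PVC and share it between two steps; the pipeline name, images, paths, and size here are placeholder assumptions, not taken from your code.

import kfp.dsl as dsl

@dsl.pipeline(name='pvc demo')
def pvc_pipeline():
    # Create a PVC; on GKE the default StorageClass dynamically
    # provisions a persistent disk to back it.
    vop = dsl.VolumeOp(
        name='create-volume',
        resource_name='shared-data',
        size='1Gi',
        modes=dsl.VOLUME_MODE_RWO,
    )

    producer = dsl.ContainerOp(
        name='producer',
        image='alpine',  # placeholder image
        command=['sh', '-c', 'echo hello > /mnt/data/out.txt'],
        pvolumes={'/mnt/data': vop.volume},
    )

    consumer = dsl.ContainerOp(
        name='consumer',
        image='alpine',
        command=['sh', '-c', 'cat /mnt/data/out.txt'],
        # Mounting producer.pvolume also makes this step wait for producer.
        pvolumes={'/mnt/data': producer.pvolume},
    )

One caveat: a ReadWriteOnce volume can only be attached to one node at a time, so this works for sequential steps; parallel steps landing on different nodes would need a ReadWriteMany-capable backend. (As for the GCS authentication issues mentioned above, applying kfp.gcp.use_gcp_secret to an op is one common way to give components GCP credentials, assuming the standard user-gcp-sa secret exists in the cluster.)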


I know this is not part of the thread, but could you please point me to any resources where I can learn how to create a run for a deployed Kubeflow pipeline using Python or other languages?
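
For example, would something along these lines with the kfp SDK client be the right direction? (The host URL, experiment name, and pipeline name here are just guesses on my part.)

import kfp

# Placeholder endpoint; on GKE this might go through port-forwarding or IAP.
client = kfp.Client(host='http://localhost:8080')

experiment = client.create_experiment(name='demo-experiment')
pipeline_id = client.get_pipeline_id('kubeflow demo')  # name of an uploaded pipeline

run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name='demo-run',
    pipeline_id=pipeline_id,
    params={'project_id': 'kubeflow-demo-254012'},
)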
