Kubeflow - Using the Kubernetes Python client to mount a volume on GKE

Good evening,

I’d like some guidance on using the Python k8s client to mount a directory and use it to pass results between Kubeflow containerized components.

In my pipeline code, declaring a file_outputs entry in the first component and writing a file to that path does not persist the file long enough for the following components to access it. I attempted to mount the /tmp directory as a k8s volume, but during execution the files written to /tmp are not persisted across containers.

import kfp.dsl as dsl
import kfp.gcp as gcp
from kubernetes import client as k8s_client

# Note: the image names and the bucket default below are placeholders; the
# originals were cut off in this post.
@dsl.pipeline(name='kubeflow demo')
def pipeline(project_id='kubeflow-demo-254012', bucket='kubeflow-demo-bucket'):
    collector_output = '/tmp/collected_dataset.csv'
    preprocessor_output = '/tmp/preprocessed_dataset.csv'

    data_collector = dsl.ContainerOp(
        name='data collector',
        image='gcr.io/kubeflow-demo-254012/data-collector:latest',
        arguments=[
            "--project_id", project_id,
            "--bucket", bucket,
            "--collector_output", collector_output,
        ],
        file_outputs={
            "output": '/tmp/collected_dataset.csv',
        },
    ).add_volume(
        k8s_client.V1Volume(
            name='tmp',
            host_path=k8s_client.V1HostPathVolumeSource(path='/tmp'))
    ).add_volume_mount(
        k8s_client.V1VolumeMount(mount_path='/tmp', name='tmp'))

    data_preprocessor = dsl.ContainerOp(
        name='data preprocessor',
        image='gcr.io/kubeflow-demo-254012/data-preprocessor:latest',
        arguments=[
            "--project_id", project_id,
            "--bucket", bucket,
            "--collector_output", collector_output,
            "--preprocessor_output", preprocessor_output,
        ],
    ).add_volume(
        k8s_client.V1Volume(
            name='tmp',
            host_path=k8s_client.V1HostPathVolumeSource(path='/tmp'))
    ).add_volume_mount(
        k8s_client.V1VolumeMount(mount_path='/tmp', name='tmp')
    ).after(data_collector)  # explicit ordering; no artifact is passed between the ops

if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(pipeline, __file__ + '.tar.gz')

I’ve tried using a Google Cloud Storage bucket, but that gave me all sorts of authentication issues. My hope is to use the k8s volume feature instead.

Thank you.

This should help.

GKE for Kubeflow

PersistentVolumes and PersistentVolumeClaims in Kubernetes, and their use with Google Kubernetes Engine.
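
To spell out why the hostPath approach above falls short: a hostPath volume is local to whichever node each pod happens to be scheduled on, so on a multi-node GKE cluster a file written to /tmp in one step is generally not visible to the next step. A PersistentVolumeClaim gives you storage that follows the pods instead. Below is a minimal sketch using the kfp SDK's dsl.VolumeOp to create a PVC and share it between two steps; the pipeline name, images, paths, and size here are placeholder assumptions, not taken from your code.

import kfp.dsl as dsl

@dsl.pipeline(name='pvc demo')
def pvc_pipeline():
    # Create a PVC; on GKE the default StorageClass dynamically
    # provisions a persistent disk to back it.
    vop = dsl.VolumeOp(
        name='create-volume',
        resource_name='shared-data',
        size='1Gi',
        modes=dsl.VOLUME_MODE_RWO,
    )

    producer = dsl.ContainerOp(
        name='producer',
        image='alpine',  # placeholder image
        command=['sh', '-c', 'echo hello > /mnt/data/out.txt'],
        pvolumes={'/mnt/data': vop.volume},
    )

    consumer = dsl.ContainerOp(
        name='consumer',
        image='alpine',
        command=['sh', '-c', 'cat /mnt/data/out.txt'],
        # Mounting producer.pvolume also makes this step wait for producer.
        pvolumes={'/mnt/data': producer.pvolume},
    )

One caveat: a ReadWriteOnce volume can only be attached to one node at a time, so this works for sequential steps; parallel steps landing on different nodes would need a ReadWriteMany-capable backend. (As for the GCS authentication issues mentioned above, applying kfp.gcp.use_gcp_secret to an op is one common way to give components GCP credentials, assuming the standard user-gcp-sa secret exists in the cluster.)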


I know this is not part of the thread, but could you please point me to any resources where I can learn how to create a run for a deployed Kubeflow pipeline using Python or other languages?
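
For example, would something along these lines with the kfp SDK client be the right direction? (The host URL, experiment name, and pipeline name here are just guesses on my part.)

import kfp

# Placeholder endpoint; on GKE this might go through port-forwarding or IAP.
client = kfp.Client(host='http://localhost:8080')

experiment = client.create_experiment(name='demo-experiment')
pipeline_id = client.get_pipeline_id('kubeflow demo')  # name of an uploaded pipeline

run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name='demo-run',
    pipeline_id=pipeline_id,
    params={'project_id': 'kubeflow-demo-254012'},
)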
