The Hadoop persistence backend is set to CephFS, but the required directories cannot be created during initialization

Cluster information:

Kubernetes version: 1.31.3
Installation method:
Host OS: Ubuntu Server 24.04
CNI and version: calico/node:v3.29.0
CRI and version: containerd 1.7.24

The cluster uses Helm to deploy Rook and Hive. Rook uses the official rook-ceph and rook-ceph-cluster charts (both version 1.15), while Hive uses a chart that bundles Hadoop and Hive; its image is custom-built and includes Hadoop, Hive, Spark, and Flink.
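
For reference, the Rook side was installed roughly as follows (release names and namespaces follow the chart defaults; the Hive chart path and values file are placeholders, and the exact 1.15 patch version is omitted):

helm repo add rook-release https://charts.rook.io/release
helm install --create-namespace -n rook-ceph rook-ceph rook-release/rook-ceph
helm install -n rook-ceph rook-ceph-cluster rook-release/rook-ceph-cluster \
  --set operatorNamespace=rook-ceph
# Hive/Hadoop chart (custom); chart path and values file are placeholders
helm install --create-namespace -n hive hive ./hive-hadoop-chart -f hive-values.yaml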

Before switching to Rook, persistence was handled entirely by NFS, and the Hive data warehouse ran normally, so the charts and the images themselves should be fine.

After switching from NFS to Ceph, three StorageClasses were created automatically. Hive's persistence now uses CephFS through four PVCs.

kubectl get sc
NAME                   PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ceph-block (default)   rook-ceph.rbd.csi.ceph.com      Delete          Immediate           true                   6d13h
ceph-bucket            rook-ceph.ceph.rook.io/bucket   Delete          Immediate           false                  6d13h
ceph-filesystem        rook-ceph.cephfs.csi.ceph.com   Delete          Immediate           true                   6d13h
kubectl get pvc -A|grep filesystem
hive                           dfs-hive-hadoop-hdfs-nn-0                               Bound    pvc-d9acbc38-942f-4ce3-ad39-b324a0bd7330   10Gi       RWO            ceph-filesystem   <unset>                 2d13h
hive                           dfs1-hive-hadoop-hdfs-dn-0                              Bound    pvc-873f043d-3adc-46d3-984b-247c3aedd799   100Gi      RWO            ceph-filesystem   <unset>                 2d13h
hive                           dfs2-hive-hadoop-hdfs-dn-0                              Bound    pvc-f0296e11-d4a7-4eea-88ae-70d0c9c196bb   100Gi      RWO            ceph-filesystem   <unset>                 2d13h
hive                           dfs3-hive-hadoop-hdfs-dn-0                              Bound    pvc-ceabd71c-dfe1-413f-aa66-f75defdf73f5   100Gi      RWO            ceph-filesystem   <unset>                 2d13h
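
To rule out problems on the Ceph side, the filesystem and cluster health can be checked like this (the rook-ceph-tools commands assume the toolbox deployment was enabled in the cluster chart):

kubectl -n rook-ceph get cephfilesystem
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph fs ls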

Some pods use Ceph RBD for persistence and no issues have been found there. To provide the Hive JDBC driver JAR to Superset, I created a temporary pod that uploads the JAR to CephFS. After deleting that pod, the file still exists and Superset keeps working, so CephFS itself appears to be readable and writable.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: ceph-filesystem
  volumeMode: Filesystem
---
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-upload-pod
spec:
  containers:
    - name: cephfs-container
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - mountPath: /mnt/cephfs
          name: cephfs-storage
  volumes:
    - name: cephfs-storage
      persistentVolumeClaim:
        claimName: cephfs-pvc
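
For completeness, the upload itself was done roughly like this (the manifest file name and the JAR file name are illustrative):

kubectl apply -f cephfs-upload.yaml
kubectl cp hive-jdbc-standalone.jar cephfs-upload-pod:/mnt/cephfs/
kubectl exec cephfs-upload-pod -- ls -l /mnt/cephfs
kubectl delete pod cephfs-upload-pod   # the JAR remains on the CephFS-backed PV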

However, the Hadoop NameNode fails to initialize: it cannot create the required directories.

Exiting with status 1: java.io.IOException: Cannot create directory /opt/apache/hadoop-3.4.1/data/hdfs/namenode/current
Exiting with status 1: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /opt/apache/hadoop-3.4.1/data/hdfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
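
One way to inspect the CephFS-backed directory the NameNode is failing on is a throwaway pod that mounts the same PVC; a sketch, with the PVC name taken from the kubectl get pvc output above, assuming a second RWO attachment is possible (RWO only restricts the volume to a single node):

kubectl apply -n hive -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nn-pvc-debug
spec:
  containers:
    - name: debug
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - mountPath: /mnt/nn
          name: nn-data
  volumes:
    - name: nn-data
      persistentVolumeClaim:
        claimName: dfs-hive-hadoop-hdfs-nn-0
EOF
# numeric owner/permissions of the mount point, plus a write test
kubectl exec -n hive nn-pvc-debug -- ls -ldn /mnt/nn
kubectl exec -n hive nn-pvc-debug -- touch /mnt/nn/writetest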

Because the NameNode pod keeps crashing and cannot be exec'd into, I exec'd into the ResourceManager pod (built from the same image) and ran hadoop namenode -format, which succeeded. The relevant difference between the two pods is that the ResourceManager pod has no persistence configured, so it does not touch CephFS.

hadoop@hive-hadoop-yarn-rm-0:/opt/apache$ echo $HADOOP_HOME
/opt/apache/hadoop


hadoop@hive-hadoop-yarn-rm-0:/opt/apache$ ls -l /opt/apache/|grep /opt/apache/hadoop
lrwxrwxrwx 1 hadoop hadoop   24 Dec 11 22:50 hadoop -> /opt/apache/hadoop-3.4.1


hadoop@hive-hadoop-yarn-rm-0:/opt/apache$ ls -l /opt/apache/hadoop-3.4.1/data/hdfs/
total 8
drwxr-xr-x 5 hadoop hadoop 4096 Dec 11 22:50 datanode
drwxr-xr-x 2 hadoop hadoop 4096 Dec 11 22:50 namenode


hadoop@hive-hadoop-yarn-rm-0:/opt/apache$ ls -l /opt/apache/hadoop-3.4.1/data/hdfs/namenode/
total 0


hadoop@hive-hadoop-yarn-rm-0:/opt/apache$ hadoop namenode -format
WARNING: Use of this script to execute namenode is deprecated.
WARNING: Attempting to execute replacement "hdfs namenode" instead.

2024-12-14 12:47:55,001 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hive-hadoop-yarn-rm-0.hive-hadoop-yarn-rm.hive.svc.cluster.local/10.244.189.104
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.4.1
STARTUP_MSG:   classpath = ...... more
STARTUP_MSG:   build = https://github.com/apache/hadoop.git -r 4d7825309348956336b8f06a08322b78422849b1; compiled by 'mthakur' on 2024-10-09T14:57Z
STARTUP_MSG:   java = 1.8.0_421
************************************************************/
2024-12-14 12:47:55,013 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2024-12-14 12:47:55,369 INFO namenode.NameNode: createNameNode [-format]
2024-12-14 12:47:57,737 INFO common.Util: Assuming 'file' scheme for path /opt/apache/hadoop/data/hdfs/namenode in configuration.
2024-12-14 12:47:57,740 INFO common.Util: Assuming 'file' scheme for path /opt/apache/hadoop/data/hdfs/namenode in configuration.

These are the relevant ConfigMap templates from the Hive chart:

  core-site.xml: |
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
          <name>fs.defaultFS</name>
          <value>hdfs://{{ include "hadoop.fullname" . }}-hdfs-nn.{{ .Release.Namespace }}:9000/</value>
          <description>NameNode URI</description>
      </property>
      <property>
          <name>hadoop.proxyuser.root.hosts</name>
          <value>*</value>
      </property>
      <property>
          <name>hadoop.proxyuser.root.groups</name>
          <value>*</value>
      </property>
      <property>
          <name>hadoop.proxyuser.hadoop.hosts</name>
          <value>*</value>
      </property>
      <property>
          <name>hadoop.proxyuser.hadoop.groups</name>
          <value>*</value>
      </property>
    </configuration>
  hdfs-site.xml: |
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>

{{- if .Values.hdfs.webhdfs.enabled -}}
      <property>
          <name>dfs.webhdfs.enabled</name>
          <value>true</value>
      </property>
{{- end -}}

      <property>
        <name>dfs.datanode.use.datanode.hostname</name>
        <value>false</value>
      </property>

      <property>
        <name>dfs.client.use.datanode.hostname</name>
        <value>false</value>
      </property>

      <!--
      <property>
        <name>dfs.datanode.hostname</name>
        <value>{{ .Values.hdfs.dataNode.externalHostname }}</value>
      </property>
      -->

      <property>
        <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
        <value>false</value>
      </property>

      <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:9864</value>
      </property>

      <property>
        <name>dfs.datanode.address</name>
        <value>0.0.0.0:9866</value>
      </property>

      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>

      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/apache/hadoop/data/hdfs/datanode/data1,/opt/apache/hadoop/data/hdfs/datanode/data2,/opt/apache/hadoop/data/hdfs/datanode/data3</value>
        <description>DataNode directory</description>
      </property>

      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/apache/hadoop/data/hdfs/namenode</value>
        <description>NameNode directory for namespace and transaction logs storage.</description>
      </property>

      <property>
        <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
        <value>false</value>
      </property>

      <!-- Bind to all interfaces -->
      <property>
        <name>dfs.namenode.rpc-bind-host</name>
        <value>0.0.0.0</value>
      </property>
      <property>
        <name>dfs.namenode.servicerpc-bind-host</name>
        <value>0.0.0.0</value>
      </property>
      <!-- /Bind to all interfaces -->

    </configuration>
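
To confirm how these values are resolved inside the containers (including through the /opt/apache/hadoop symlink), the effective configuration can be read from any pod built from the same image:

hdfs getconf -confKey dfs.namenode.name.dir
hdfs getconf -confKey dfs.datanode.data.dir
hdfs getconf -confKey fs.defaultFS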

This is the persistence section of the Hive chart's values.yaml. When NFS was used, both storageClass fields were set to nfs-storage; they have now been changed to ceph-filesystem, with no other modifications.

persistence:
  nameNode:
    enabled: true
    enabledStorageClass: true
    storageClass: ceph-filesystem
    accessMode: ReadWriteOnce
    size: 10Gi
    volumes:
    - name: dfs
      mountPath: /opt/apache/hadoop/data/hdfs/namenode
      persistentVolumeClaim:
        claimName: dfs-hadoop-hadoop-hdfs-nn

  dataNode:
    enabled: true
    enabledStorageClass: true
    storageClass: ceph-filesystem
    accessMode: ReadWriteOnce
    size: 100Gi
    volumes:
    - name: dfs1
      mountPath: /opt/apache/hdfs/datanode1
      persistentVolumeClaim:
        claimName: dfs1-hadoop-hadoop-hdfs-dn
    - name: dfs2
      mountPath: /opt/apache/hdfs/datanode2
      persistentVolumeClaim:
        claimName: dfs2-hadoop-hadoop-hdfs-dn
    - name: dfs3
      mountPath: /opt/apache/hdfs/datanode3
      persistentVolumeClaim:
        claimName: dfs3-hadoop-hadoop-hdfs-dn
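
Since the error points at a directory-creation failure on the mounted volume, the rendered StatefulSet's securityContext (fsGroup in particular) and the pod events may be worth checking. The object names below are inferred from the PVC and ResourceManager pod names and may differ:

kubectl -n hive get statefulset hive-hadoop-hdfs-nn \
  -o jsonpath='{.spec.template.spec.securityContext}{"\n"}'
kubectl -n hive describe pod hive-hadoop-hdfs-nn-0 | tail -n 30
kubectl -n hive logs hive-hadoop-hdfs-nn-0 --previous | tail -n 50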

What could prevent the NameNode from creating its storage directory on the CephFS-backed volume? Thank you for your attention to this issue!