Using Hindi Language in a ConfigMap

I’m using Hindi text in a ConfigMap, which supports UTF-8. After the ConfigMap is mounted in the container, I see some random characters mixed into the text, although printing it works fine. What could be the issue if UTF-8 is supported, and why do the random characters show up in the non-English text?

Cluster information:

Kubernetes version: v1.22.2
Cloud being used: MetalLB
Host OS: Ubuntu 20

Can you clarify what “see some random characters” means (see how? What tool did you use to read the file?) and what “after printing it” means?

We should have tests that cover this, but on the off chance that we don’t, we should add them. You may do better to open a GitHub issue.

After mounting the ConfigMap: प�~Mरिय �~W�~Mराह�~U, �~Fपन�~G �~Eपन
In the ConfigMap: प्रिय ग्राहक, आपने अपने
When I print the characters of the mounted file, I get the proper format, but some characters come out different from the original (for example आपने as अपने), and there are more cases like that. I’m using Go to open the mounted JSON file.

Apologies for not describing the problem properly. Kindly check, and if this is a genuine issue I will open it on GitHub.

Sorry, I still need info. The ConfigMap API is described as:

Data map[string]string
BinaryData map[string][]byte

A string in Go is a series of bytes. I think we assume that any string-encoded field holds valid Unicode data, but as far as I can tell we don’t enforce that. When we render such a field to JSON, it should be encoded so as to produce valid JSON.
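A minimal Go sketch of that point, using only the standard library (describe is a hypothetical helper): a string can carry bytes that are not valid UTF-8, and encoding/json replaces each invalid byte with the escape \ufffd when marshaling, so the JSON output stays valid even when the data is not.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// describe reports the byte length, rune count and UTF-8 validity of s,
// plus the JSON text that encoding/json produces for it.
func describe(s string) (byteLen, runeCount int, valid bool, jsonText string) {
	b, err := json.Marshal(s) // marshaling a plain string does not fail
	if err != nil {
		panic(err)
	}
	return len(s), utf8.RuneCountInString(s), utf8.ValidString(s), string(b)
}

func main() {
	good := "आपने"                    // 4 Devanagari code points, 3 bytes each
	bad := string([]byte{0xe0, 0xa4}) // a truncated sequence: not valid UTF-8

	for _, s := range []string{good, bad} {
		n, r, ok, j := describe(s)
		fmt.Printf("bytes=%d runes=%d valid=%v json=%s\n", n, r, ok, j)
	}
}
```

The Hindi word is marshaled unchanged, while each of the two stray bytes becomes its own \ufffd.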

A []byte is an explicit denotation that we will not try to interpret the values at all, and when we encode it as JSON, it is not assumed to be a valid string.

That’s what the object ACTUALLY HOLDS. How you “see” that data matters because different tools have different levels of support for unicode and different error handling.

Can you show me how you are mounting the configmap, just so I can make sure I don’t try something different from what you are doing?

When you say “In configmap”, that does not tell me how you SEE the data. Is it kubectl get -o json or -o yaml or kubectl edit?

This is FAR from my area of expertise, but it seems like something that we SHOULD get right and that could also easily slip through the cracks of a primarily-English project. Also, as a non-speaker, it’s hard for me to see when we get it wrong somehow. 🙂

apiVersion: v1
data:
  Data.json: |
    {
      "English":" You have",
      "Hindi":"आपने अपने",
      "Marathi":"तुम्ही तुमच्या"
    }
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: temp
    meta.helm.sh/release-namespace: random
  creationTimestamp: "2024-01-23T06:40:21Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: temp-config
  namespace: random
  resourceVersion: "1121"
  uid: d123123

This is the ConfigMap I’m using. I’m checking it with kubectl edit, and kubectl get -o json shows the same data.
deployment.yaml

        volumeMounts:
        - mountPath: /tmp
          name: temp-config
      volumes:
      - name: temp-config
        configMap:
          name: temp-config

I’m reading data.json in Go and then printing its values.

Here’s what I see:

$ k get -o json cm foo
{
    "apiVersion": "v1",
    "data": {
        "hindi": "प्रिय ग्राहक, आपने अपने",
        "mykey": "myvalue"
    },
    "kind": "ConfigMap",
    "metadata": {
        "creationTimestamp": "2024-01-26T17:41:41Z",
        "name": "foo",
        "namespace": "default",
        "resourceVersion": "67610592",
        "uid": "3bd4347b-cdfd-4cf4-af18-99a4eb1f83ae"
    }
}

and from a pod which has it mounted:

$ k exec -ti sleep-55bff88496-m5rkn -- cat /etc/dne/hindi; echo
प्रिय ग्राहक, आपने अपने

Am I missing something?

[Screenshot: the mounted file opened in Vim inside the container]

This is the file after opening it in Vim inside the container. Kindly try opening it inside the container yourself. I’m also not sure what is causing this error.

My ‘cat’ is from inside the container. I will try it with an editor when I get back to my desk, but I think this shows that there is not a systemic issue in how we handle the contents?

If I load this in the busybox vi, I get garbage (as expected, since I don’t have the localization support in there).

If you hexdump or xxd the mounted file, you should be able to prove that it is byte-accurate and that the error is in the rendering.

But on other Ubuntu machines no such extra characters show up in the text. Why does it show those random characters only inside the container?

It could be the configuration inside the container: does it have all the right settings and locale support? As an English speaker and ASCII typer, I have never really had to configure a machine for non-ASCII, so I don’t know.

Again, if you hexdump the file, that will tell you for certain if the error is in the data or in the rendering.

Yes, it has locale support. I’m doing the same thing I did on other machines, only this time inside the container. With hexdump or xxd it shows dots, since these are non-ASCII (Devanagari) characters. Is there something else I can share with you that would help solve this issue?

The hexdump should show you the hex values for the bytes, which one could look up to prove that they were the correct encoding of the data.
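To do that lookup mechanically, a small Go sketch (runeTable is a hypothetical helper) can decode the file and print each rune’s byte offset, code point and raw bytes, with offsets in the same style as xxd. Invalid bytes decode as U+FFFD one byte at a time, so a corrupted file would stand out immediately.

```go
package main

import (
	"fmt"
	"os"
	"unicode/utf8"
)

// runeTable lists every decoded rune of data with its byte offset, code point
// and the exact bytes it occupies. Invalid bytes appear as U+FFFD, width 1.
func runeTable(data []byte) []string {
	var rows []string
	for i := 0; i < len(data); {
		r, size := utf8.DecodeRune(data[i:])
		rows = append(rows, fmt.Sprintf("%08x  %U  % x", i, r, data[i:i+size]))
		i += size
	}
	return rows
}

func main() {
	// Pass a file path (e.g. a copy of the mounted file); otherwise use a sample.
	data := []byte("आपने")
	if len(os.Args) > 1 {
		var err error
		if data, err = os.ReadFile(os.Args[1]); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
	for _, row := range runeTable(data) {
		fmt.Println(row)
	}
}
```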

Or you could hexdump it on 2 different systems, one which works and one which does not, and compare the hexdumps.
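Comparing the two files byte for byte is what cmp(1) does; the same check as a Go sketch, assuming both copies of the file have been brought to one machine (firstDiff and the argument order are illustrative):

```go
package main

import (
	"fmt"
	"os"
)

// firstDiff returns the offset of the first byte where a and b differ, or -1
// when they are identical. A length mismatch counts as a difference at the
// end of the shorter input.
func firstDiff(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	if len(a) != len(b) {
		return n
	}
	return -1
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: cmpdump FILE_FROM_WORKING_HOST FILE_FROM_CONTAINER")
		os.Exit(2)
	}
	a, errA := os.ReadFile(os.Args[1])
	b, errB := os.ReadFile(os.Args[2])
	if errA != nil || errB != nil {
		fmt.Fprintln(os.Stderr, errA, errB)
		os.Exit(1)
	}
	if off := firstDiff(a, b); off < 0 {
		fmt.Println("identical bytes: any visible difference comes from rendering")
	} else {
		fmt.Printf("first difference at byte offset %#x\n", off)
	}
}
```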

Or you could compare it to my results:

$ k exec -ti sleep-55bff88496-m5rkn -- xxd /etc/dne/hindi; echo
00000000: e0a4 aae0 a58d e0a4 b0e0 a4bf e0a4 af20  ............... 
00000010: e0a4 97e0 a58d e0a4 b0e0 a4be e0a4 b9e0  ................
00000020: a495 2c20 e0a4 86e0 a4aa e0a4 a8e0 a587  .., ............
00000030: 20e0 a485 e0a4 aae0 a4a8 e0a5 87          ............