Using Hindi Language in Configmap

random · January 23, 2024, 11:26am

I’m using Hindi text in a ConfigMap, which supports UTF-8. After mounting the ConfigMap in a container, I see random characters mixed into the text, although it looks fine when printed. If UTF-8 is supported, what could cause these random characters in non-English text?

Cluster information:

Kubernetes version: v1.22.2
Cloud being used: metallb
Host OS: ubuntu20

thockin · January 23, 2024, 4:51pm

Can you clarify what “see some random characters” means (see how? What tool did you use to read the file) and what “after printing it” means?

We should have tests that cover this, but on the off chance that we don’t, we should add them. You may do better to open a GitHub issue.

random · January 24, 2024, 6:00am

After mounting configmap - प�~Mरिय �~W�~Mराह�~U, �~Fपन�~G �~Eपन
In configmap - प्रिय ग्राहक, आपने अपने
When I print the contents of the mounted file, the format is correct, but some characters are missing compared to the original, for example आपने appears as अपने, and there are many more. I’m using Go to open the mounted JSON file.

Apologies for not describing the problem properly. Kindly check, and if this is a real issue I will open it on GitHub.

thockin · January 24, 2024, 5:50pm

Sorry, I still need info. The ConfigMap API is described as:

Data map[string]string
BinaryData map[string][]byte

A string in Go is a series of bytes. I think we assume that any string-encoded field holds valid Unicode data, but we don’t enforce that, as far as I can tell. When we render such a thing to JSON, it should be encoded so as to produce valid JSON.

A []byte is an explicit denotation that we will not try to interpret the values at all, and when we encode it as JSON, it is not assumed to be a valid string.

That’s what the object ACTUALLY HOLDS. How you “see” that data matters because different tools have different levels of support for unicode and different error handling.

Can you show me how you are mounting the configmap, just so I can make sure I don’t try something different from what you are doing?

When you say “In configmap”, that does not tell me how you SEE the data. Is it kubectl get -o json or -o yaml or kubectl edit ?

This is FAR from my area of expertise, but it seems like something that we SHOULD get right but also that could easily slip thru the cracks of a primarily-in-English project. Also, as a non-speaker, it’s hard for me to see when we get it wrong somehow.

random · January 25, 2024, 5:54am

apiVersion: v1
data:
  Data.json: |
    {
      "English":" You have",
      "Hindi":"आपने अपने",
      "Marathi":"तुम्ही तुमच्या"
      }
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: temp
    meta.helm.sh/release-namespace: random
  creationTimestamp: "2024-01-23T06:40:21Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: temp-config
  namespace: random
  resourceVersion: "1121"
  uid: d123123

This is the ConfigMap I’m using. I’m checking it with kubectl edit, and the data is the same with kubectl get -o json.

deployment.yaml:

        volumeMounts:
        - mountPath: /tmp
          name: temp-config
      volumes:
      - name: temp-config
        configMap:
          name: temp-config

I’m reading Data.json in Go and then printing its values.
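
For reference, a minimal sketch of that kind of reader. This is not the poster's actual program: the path /tmp/Data.json is an assumption derived from the mountPath and the ConfigMap key above, and parseMessages is a hypothetical helper. It checks that the bytes are valid UTF-8 before decoding and prints each value next to its raw hex bytes, so a display problem can be told apart from a data problem.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"unicode/utf8"
)

// parseMessages (hypothetical helper) rejects input that is not valid
// UTF-8 and then decodes a flat JSON object of string values.
func parseMessages(raw []byte) (map[string]string, error) {
	if !utf8.Valid(raw) {
		return nil, fmt.Errorf("input is not valid UTF-8")
	}
	var msgs map[string]string
	if err := json.Unmarshal(raw, &msgs); err != nil {
		return nil, err
	}
	return msgs, nil
}

func main() {
	// Assumed location: mountPath /tmp plus the ConfigMap key Data.json.
	raw, err := os.ReadFile("/tmp/Data.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		return
	}
	msgs, err := parseMessages(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse failed:", err)
		return
	}
	for lang, text := range msgs {
		// "% x" prints the bytes of the string as space-separated hex.
		fmt.Printf("%s: %s (% x)\n", lang, text, text)
	}
}
```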

thockin · January 27, 2024, 5:45am

Here’s what I see:

$ k get -o json cm foo
{
    "apiVersion": "v1",
    "data": {
        "hindi": "प्रिय ग्राहक, आपने अपने",
        "mykey": "myvalue"
    },
    "kind": "ConfigMap",
    "metadata": {
        "creationTimestamp": "2024-01-26T17:41:41Z",
        "name": "foo",
        "namespace": "default",
        "resourceVersion": "67610592",
        "uid": "3bd4347b-cdfd-4cf4-af18-99a4eb1f83ae"
    }
}

and from a pod which has it mounted:

$ k exec -ti sleep-55bff88496-m5rkn -- cat /etc/dne/hindi; echo
प्रिय ग्राहक, आपने अपने

Am I missing something?

random · January 29, 2024, 5:14am

This is the file after opening it in the container using Vim. Kindly try opening it inside the container. I’m also not sure what is causing this error.

thockin · January 29, 2024, 3:39pm

My ‘cat’ is from inside the container. I will try it with an editor when I get back to my desk, but I think this shows that there is not a systemic issue in how we handle the contents?

thockin · January 29, 2024, 5:33pm

If I load this in the busybox vi, I get garbage (as expected, since I don’t have the localization support in there).

If you hexdump or xxd the mounted file, you should be able to prove that it is byte-accurate, and that the error is in the rendering.
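
If xxd is not available in the image, the same check can be sketched in Go. This is only an illustration: firstDiff is a hypothetical helper, and the path is the one from the cat example above, so adjust it to the real mount. It dumps the file in hex and reports the first offset where the bytes differ from the expected text.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
	"os"
)

// firstDiff (hypothetical helper) returns the offset of the first
// differing byte, or -1 when both slices are identical.
func firstDiff(got, want []byte) int {
	if bytes.Equal(got, want) {
		return -1
	}
	n := len(got)
	if len(want) < n {
		n = len(want)
	}
	for i := 0; i < n; i++ {
		if got[i] != want[i] {
			return i
		}
	}
	return n // one input is a strict prefix of the other
}

func main() {
	want := []byte("प्रिय ग्राहक, आपने अपने")
	// Path taken from the cat example above; an assumption for any other setup.
	got, err := os.ReadFile("/etc/dne/hindi")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		return
	}
	got = bytes.TrimRight(got, "\n") // ignore a trailing newline, if any
	fmt.Print(hex.Dump(got))
	if i := firstDiff(got, want); i >= 0 {
		fmt.Printf("bytes differ at offset %d\n", i)
	} else {
		fmt.Println("file is byte-accurate")
	}
}
```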

random · January 30, 2024, 3:57am

But on other Ubuntu machines no such extra characters show up in the text. Why does it produce those random characters inside the container?

thockin · January 30, 2024, 5:53am

It could be the configuration inside - does it have all the right settings and locale support? As an English speaker and ASCII typer, I have never really had to configure a machine for non-ASCII, so I don’t know.

Again, if you hexdump the file, that will tell you for certain if the error is in the data or in the rendering.

random · February 1, 2024, 5:44am

Yes, it has locale support. I’m doing the same thing I was doing on other machines, only this time inside the container. hexdump and xxd show dots because these are non-ASCII (Devanagari) characters. Is there something else I can share that would help solve this issue?

thockin · February 1, 2024, 7:05am

The hexdump should show you the hex values for the bytes, which one could look up to prove that they were the correct encoding of the data.

Or you could hexdump it on 2 different systems, one which works and one which does not, and compare the hexdumps.

Or you could compare it to my results:

$ k exec -ti sleep-55bff88496-m5rkn -- xxd /etc/dne/hindi; echo
00000000: e0a4 aae0 a58d e0a4 b0e0 a4bf e0a4 af20  ............... 
00000010: e0a4 97e0 a58d e0a4 b0e0 a4be e0a4 b9e0  ................
00000020: a495 2c20 e0a4 86e0 a4aa e0a4 a8e0 a587  .., ............
00000030: 20e0 a485 e0a4 aae0 a4a8 e0a5 87          ............
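
The hex column above can be checked mechanically. The following Go sketch copies those hex bytes into a constant (dumpHex and decodeDump are names made up for this example), decodes them, and compares the result with the text stored in the ConfigMap.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"unicode/utf8"
)

// dumpHex is the hex column of the xxd output above, one row per line.
const dumpHex = "e0a4aae0a58de0a4b0e0a4bfe0a4af20" +
	"e0a497e0a58de0a4b0e0a4bee0a4b9e0" +
	"a4952c20e0a486e0a4aae0a4a8e0a587" +
	"20e0a485e0a4aae0a4a8e0a587"

// decodeDump (hypothetical helper) turns a hex string back into text
// and fails if the bytes are not valid UTF-8.
func decodeDump(h string) (string, error) {
	raw, err := hex.DecodeString(h)
	if err != nil {
		return "", err
	}
	if !utf8.Valid(raw) {
		return "", fmt.Errorf("decoded bytes are not valid UTF-8")
	}
	return string(raw), nil
}

func main() {
	text, err := decodeDump(dumpHex)
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
	// Compare with the value shown by kubectl get -o json.
	fmt.Println(text == "प्रिय ग्राहक, आपने अपने")
}
```

The dots in the right-hand column of xxd only mean that a byte is not printable ASCII; the hex values on the left are what to compare between a working and a non-working system.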
