ResourceExhausted "message larger than max" from API Priority and Fairness

We are seeing a response as follows:

HTTP response headers: HTTPHeaderDict({'Audit-Id': 'b04e478f-3f97-4b53-8d11-9e45250c0056', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Strict-Transport-Security': 'max-age=31536000', 'X-Kubernetes-Pf-Flowschema-Uid': '8754702d-0218-46a9-9b31-d4341589b19b', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'f8c09fc6-2714-48ed-9c9f-13a3512f1e0f', 'Date': 'Thu, 28 Nov 2024 23:30:07 GMT', 'Content-Length': '196'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"rpc error: code = ResourceExhausted desc = trying to send message larger than max (2946576 vs. 2097152)","code":500}

Those headers (X-Kubernetes-Pf-Flowschema-Uid and X-Kubernetes-Pf-Prioritylevel-Uid) indicate it’s from API Priority and Fairness.
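
For reference, this is how the UIDs from those headers can be matched back to the objects (using kubectl’s jsonpath filters on metadata.uid):

$ kubectl get flowschemas -o jsonpath='{.items[?(@.metadata.uid=="8754702d-0218-46a9-9b31-d4341589b19b")].metadata.name}'
service-accounts
$ kubectl get prioritylevelconfigurations -o jsonpath='{.items[?(@.metadata.uid=="f8c09fc6-2714-48ed-9c9f-13a3512f1e0f")].metadata.name}'
workload-low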

If I look at the referenced FlowSchema and PriorityLevelConfiguration:

$ kubectl get flowschema service-accounts -o yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  annotations:
    apf.kubernetes.io/autoupdate-spec: "true"
  creationTimestamp: "2022-01-27T21:36:57Z"
  generation: 1
  name: service-accounts
  resourceVersion: "2317448958"
  uid: 8754702d-0218-46a9-9b31-d4341589b19b
spec:
  distinguisherMethod:
    type: ByUser
  matchingPrecedence: 9000
  priorityLevelConfiguration:
    name: workload-low
  rules:
  - nonResourceRules:
    - nonResourceURLs:
      - '*'
      verbs:
      - '*'
    resourceRules:
    - apiGroups:
      - '*'
      clusterScope: true
      namespaces:
      - '*'
      resources:
      - '*'
      verbs:
      - '*'
    subjects:
    - group:
        name: system:serviceaccounts
      kind: Group
status:
  conditions:
  - lastTransitionTime: "2022-01-27T21:36:57Z"
    message: This FlowSchema references the PriorityLevelConfiguration object named
      "workload-low" and it exists
    reason: Found
    status: "False"
    type: Dangling
$ kubectl get PriorityLevelConfiguration workload-low -o yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: PriorityLevelConfiguration
metadata:
  annotations:
    apf.kubernetes.io/autoupdate-spec: "true"
  creationTimestamp: "2022-01-27T21:36:57Z"
  generation: 2
  name: workload-low
  resourceVersion: "3536437878"
  uid: f8c09fc6-2714-48ed-9c9f-13a3512f1e0f
spec:
  limited:
    lendablePercent: 90
    limitResponse:
      queuing:
        handSize: 6
        queueLengthLimit: 50
        queues: 128
      type: Queue
    nominalConcurrencyShares: 100
  type: Limited
status: {}

There is nothing here explicitly about a 2 MB limit; for what it’s worth, the 2097152 in the error is exactly 2 MiB (2 × 1024 × 1024 bytes), and 2946576 is roughly 2.8 MiB. I’ve read through the docs and searched, but so far I’m stumped on where that limit is coming from. Does anyone know:

  1. What defines that 2MB limit?
  2. How it can be adjusted (if it can)?
  3. If changing any of the values in the PriorityLevelConfiguration would help?

Cluster information:

Kubernetes version: 1.28.15
Cloud being used: IKS (IBM)
Installation method: Cloud
Host OS: Ubuntu 20.04.6 LTS
CNI and version: Calico v3.27.4
CRI and version: containerd://1.7.23

I’m coming around to the APF headers being a bit of a red herring, since that message appears to come from somewhere within the gRPC stack, but I still don’t understand where that limit is set.
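
One check that might support that: as far as I understand, when APF actually rejects a request it responds with a 429 rather than a 500, and the rejection counters go up, so this should show whether APF is rejecting anything at all:

$ kubectl get --raw /metrics | grep apiserver_flowcontrol_rejected_requests_total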

The object being created comes from a test file, but maybe 2 MB is a hard limit imposed on the master nodes?
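
In case it helps, a quick way to check whether the 2946576 in the error lines up with the serialized size of the object being sent (test.yaml here is a stand-in for the actual manifest used by the test; the exact wire size depends on the content type, but it should be in the same ballpark):

$ kubectl create -f test.yaml --dry-run=client -o json | wc -c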