Hi Kubernetes community, I open-sourced a small project called `kubernetes-ontology` and would appreciate feedback from people who work on troubleshooting, platform engineering, SRE tooling, or agent-assisted operations.
Repo: Colvin-Y/kubernetes-ontology on GitHub (read-only Kubernetes ontology database for diagnostics and AI-agent consumption)
Landing page: https://kubernetes-ontology.vercel.app
The project builds a read-only in-memory graph from Kubernetes objects, then exposes typed entity/relation queries, diagnostic subgraphs, CLI/HTTP APIs, and a local topology viewer.
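As a rough illustration of what "typed entity/relation queries" and "diagnostic subgraphs" over an in-memory graph can look like, here is a minimal sketch. It is not code from the repo. The names `EntityID`, `Edge`, `Graph`, `neighbors`, `subgraph`, and relation strings such as `ownedBy` and `scheduledOn` are all hypothetical. The sketch assumes entities are keyed by kind/namespace/name and that each edge records the object field it came from, which is its provenance.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntityID:
    kind: str
    namespace: str  # empty string for cluster-scoped objects such as Node
    name: str


@dataclass(frozen=True)
class Edge:
    src: EntityID
    relation: str   # hypothetical relation names, e.g. "ownedBy"
    dst: EntityID
    provenance: str  # the object field that produced this edge


class Graph:
    """In-memory graph with edges indexed in both directions."""

    def __init__(self) -> None:
        self._out: dict = {}
        self._in: dict = {}

    def add_edge(self, edge: Edge) -> None:
        self._out.setdefault(edge.src, []).append(edge)
        self._in.setdefault(edge.dst, []).append(edge)

    def neighbors(self, node: EntityID, relation: Optional[str] = None) -> list:
        # Typed query: outgoing targets, optionally filtered by relation name.
        return [e.dst for e in self._out.get(node, [])
                if relation is None or e.relation == relation]

    def subgraph(self, root: EntityID, max_depth: int = 2):
        # Breadth-first walk over edges in both directions, bounded by depth.
        seen = {root}
        edges: list = []
        edge_set = set()
        queue = deque([(root, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue
            for edge in self._out.get(node, []) + self._in.get(node, []):
                if edge not in edge_set:
                    edge_set.add(edge)
                    edges.append(edge)
                other = edge.dst if edge.src == node else edge.src
                if other not in seen:
                    seen.add(other)
                    queue.append((other, depth + 1))
        return seen, edges


g = Graph()
pod = EntityID("Pod", "default", "web-7d9f-abc")
rs = EntityID("ReplicaSet", "default", "web-7d9f")
deploy = EntityID("Deployment", "default", "web")
node = EntityID("Node", "", "worker-1")
g.add_edge(Edge(pod, "ownedBy", rs, "metadata.ownerReferences"))
g.add_edge(Edge(rs, "ownedBy", deploy, "metadata.ownerReferences"))
g.add_edge(Edge(pod, "scheduledOn", node, "spec.nodeName"))

near_nodes, near_edges = g.subgraph(pod, max_depth=1)  # Pod, ReplicaSet, Node
full_nodes, full_edges = g.subgraph(pod, max_depth=2)  # also reaches the Deployment
```

Bounding the walk by depth keeps a Pod-rooted diagnostic view small while still reaching the owning Deployment two hops away.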
The current MVP focuses on:
- Pod and Workload diagnostic entrypoints
- ownerReference chains, including Pod → ReplicaSet → Deployment
- Service selector matches
- Pod links to Node, ConfigMap, Secret, ServiceAccount, Image, and PVC
- PVC → PV → StorageClass → CSIDriver storage paths
- ServiceAccount to RoleBinding / ClusterRoleBinding evidence
- Kubernetes Event and admission webhook evidence
- HelmRelease / HelmChart evidence from standard Helm labels and annotations
- stable JSON output for downstream AI agents
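Two of the edge types listed above follow well-defined Kubernetes rules. A Service selects Pods in its own namespace whose labels contain every key/value pair in `spec.selector`. A Service with no selector gets no automatically managed endpoints. `metadata.ownerReferences` point at owners by `uid`, and at most one reference is marked `controller: true`. The sketch below derives those edges from raw object dicts and emits sorted JSON. It is not the project's implementation. The function names and the JSON field names (`from`, `relation`, `to`, `provenance`, `ownerChain`) are hypothetical, and following only the controller reference is an assumption.

```python
import json


def selector_matches(selector: dict, labels: dict) -> bool:
    # A Service without a selector gets no automatic endpoints: match nothing.
    if not selector:
        return False
    return all(labels.get(k) == v for k, v in selector.items())


def service_edges(services: list, pods: list) -> list:
    edges = []
    for svc in services:
        sel = svc.get("spec", {}).get("selector") or {}
        ns = svc["metadata"]["namespace"]
        for pod in pods:
            meta = pod["metadata"]
            # Selectors only apply within the Service's own namespace.
            if meta["namespace"] == ns and selector_matches(sel, meta.get("labels", {})):
                edges.append({
                    "from": f"Service/{ns}/{svc['metadata']['name']}",
                    "relation": "selects",
                    "to": f"Pod/{ns}/{meta['name']}",
                    "provenance": "spec.selector",
                })
    return edges


def owner_chain(obj: dict, by_uid: dict) -> list:
    # Follow the controller ownerReference upward; stop on a missing owner or a cycle.
    chain, seen, current = [], set(), obj
    while True:
        refs = current["metadata"].get("ownerReferences", [])
        ctrl = next((r for r in refs if r.get("controller")), None)
        if ctrl is None or ctrl["uid"] in seen or ctrl["uid"] not in by_uid:
            return chain
        seen.add(ctrl["uid"])
        current = by_uid[ctrl["uid"]]
        chain.append(f"{current['kind']}/{current['metadata']['name']}")


pod = {"kind": "Pod", "metadata": {
    "name": "web-7d9f-abc", "namespace": "default", "uid": "p1",
    "labels": {"app": "web", "pod-template-hash": "7d9f"},
    "ownerReferences": [{"kind": "ReplicaSet", "name": "web-7d9f", "uid": "rs1", "controller": True}]}}
rs = {"kind": "ReplicaSet", "metadata": {
    "name": "web-7d9f", "namespace": "default", "uid": "rs1",
    "ownerReferences": [{"kind": "Deployment", "name": "web", "uid": "d1", "controller": True}]}}
deploy = {"kind": "Deployment", "metadata": {"name": "web", "namespace": "default", "uid": "d1"}}
web_svc = {"kind": "Service", "metadata": {"name": "web", "namespace": "default"},
           "spec": {"selector": {"app": "web"}}}
db_svc = {"kind": "Service", "metadata": {"name": "db", "namespace": "default"},
          "spec": {"selector": {"app": "db"}}}

by_uid = {o["metadata"]["uid"]: o for o in (pod, rs, deploy)}
chain = owner_chain(pod, by_uid)
edges = service_edges([web_svc, db_svc], [pod])
print(json.dumps({"ownerChain": chain, "edges": edges}, indent=2, sort_keys=True))
```

`sort_keys=True` gives byte-stable output for the same input, which is one simple way to get the "stable JSON" property that downstream agents can diff.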
The safety model is intentionally conservative: it does not mutate observed workloads, does not install workload CRDs, does not require a graph database, and does not perform runtime writes to the Kubernetes resources being diagnosed.
I am mainly trying to validate the graph-first approach rather than promote a finished product, so concrete feedback would be very helpful:
- Which Kubernetes failure mode should the graph explain next? CrashLoopBackOff, Pending scheduling, DNS / Service routing, volume attach / mount, webhook admission failures, RBAC, Helm upgrade failures, or something else?
- Are the graph edges and provenance fields useful enough for real incident workflows?
- What would make the quickstart easier to try in private or air-gapped clusters?
If this overlaps with your troubleshooting workflow, I would love to hear what context you usually wish you had in one place.