Is it possible to use large language models in K8S to optimize cluster scheduling and enable intelligent cluster management?

Hello everyone,
I’m a sophomore with a budding interest in Kubernetes (K8S) and have some experience in artificial intelligence and distributed systems. As I’ve been exploring K8S clusters, I had an idea. Could we leverage large language models for monitoring and analyzing the state of K8S clusters to enhance the intelligence of cluster scheduling? This could potentially reduce the workload for operations and maintenance staff.
Could large models even interact with the API server via an agent to perform smarter container repairs, instead of just restarting containers? I envision a future where operations personnel could interact directly with large models to manage clusters more efficiently. However, I’m aware that this idea has limitations. For instance, the inference latency of large models could become a bottleneck and reduce scheduling efficiency, and letting a model act on the cluster could introduce security and reliability risks.
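To make the security concern a bit more concrete, here is a rough sketch of one guardrail I have in mind: the agent never executes the model's reply directly, but first checks it against a small allowlist. Everything here is an assumption for illustration only: the action names (`restart_pod`, `scale_deployment`, `no_op`), the JSON reply format, and the `MAX_REPLICAS` cap are hypothetical, and the step that would actually call the API server is left out.

```python
import json

# Hypothetical allowlist: action name -> exact set of required parameters.
ALLOWED_ACTIONS = {
    "restart_pod": {"namespace", "pod"},
    "scale_deployment": {"namespace", "deployment", "replicas"},
    "no_op": set(),
}
MAX_REPLICAS = 10  # arbitrary safety cap, chosen only for this sketch


def validate_action(raw_reply: str) -> dict:
    """Parse the model's JSON reply and reject anything outside the allowlist."""
    try:
        proposal = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model reply is not valid JSON: {exc}") from exc
    if not isinstance(proposal, dict):
        raise ValueError("model reply must be a JSON object")

    action = proposal.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {action!r}")

    params = proposal.get("params", {})
    if not isinstance(params, dict) or set(params) != ALLOWED_ACTIONS[action]:
        raise ValueError(f"unexpected parameters for {action}: {params!r}")

    if action == "scale_deployment":
        replicas = params["replicas"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(replicas, bool) or not isinstance(replicas, int):
            raise ValueError("replicas must be an integer")
        if not 0 <= replicas <= MAX_REPLICAS:
            raise ValueError(f"replicas must be between 0 and {MAX_REPLICAS}")

    return {"action": action, "params": params}


if __name__ == "__main__":
    reply = '{"action": "restart_pod", "params": {"namespace": "default", "pod": "web-0"}}'
    print(validate_action(reply))
```

In this design the model only proposes; a deterministic check decides. A validated action could then be executed by a service account whose RBAC permissions cover only those operations, so even a bad reply has a bounded blast radius.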
I’d like to start with a simpler example where the large model monitors the cluster status in real time and provides operations and maintenance staff with a cluster analysis report. The model could gather and analyze data from etcd or from the controllers’ operations on resources to generate the corresponding reports.
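Here is a minimal sketch of the data-gathering half of that report idea. It assumes the pod data arrives as the JSON that the API server returns for a pod list (for example, the output of `kubectl get pods -A -o json`) rather than being read from etcd directly, since clients normally go through the API server. The function names `summarize_pods` and `build_prompt`, the `restart_threshold` value, and the prompt wording are my own placeholders, and the call to an actual model is left out.

```python
from collections import Counter


def summarize_pods(pod_list: dict, restart_threshold: int = 3) -> str:
    """Condense a Kubernetes pod list (parsed JSON) into a short text summary."""
    phases = Counter()
    problems = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        name = f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}"
        phases[status.get("phase", "Unknown")] += 1

        for cs in status.get("containerStatuses", []):
            # A container stuck in "waiting" carries a reason such as
            # ImagePullBackOff or CrashLoopBackOff.
            waiting = cs.get("state", {}).get("waiting") or {}
            reason = waiting.get("reason")
            restarts = cs.get("restartCount", 0)
            if reason or restarts >= restart_threshold:
                problems.append(
                    f"- {name} container={cs.get('name', '?')} "
                    f"restarts={restarts} waiting={reason or 'none'}"
                )

    lines = ["Pod phases: " + ", ".join(f"{p}={n}" for p, n in sorted(phases.items()))]
    lines.append("Containers needing attention:")
    lines.extend(problems or ["- none"])
    return "\n".join(lines)


def build_prompt(summary: str) -> str:
    """Wrap the summary in instructions for a language model (wording is a placeholder)."""
    return (
        "You are assisting a Kubernetes operator. Based on the cluster summary "
        "below, write a short report listing likely causes and suggested checks.\n\n"
        + summary
    )


if __name__ == "__main__":
    sample = {
        "items": [
            {
                "metadata": {"namespace": "default", "name": "web-0"},
                "status": {
                    "phase": "Pending",
                    "containerStatuses": [
                        {
                            "name": "web",
                            "restartCount": 0,
                            "state": {"waiting": {"reason": "ImagePullBackOff"}},
                        }
                    ],
                },
            }
        ]
    }
    print(build_prompt(summarize_pods(sample)))
```

Summarizing first keeps the prompt small and keeps raw cluster objects, which may contain sensitive fields, out of the model's input. The report itself is read-only, so slow model output affects only how fresh the report is, not scheduling.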
As a newcomer to K8S, I’m not entirely sure about the feasibility and value of this idea. The experts around me aren’t familiar with cloud-native technologies, so I’m seeking advice from the community. I’d appreciate any feedback or suggestions on this concept.
Thank you!