I don’t necessarily agree that clusterctl should be completely out of scope. Not all providers will want to write their own bootstrapping logic. For the case of Machine-based implementations, there is a general common pattern that will apply, just as it does today.
That said, I do not think clusterctl should be a requirement for using cluster-api.
That’s what I’ve been thinking, but wanted someone else to say it first. I’d like to see some of the work clusterctl does right now move into the controllers, if at all possible, and perhaps see clusterctl itself turn into a kubectl plugin.
If that’s a way forward, which behaviours of the current clusterctl could potentially be moved?
I am not 100% up to date on everything clusterctl currently has to do. All I was trying to say is that most interactions, long term, look like API interactions to me (i.e. kubectl or another general client would be the interface). As far as initial management cluster creation is concerned, I am suggesting that provider-specific code is necessary at some level, and I do not see a good reason for pushing that too many levels behind something that tries to appear generalised.
If management cluster creation is all that clusterctl currently does, then I’d suggest we could say that each provider has a clusterctl-<provider>, and that it is meant to implement a shared set of common subcommands and flags.
I just reread the cooperating controllers mechanism proposal. Thank you to everyone who has contributed to it so far. I found one thing in it that immediately made me think “aha! this is it!” Let me paste it here:
It’s not clear to me how we could handle inter-resource coordination not related to the general state machine, but more for requesting data from an “authoritative” plugin. One of the benefits I see with the webhook approach is that we could define a set of calls that could be used for getting required information from other parts of the system without knowledge of the implementation details. To give a bit more detail, assuming a “provider” comprised of: 1) an AWS infrastructure provider (both for common cluster infrastructure and machine instance provisioning) 2) a Kubeadm bootstrapping provider (that generates a cloud-init config) 3) a machine-based control plane provider 4) some type of addon provider (mainly to handle CNI). I would like to provide configuration to the kubeadm bootstrapper saying that my control plane should be initialized with the ELB DNS name created by the infrastructure provider, with the correct CNI-related settings from the addon provider. I would also like the Security Group configuration to be informed by the addon provider. In the webhook model, we could define endpoints that could provide this type of inter-extension/plugin communication. It is not clear to me how we would do the same in this proposed model.
This paragraph points out a key problem I think we need to think about: how do multiple components in the Cluster API ecosystem interact with each other, especially if we break up the current all-encompassing singular provider into multiple distinct providers, each with its own unique purpose?
Let’s look at a part of the example quoted above: how does my ELB info make its way into the kubeadm config?
Before we can even start to answer that, what am I, as a user, providing as input to the system to set up a control plane? It would probably be nice to have some sort of ControlPlane CRD. I’ve heard people talk about a couple of different models for control planes: 1) machine based, 2) pod based, 3) externally managed (think GKE, etc.). Can we use a single ControlPlane CRD for all 3 models? I’m not sure… so we’ll leave that TBD.
Let’s pretend, for the moment, we have some ControlPlane CRD where I specify that I want 3 members and maybe there’s some way for me to point it at a template for a Machine (so it knows I want an AWS-specific m5.large machine in us-east-1, etc.). It probably also should have fields for the kubeadm ClusterConfiguration and InitConfiguration data, which is where we’d need to put the ELB’s DNS name (in ClusterConfiguration.ControlPlaneEndpoint).
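To make the shape of that pretend CRD concrete, here is a minimal Go sketch of what its spec could look like. All of the type and field names here are assumptions for illustration; none of them exist in cluster-api today.

```go
package main

import "fmt"

// MachineTemplate points at the provider-specific machine shape
// (e.g. an AWS m5.large in us-east-1), by reference.
type MachineTemplate struct {
	InfrastructureRef string // e.g. "AWSMachineTemplate/control-plane"
}

// KubeadmConfig carries the subset of kubeadm configuration the control
// plane needs; ControlPlaneEndpoint is where the infrastructure
// provider's ELB DNS name would land.
type KubeadmConfig struct {
	ControlPlaneEndpoint string // filled in later by the infrastructure provider
}

// ControlPlaneSpec is the hypothetical user-facing input: how many
// members, what machines to use, and the kubeadm settings.
type ControlPlaneSpec struct {
	Replicas        int // e.g. 3 members
	MachineTemplate MachineTemplate
	Kubeadm         KubeadmConfig
}

func main() {
	cp := ControlPlaneSpec{
		Replicas:        3,
		MachineTemplate: MachineTemplate{InfrastructureRef: "AWSMachineTemplate/control-plane"},
		// Kubeadm.ControlPlaneEndpoint is deliberately left empty here:
		// the open question below is which component fills it in, and when.
	}
	fmt.Println(cp.Replicas, cp.Kubeadm.ControlPlaneEndpoint == "")
}
```

The point of the sketch is the empty `ControlPlaneEndpoint`: the user cannot know the ELB DNS name up front, so some other component has to populate it before kubeadm init can run.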
How do we design the system to allow these different extension providers to work together? How does the ControlPlane code know it should wait for the ControlPlaneEndpoint to be filled in? Or maybe in some situations it’s required, but in others it’s not? What coordinates telling AWS to create an ELB? What gets the ELB’s name into the ControlPlane object?
I realize this particular example is very specific to AWS, and it requires thinking about the control plane lifecycle, the extension mechanism, and the data model, but I feel strongly that this example is the sort of thing we need to solve.
Going to be a fair amount of work, but I think we should take a few different “clouds”, and also the bare metal case, and draw/write out the state transition diagrams for the lifecycle of the clusters & machines in each, trying to capture as many of the different levels as possible - etcd, CNI, PVCs, load balancers, etc. We can probably compose these from smaller components. Does, for instance, an OpenStack cluster with a Juniper network need any different data and ordering than one with a Cisco or VMware SDN? If we can find volunteers with knowledge of one or more of those, that’d be a great help. The data model will flow out of the similarities and differences that emerge. I think that will necessarily flow into thinking about implementation, but I don’t think that’s necessarily a problem.
I like the idea from the data-model proposal that, similar to the way CSI works, has each provider register itself with an ExtensionRegistration. Part of that registration could include some representation of the Cluster API objects that the extension cares about (Machine, MachineSet, Cluster, etc.). For any type of object, the Cluster API code could then build a list of the extensions that might want to be involved in the lifecycle of the object in some way. If each Cluster API object has an “initialization” state during which all of the extensions are expected to “check in” for that instance, then we can also support the case where we have multiple providers that do the same thing in different ways (and are mutually exclusive), since settings on the Clusters API object would specify which implementation should be used for that instance.
So, when a ControlPlane is created, all of the things that have ExtensionRegistration objects that say “I care about ControlPlanes” would have to leave an annotation saying either “I care about this ControlPlane” or “I do not care about this ControlPlane” on the new object. Until that is done, the Cluster API controller would not continue the workflow for the ControlPlane. For the AWS case, the ControlPlane would need somehow to indicate a type so that the AWS provider could know to say “I care” and the Azure provider could know to say “I do not care”. The same principle applies for all of the objects, including the InitConfiguration, Machines, whatever is going to control kubeadm, etc.
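A rough sketch of what that check-in protocol could look like, using annotations as described above. The annotation key prefix and extension names are made up for illustration; the real mechanism would presumably live in the Cluster API controller.

```go
package main

import "fmt"

// checkInPrefix is a hypothetical annotation key prefix under which
// each registered extension records its answer for a given object.
const checkInPrefix = "extensions.cluster.x-k8s.io/"

// checkIn records an extension's interest ("true"/"false") on the
// object's annotations: "I care" or "I do not care".
func checkIn(annotations map[string]string, extension string, cares bool) {
	annotations[checkInPrefix+extension] = fmt.Sprintf("%t", cares)
}

// allCheckedIn reports whether every extension registered for this
// object type has answered; until then the workflow must not continue.
func allCheckedIn(annotations map[string]string, registered []string) bool {
	for _, ext := range registered {
		if _, ok := annotations[checkInPrefix+ext]; !ok {
			return false
		}
	}
	return true
}

func main() {
	ann := map[string]string{}
	registered := []string{"aws-infrastructure", "azure-infrastructure"}

	checkIn(ann, "aws-infrastructure", true)   // AWS: "I care about this ControlPlane"
	fmt.Println(allCheckedIn(ann, registered)) // false: Azure has not answered yet

	checkIn(ann, "azure-infrastructure", false) // Azure: "I do not care"
	fmt.Println(allCheckedIn(ann, registered))  // true: workflow may continue
}
```

Note that a “no” answer still counts as a check-in; the controller only blocks on extensions that have said nothing at all.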
During the rest of the workflow, each phase of the lifecycle is triggered by the Cluster API leaving an annotation on the provider objects attached to the Cluster API object. We could either do that by having separate lists of objects for each phase (where the same object might appear in multiple phases), or by having one list and notifying every object of every phase. Either way, the Cluster API code knows that before it continues from one phase to the next, it must wait to hear back from the controllers for all of the objects it notified at the start of the current phase.
For the ControlPlane example, if we want the ELB to be optional, then that could be managed via a flag on the AWS provider object that is attached to the ControlPlane. The AWS provider would then know to create an ELB during the appropriate phase of building the ControlPlane. Maybe that happens during an early phase for making infrastructure resources before machines - I’m not sure if the ELB must have EC2 instances attached when it is created or if that comes later. The controller would wait to report that it had completed that phase until the ELB was set up.
I would expect the endpoint value to always be required, but maybe there’s a use case where it isn’t that I’m not familiar with. If it’s optional, then a controller must somehow decide whether there is (or should be) a value. If there is (or will be) a value, then that controller should not report that it has completed its work until it has filled in the value.
So, the Cluster API code waits for values that are absolutely always required (like which provider to use to build a ControlPlane), and waits for “phase complete” notifications from the providers signed up to help with each instance. The trick is to describe enough phases sufficiently generically that they don’t end up as single-purpose but also have more descriptive names than “phase1” and “phase2”.
I agree the type is needed. Not sure what to call it, though. The Provider type? Environment type?
And if the type is there, then I think it can also be part of the ExtensionRegistration. That way, all AWS components can register themselves as caring about “AWS.” Then there is no need for all components to report whether they care about the object or not.
Also, some components (e.g. for Kubernetes provisioning using kubeadm) might reasonably be shared across providers, but perhaps not all. That should be up to the CAPI admin. The admin can register some extension for whatever is needed, e.g., “AWS,” “Azure,” etc.
We might have different types for different purposes (provider, bootstrap, etc.). If we need a type, we’ll know what the type is for, and can give the field a good name.
That might be a useful optimization. Although part of the work of checking in for each object will be to ensure that there is a provider-specific object attached to the core object. So the ssh bootstrap provider might register an error if a Machine doesn’t have an SSHBootStrapSettings object attached with the ssh key information needed to log in to the host backing the Machine. That would prevent the Machine from being created until the missing values are provided.
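As a sketch of that check, assuming hypothetical `Machine` and `SSHBootStrapSettings` types (the names are illustrative, not real cluster-api types):

```go
package main

import (
	"errors"
	"fmt"
)

// SSHBootStrapSettings is the hypothetical provider-specific object the
// ssh bootstrap provider expects to find attached to a Machine.
type SSHBootStrapSettings struct {
	PrivateKeySecret string // name of the secret holding the ssh key
}

// Machine is a stand-in for the core object, carrying an optional
// reference to the attached bootstrap settings.
type Machine struct {
	Name        string
	SSHSettings *SSHBootStrapSettings
}

// validateMachine is the check the ssh bootstrap provider could run
// while checking in: register an error if the settings are missing,
// which prevents the Machine from being created.
func validateMachine(m Machine) error {
	if m.SSHSettings == nil || m.SSHSettings.PrivateKeySecret == "" {
		return errors.New("machine " + m.Name + ": missing SSHBootStrapSettings with key information")
	}
	return nil
}

func main() {
	fmt.Println(validateMachine(Machine{Name: "node-0"})) // error: nothing attached
	fmt.Println(validateMachine(Machine{
		Name:        "node-1",
		SSHSettings: &SSHBootStrapSettings{PrivateKeySecret: "node-1-ssh-key"},
	})) // <nil>: settings present, Machine may proceed
}
```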
I definitely do see controllers being shared in a mix-and-match sort of way. I’m not sure all of these things are going to be global settings, though, especially in a configuration where we have 1 cluster managing several others using different providers. I see most of them as inputs to creating the Cluster or ControlPlane or Machine in the first place. The user gives the system information about where to create the thing and how to do so, and then the Cluster API drives the workflow while the various providers and extensions do the work.
I am missing a bit of background, and I think it would provide helpful context for current planning to briefly re-establish it.
At a high level I know we want standard APIs with an opportunity for providers to implement their own behaviors behind those APIs. I’m sure that earlier in the cluster-api project, it must have been considered that there could be a set of standard CRDs (which are just API definitions), with an expectation that providers implement their own controllers (sort of like the actuator pattern, but without boilerplate; just let people write controllers). CRDs and controllers are the natural separation of API definition and behavior that is central to k8s. Ingress is an example of a resource that uses that sort of approach.
What was the reasoning for not following such an approach? While we evaluate proposals that are more complex, it would be helpful to folks like me who missed that discussion, to just re-establish why that pattern was ruled out.
At the end of the ad hoc meeting on Fri May 17, we talked about what data cluster provisioning needs, and when the data is needed. The data, and when it is needed, dictates the data model (i.e. the schema we use to organize the data), the lifecycle hooks, and it should give us an idea of what data will go to extensions, and that should help us see how cooperating controllers or webhooks will work.
Instead of creating sequence or state diagrams, I’m going to write literate shell scripts.
Taking inspiration from Daniel, I wanted to get a sense of what using a web server might look like. I made a very rough draft, and you can see the actual differences between cluster-api and a web-server-based cluster-api, and between cluster-api-provider-aws and a web-server-based cluster-api-provider-aws.
The main differences here are that I removed the context object (for simplicity) and added a server that exactly matches today’s cluster and machine interfaces. These web servers call into an instantiated actuator.
Instead of registering the actuator and starting a machine the manager now starts two web servers. One for machine requests on port 8000 and one for cluster requests on port 8001. This removes the need to import cluster-api controllers.
There is a lot of repetition because this is a rough draft. I’d fully expect the code to shrink significantly with a small amount of work. There may also be some hard-coded values that can be extracted.
The main differences: Cluster and Machine objects have been added to the Cluster API controller instead of just the machineset/deployment. I’ve added a few web clients to wrap the HTTP calls to the web servers, one for machines and one for clusters. You can see there is a ton of repetition here, and some cleanup will be needed. You can almost see helper libraries dropping out of this code.
All calls to the actuator are replaced with a web client call.
Delete on clusters is not implemented yet.
I really hope this helps someone else start to think about what a webserver implementation might look like and how to think about all the different pieces involved. I also hope this provides instructions on how a v1alpha1 provider could be minimally ported to a web server model with no changes to the actuator.
Feedback very welcome!
How to see it working
Create a kind cluster and the provider-components yaml using the images provided above.
Apply the provider-components to the kind cluster and watch the logs on the pods.
After that finishes apply the cluster object and watch the logs.
After that finishes apply just the control plane machine and watch the logs.
After that finishes (and kubeadm has finished running) apply the worker nodes you desire and watch the logs.
Wow, @chuckha that is really awesome! Do I understand it right that the cluster-api will be eventually responsible for sequencing & ordering (i.e. one-at-a-time control planes before possibly-parallel nodes)?
Nice work! This helps visualize the flow even better.
@sethp-nr I think what you’re mentioning is correct, but I’ll let @chuckha go into more detail. The most powerful feature of the webhook approach is to control the sequence of calls made in order to manage the lifecycle of a k8s cluster.
This is the kind of thing that I can’t say with certainty. With what we know right now I can envision at least two futures:
The first being the situation you described. Cluster API manages the ordering and sequencing of nodes (control plane nodes or worker nodes) joining a cluster. Every provider must implement whatever Cluster API dictates be it parallel joins or sequential joins.
There is another future where we use a “features” endpoint on the provider to let Cluster API know if the provider supports parallel joins. If the provider supports parallel joins then Cluster API would use the parallel joins feature exposed on the provider (exposed as some endpoint) otherwise Cluster API would use the sequential join endpoint the provider exposes. It’s possible, like in EKS, we don’t care how nodes are joined so in that case neither feature needs to be implemented and Cluster API can just assume nodes are joining somehow without issuing any commands.