Cluster API: Extension mechanism workstream


Extension Mechanism Workstream

Welcome to the extension mechanism workstream! This workstream covers how Cluster API providers interact with Cluster API itself. Longer-form async discussions will be held in this thread.

Some useful links:

Always feel free to reach out on the k8s slack (@cha).


Cluster API: data-model workstream

Vote for possible meeting times here:


Cluster API Weekly Meeting

Looks like the highest-voted day is April 16th at 1:00 PM New York (10:00 AM San Francisco, 6:00 PM London).

I’ll be sending out a meeting invite later today.



Thanks for coming to the meeting today everyone!

Here is a summary of the meeting (copied-and-pasted notes):



  • [cha] Introduction/Kickoff
    • What is this workstream?
      • How does the Cluster API interact with Cluster API providers?
    • What will this workstream produce?
      • Proposal for how cluster-api and cluster-api-providers should interact
    • What do we have today?
    • Problems
      • Can’t ask the system to tell us which providers are running/installed; we have to detect them manually
      • No type safety upon Cluster/Machine create
      • Fairly annoying to program with: lots of deserialize/reserialize. It can be worked around, though, so it’s not that bad.
  • Existing discussions around extension mechanisms:
    • Webhooks
    • gRPC
      • Proposal, example implementation
      • Each provider implements CreateMachine, DeleteMachine, and ListMachines
      • Pros: the entire Machine controller is shared; providers don’t need to vendor the entire cluster-api project, and they don’t have to create/maintain their own controllers.
    • Provider-specific CRDs with a common CRD to link them to the cluster-api core (dhellmann needs to write this up)
    • [pablochacin] Use an object reference to the provider specific CRD which has a reconcile loop in the provider (draft proposal in Issue #833)
    • Golang plugins
    • Code-generator for a “framework” that someone can use to plug in the relevant calls. This would allow each provider to easily choose the implementation details.
  • Use Cases for Cluster API -> Pros & Cons for extension mechanisms
  • [michaelgugino] Suggests reconciliation loop at provider level only
    • [michaelgugino] rpc/webhook model requires implementing control loops in all providers in addition to the control loop in the top-level. More duplication IMO.
    • [jasondetiberus] Keeping reconciliation at the top-level is better for bubbling up status
    • [dhellmann] creating a proxy grpc/rest layer feels like overkill for just translating an imperative request from one format to another. Most providers already have some sort of client that is going to use the network to communicate with the provider infrastructure.
  • [jasondetiberus] we may end up deciding that different extension mechanisms make sense for different use cases
    • [justinsb] also in favor of the possibility of many approaches if it makes things fun
  • [pablochacin] What problem are we trying to solve? We’re talking about opaque data and implementation details, but we should take a step back and talk about what we want the solutions to be able to do.
  • Goals:
    • Cluster API

    • Infrastructure Providers

    • Tooling / UX

    • Enable out of tree development; make it easy

      • [dwat] Add-ons as operator / webhooks / grpc -> all allow out of tree development, but maybe one is easier or more familiar than others. Emphasis on easier.
    • Decoupling providers from upstream

      • [ilya] Providers are already out-of-tree, but they have to vendor Cluster API and are closely coupled to changes.
        • [michaelgugino] We should assess what couplings are pain-points for this model and adapt for easier reusability.
      • [jasondetiberus] Deploying vendored CRDs alongside upstream Cluster API CRDs, and deploying providers alongside each other, makes it very easy for them to break each other (depending on which CRD version gets deployed last).
      • [ilya] upstream and providers need to be able to move at their own pace, allowing either one to move ahead of the other (in terms of feature support)
    • Reduce code duplication/increase code reuse across infrastructure providers

    • Make it easier to discover which providers are being used without looking inside the provider-specific data in Cluster API objects [pablochacin]

      • [michaelgugino] We could achieve this easily by adding a ‘ProviderName’ field to the existing model.
    • Allow multiple providers to run at the same time [loc]

    • [pablo] We should also consider that clusterctl shouldn’t be forced to have a different binary build for each provider, since Cluster API should support multiple providers.

    • [detiber] Users should be able to introspect high-level information about their deployed clusters without having to traverse into provider-specific CRDs.

      • [michaelgugino] I think this goal needs some specific examples and more enumeration.
    • Prevent version mismatch between Cluster API and providers [loc]

      • [michaelgugino] This is an artifact of using a web/rpc extension mechanism; it would not be a problem for provider-as-reconciler.
    • [michaelgugino] We should treat the reusable bits of code as a library, and move them out of the repo with the bits that are meant to be the single implementation (machine set, etc.). [dhellmann +1]

    • Provide status feedback to users (e.g. invalid provider config) [loc]

  • Send out doc for proposals [cha]

Beyond that, I’ve created a document for folks to help flesh out proposals. I’ve added some relevant requirements and use-cases, but they are by no means authoritative.



Along with the recording