SDD 0020 - OpenShift Hive

Author: Simon Rüegg

Owner:

Reviewers (SIG):

Date: 2020-06-10

Status: accepted

Summary

This document describes how we want to integrate OpenShift Hive into Syn to automatically provision OpenShift clusters on supported cloud providers.

Motivation

The OpenShift Hive operator provides fully automated provisioning of new OpenShift clusters using the OpenShift installer. We want to integrate this into the Syn project in order to provision OpenShift clusters. Hive currently supports the following cloud providers:

  • AWS

  • Azure

  • Google Cloud Platform

Goals

  • Define how Hive can be integrated into Syn

  • Define the process of creating and provisioning a new OpenShift cluster

Non-Goals

  • Support anything beyond what Hive provides

  • Scale a cluster via Hive

Design Proposal

Hive Overview

Hive is a Kubernetes operator that acts on the following CRDs:

  • ClusterImageSet to define the OpenShift version (see the sketch after this list)

  • ClusterDeployment to define a cluster

  • MachinePool to define the sizing/scaling of a cluster
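
For illustration, a ClusterImageSet pinning the OpenShift version could look like the following. This is a minimal sketch based on the hive.openshift.io/v1 API; the name and release image are example values:

  apiVersion: hive.openshift.io/v1
  kind: ClusterImageSet
  metadata:
    name: openshift-4.4.6                   # example: one image set per OpenShift version
  spec:
    # release image that determines the OpenShift version to be installed
    releaseImage: quay.io/openshift-release-dev/ocp-release:4.4.6-x86_64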

In addition to these CRDs, the operator needs several secrets containing the following information:

  • Image pull secret for the OpenShift images

  • Credentials for the cloud provider

  • SSH keypair to access machines of the provisioned cluster

  • An install-config.yaml to configure the OpenShift installer

Most of the heavy lifting Hive does is implemented by the OpenShift installer, which is also why the install-config.yaml file is required. This file uses the same format as described in the installer docs. It needs to be provided in a secret and is only modified by the operator to set the pullSecret property to the referenced image pull secret.
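
To illustrate how these pieces fit together, the install-config.yaml secret and a ClusterDeployment referencing it could look roughly as follows. This is a sketch based on the hive.openshift.io/v1 API at the time of writing; all names, the base domain and the GCP region are example values, and the install-config content is truncated:

  apiVersion: v1
  kind: Secret
  metadata:
    name: example-cluster-install-config
  stringData:
    # install-config.yaml in the format described in the OpenShift installer docs;
    # Hive only sets the pullSecret property from the referenced image pull secret
    install-config.yaml: |
      apiVersion: v1
      baseDomain: example.syn.tools
      metadata:
        name: example-cluster
      # ... remaining installer configuration ...
  ---
  apiVersion: hive.openshift.io/v1
  kind: ClusterDeployment
  metadata:
    name: example-cluster
  spec:
    clusterName: example-cluster
    baseDomain: example.syn.tools
    platform:
      gcp:
        region: europe-west6
        credentialsSecretRef:
          name: gcp-credentials                    # cloud provider credentials
    pullSecretRef:
      name: openshift-pull-secret                  # image pull secret for the OpenShift images
    provisioning:
      imageSetRef:
        name: openshift-4.4.6                      # ClusterImageSet defining the OpenShift version
      installConfigSecretRef:
        name: example-cluster-install-config       # secret shown above
      sshPrivateKeySecretRef:
        name: example-cluster-ssh                  # SSH keypair to access the machines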

Implementation Details

A controller is implemented which creates the objects needed to provide Hive with the information required to provision an OpenShift cluster. Based on certain conditions, the controller creates a set of objects (secrets and Hive CRs). For example, if a cluster object has the annotation syn.tools/cluster-provisioner=hive set, an OpenShift cluster should be provisioned via Hive.

Provisioning Information

For this controller to create the necessary objects, it needs to receive certain information. All confidential information (like cloud provider credentials or image pull secrets) should be stored in Kubernetes secrets and referenced. In a first PoC phase, all information that’s not yet present in a cluster’s facts must be provided in annotations on the cluster object (see the example after this list):

  • hive.syn.tools/gcp-project: GCP project name

  • hive.syn.tools/base-domain: Base domain

  • hive.syn.tools/credentials-secret: Name of the secret containing the cloud credentials

The reason for using annotations is that we don’t have to change the cluster CRD for the PoC. In a second step, once this design is validated and accepted, the information can be added as a typed struct to the cluster CRD.
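
Putting the trigger and the provisioning information together, an annotated cluster object for the PoC could look like this. The API version and all values are examples:

  apiVersion: syn.tools/v1alpha1
  kind: Cluster
  metadata:
    name: c-example-1234
    annotations:
      syn.tools/cluster-provisioner: hive                 # tells the controller to provision via Hive
      hive.syn.tools/gcp-project: example-gcp-project     # GCP project name
      hive.syn.tools/base-domain: example.syn.tools       # base domain
      hive.syn.tools/credentials-secret: gcp-credentials  # secret containing the cloud credentials
  spec:
    displayName: Example cluster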

Cluster Scaling

Scaling of a cluster shouldn’t be done via Hive. For provisioning we use a default setup of three master and three worker nodes. Once a cluster is provisioned, scaling will be implemented via other means, for example in a Commodore component.
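
For the default worker nodes, the controller could create a MachinePool roughly like the following. This is a sketch based on the hive.openshift.io/v1 API; names and the machine type are example values, and the control plane sizing comes from the install-config.yaml instead:

  apiVersion: hive.openshift.io/v1
  kind: MachinePool
  metadata:
    name: example-cluster-worker
  spec:
    clusterDeploymentRef:
      name: example-cluster          # ClusterDeployment this pool belongs to
    name: worker
    replicas: 3                      # default setup: three worker nodes
    platform:
      gcp:
        type: n1-standard-4          # example GCP machine type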

Cluster Synfection

Once a cluster is provisioned via Hive, the next step is to synfect it (install Steward). Hive provides the SyncSet CRD to create and patch arbitrary resources on a provisioned cluster (see the sketch after the list below). While this mechanism could be used to install the Steward agent on a new cluster, it also has some downsides:

  • We would need to duplicate the installation manifests (currently implemented in the Lieutenant API) to create a SyncSet out of them

  • Once Syn is fully bootstrapped on the cluster, Steward itself will be managed by GitOps. We would end up with two systems (Hive and GitOps) managing the same resources
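
For reference, a SyncSet along these lines could push manifests to the new cluster. This is only a sketch of the mechanism this design decides against, based on the hive.openshift.io/v1 API; names and the embedded resource are example values:

  apiVersion: hive.openshift.io/v1
  kind: SyncSet
  metadata:
    name: example-cluster-steward
  spec:
    clusterDeploymentRefs:
      - name: example-cluster        # cluster(s) to apply the resources to
    resourceApplyMode: Upsert
    resources:
      - apiVersion: v1
        kind: Namespace
        metadata:
          name: syn                  # example: namespace the Steward manifests would go into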

Instead of using a SyncSet, we use the credentials of a provisioned cluster and run a job which installs Syn. This can be implemented relatively easily since the credentials (kubeconfig) for a cluster are stored in a secret. The controller can create a Kubernetes job which mounts this secret and runs kubectl apply -f against the install URL of the respective cluster.
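
A sketch of such a job, assuming the admin kubeconfig secret created by Hive is named example-cluster-admin-kubeconfig with a kubeconfig key, and using a placeholder Steward install URL served by the Lieutenant API:

  apiVersion: batch/v1
  kind: Job
  metadata:
    name: example-cluster-synfect
  spec:
    template:
      spec:
        restartPolicy: OnFailure
        containers:
          - name: synfect
            image: bitnami/kubectl                    # example image that provides kubectl
            command:
              - kubectl
              - --kubeconfig=/secrets/kubeconfig      # kubeconfig of the provisioned cluster
              - apply
              - -f
              - https://lieutenant.example.com/install/steward.json   # placeholder install URL
            volumeMounts:
              - name: admin-kubeconfig
                mountPath: /secrets
        volumes:
          - name: admin-kubeconfig
            secret:
              secretName: example-cluster-admin-kubeconfig   # created by Hive during provisioning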

Risks and Mitigations

With this design we’re relatively tightly coupled to the Hive operator, since the created CRs (its API) are defined by it. If Hive changes its API, we have to implement these changes as well. As long as Hive follows the concept of a Kubernetes operator (acting on CRDs), the basic idea of this design should still apply though.

Alternatives

An alternative approach would be to leave the creation of the Hive CRs out of Syn and implement it in another component. This could be an option if Project Syn shouldn’t provide specific provisioning options for Kubernetes distributions. The basic idea of this design would still apply though, and it could be implemented separately from the Syn project.