SDD 0007 - Lieutenant - Management API

Author

Marco Fretz, Tobias Brunner

Owner

SIG Syn

Reviewers (SIG)

Date

2019-10-18

Status

implemented

Summary

This SDD documents the design of the inventory and management API service which holds all information about Syn managed Kubernetes clusters.

Motivation

A central inventory helps to keep track of all Syn managed Kubernetes clusters, their content and it’s the central hub for all operational needs. It provides inventory data to query for all aspects about these clusters. All clusters must be registered in Lieutenant so that they can get recognized by the various components of Syn. It presents itself as a REST API, providing all the needed functionality. This API gets used by frontends like the VSHN Portal, by CLI clients or by other means of automation.

Goals

The API provides several key features for the Syn platform. It’s architected around many helper services and orchestrates them to do their job well.

  • Be the central hub of Syn managed services and clusters including tenant information, fully ready for multitenancy operations

  • Provide insights to all running services and clusters

  • Be the key part of config management by providing management functionality for GitOps Git repositories

Non-Goals

  • Kubernetes cluster provisioning - That’s up to external tools like Rancher or Crossplane

API Personas

Consumers of the API

Consumer Use Cases

Commodore

  • Retrieve facts about a cluster

  • Read-only access to the API

Steward

  • Initial installation in cluster with generated Kubernetes manifest including:

    • Git repository which contains cluster configuration catalog

    • Bootstrap token

  • SSH deploy key configuration in API

Web Admin

Full administration possibilities:

  • CRUD Tenants

  • CRUD Clusters

  • Query Facts

CLI

Full administration possibilities

Design Proposal

lieutenant

Lieutenant is an OpenAPI 3.0 REST API written in Go. It makes use of specialized Kubernetes Operators running on Kubernetes, implementing the core functionality and reconciliation processes. The Lieutenant API application runs in Kubernetes on works directly with the Kubernetes API, managing the objects of the Operators.

REST API

Tenant Management
  • Management of tenants for assigning clusters and managing access rights. CRUD functionality.

  • For VSHN: The idea is to sync this data from the portal. A customer can be Syn enabled which then automatically syncs the information to Lieutenant and makes it available in the API.

  • Data storage location: Tenant CRD

Cluster Registry
  • Registry of all Syn managed Kubernetes clusters with the following features:

    • Create (register) cluster

    • Read (list) all clusters with filtering capabilities

    • Update (edit) cluster

    • Delete cluster

  • The following information is stored:

    • Automatically generated cluster ID (string with 6 lowercase chars and numbers) - not editable

    • Cluster display name

    • Initial registration key for Steward

      • Lifetime

      • Validity

    • Kubernetes API endpoint

    • Reference to Kubernetes credentials in Vault

  • Data storage location: Cluster CRD

Facts Inventory
  • Inventory of facts about the clusters and services with a query interface.

  • The key of the facts is the cluster ID.

  • There are two type of facts:

    • Static facts: Manually configured meta information about a cluster, stored in cluster Object: Tenant, SLA, Contact information, Cloud information, Kubernetes distribution

    • Dynamic facts: Facts collected by an in-cluster agent and regularly sent to Lieutenant, stored in Timeseries DB: Kubernetes version, Deployed managed services and their version, Node details (OS, CPU, RAM, labels) and number of nodes, important objects, Syn predefined annotations on objects, per object contact information and responsibilities via annotations, Ingress objects spec.rules[].host (helps to find which cluster hosts which URL).

  • Facts aren’t actively collected by polling, they’re pushed to the API by Steward.

  • All dynamic facts are versioned (and update generates a new version) which allows historical information when has what changed. Facts have timestamps, so that a freshness indicator is possible.

  • Data storage location: InfluxDB

Component state
  • Live state of the managed services

  • Querying Argo CD, running on the cluster

  • Data storage location: Argo CD

Steward Configuration
  • Deployment and configuration delivery for Steward bootstrapping

  • The API provides the Steward deployment JSON for the initial cluster synfection.

  • The following data is delivered to Steward:

    • URL to cluster catalog git repository

    • SSH host key of git repository

Vault Integration
  • Vault bootstrapping for cluster

  • The API configures Vault for the cluster:

    • Create access credentials

    • Configure authorization

    • Steward authentication token handling

      • Bootstrap token can only be used once and for a limited time

Maintenance Utilities
  • Helper functions to facilitate proper maintenance

  • Integration with Renovate, f.e. list all open Renovate Merge Request on different Git repository instances, including the possibility to merge them.

  • Integration with to-be-invented host maintenance agent?

  • Data storage location: Git Merge/Pull requests

REST OpenAPI Spec

All objects are stored in the same Kubernetes namespace as the API and the Operator is running.

The OpenAPI specification is stored in Git: openapi.yaml.

API Authentication

Bearer Token

Authentication to the API is handled via Kubernetes service account tokens. Except for the /docs, /healthz and /install/steward.json endpoints, every request must contain a bearer token. The HTTP header Authorization must be set to Bearer <token> with <token> beign a valid JWT token. This JWT token will then be used by the API to authenticate against the Kubernetes cluster.

Bootstrap Token

The /install/steward.json endpoint must provide a query parameter token which contains the bootstrap token of a cluster. Such a token can only be used once and has a short (for example ~30 minutes) expiry time. The API uses it’s own service account to authenticate to Kubernetes and search the clusters for the provided bootstrap token. Once a cluster is found and the bootstrap token is still valid, the installation manifests will be returned and the token marked invalid.

API Token

The API needs a service account to authenticate to Kubernetes. This service account should have the minimum required rights to search for clusters, mark bootstrap tokens as invalid and read a cluster’s service account token.

API Authorization

With the exception of the /install/steward.json endpoint, authorization of all API requests is fully delegated to the Kubernetes cluster.

Kubernetes RBAC

Kubernets RBAC rules will be used to restrict the various service accounts to their required scope.

Open Policy Agent

The Open Policy Agent (OPA) will be used to further restrict access of the service accounts to the cluster. More advanced features like a tenant accessing all his clusters can be implemented with the OPA which can not be represented by RBAC alone.

Cluster Bootstrap

The /install/steward.json endpoint serves as an entrypoint to bootstrap (adopt) a new cluster. It delivers the manifests required to bootstrap the SYN system on a cluster. Authorization is done via a Bootstrap Token which identifies the cluster to bootstrap. The API searches the cluster and checks the validity of the token: if the token has expired or was already used (attribute valid=false).

Different Entities
Cluster

A service account per cluster is created with an RBAC role restricting it’s access to the cluster it belongs to (using resourceNames).

Tenant

TBD

Commodore

TBD

Operator and Custom Resource Definitions (CRDs)

CRD Description

Tenant

When a tenant is created, a GitRepo object is created to create the tenant configuration repository.

GitRepo

Git repository management (CRUD repos on GitLab, GitHub and Gitea). Lieutenant manages the CR objects and queries the status fields to get the status.

The Operator manages the following objects:

GitRepo

  • Create Git repository

    • By default on git.vshn.net GitLab

    • Supported are GitLab, GitHub and Gitea APIs

    • SSH key delivered by Steward is configured as deploy key

  • Delete Git repository

  • Update Git repository when configuration changes

    • Only SSH deploy key change supported

Cluster

When a Cluster object is created:

  • a GitRepo object is created to create the cluster catalog configuration repository.

  • a Proxy object is created to provision an Inlets endpoint for the cluster

When a Cluster object is deleted:

  • All created objects are deleted by ownerReference mechanisms

Proxy

Manages the deployment and configuration of an Inlets server per Syn Kubernetes cluster.

Details tbd

Operator Common

The first iteration is a single Operator consisting of several controllers, sharing CR Go structs as the objects depend on each other. A later iteration could split these controllers into their own Operator if it makes sense then. The Operator will be implemented using the operator-sdk in Go.

Property Value

API group

syn.tools

API version

v1alpha1

Controller: Tenant

  • Responsible for the Tenant  kind

  • Manages the GitRepo  object for the tenant configuration

  • Admission webhook denies deletion of Tenant object when there are still Cluster objects available referencing the Tenant object to be deleted

Object Example
apiVersion: syn.tools/v1alpha1
kind: Tenant
metadata:
  name: aezoo6
  namespace: syn-lieutenant
spec:
  displayName: Big Corp.
  gitRepoTemplate:
    spec:
      <GitRepoSpec>
---
apiVersion: syn.tools/v1alpha1
kind: Tenant
metadata:
  name: os3ce3
  namespace: syn-lieutenant
spec:
  displayName: Acme Corp. (Subtenant of Big Corp)
  tenantRef:
    name: aezoo6
  gitRepoURL: https://github.com/acmecorp/commodore-config.git
Field Description

spec.displayName

Display name of the tenant.

spec.gitRepoURL

Git repository storing the tenant configuration. If this is set, no gitRepoTemplate is needed.

spec.gitRepoTemplate.spec

Template for managing the GitRepo object. If not set, no  GitRepo object will be created.

Controller: Cluster Registry

Functionality:

  • Responsible for the Cluster  kind

  • Manages the GitRepo object for cluster configuration catalog

  • (Manages Proxy  object for providing an Inlets server endpoint) later

  • Stores the registration key and it’s validity for initial Steward bootstrapping

  • Admission webhook denies creation of Cluster object when the referenced Tenant object isn’t available

Object Example
apiVersion: syn.tools/v1alpha1
kind: Cluster
metadata:
  name: ae3oso
  namespace: syn-lieutenant
  labels:
    syn.tools/tenant: aezoo6
spec:
  displayName: Big Corp. Production Cluster
  gitRepoTemplate:
    spec:
      <GitRepoSpec>
  tenantRef:
    name: aezoo6
  tokenLifetime: 4h
  facts:
    distribution: openshift3
    cloud: cloudscale
    region: rma1
status:
  bootstrapToken:
    token: sdaf445ftzzt
    valid: true
    validUntil: 2020-01-27 12:00
---
apiVersion: syn.tools/v1alpha1
kind: Cluster
metadata:
  name: etahv3
  namespace: syn-lieutenant
  labels:
    syn.tools/tenant: os3ce3
spec:
  displayName: Acme Corp. Dev Cluster
  gitRepoURL: https://github.com/acmecorp/gitops-mycluster.git
  gitHostKeys: |
    gitlab.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAfuCHKVTjquxvt6CM6tdG4SLp1Btn/nOeHHE5UOzRdf
    gitlab.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBFSMqzJeV9rUzU4kWitGjeR4PWSa29SPqJ1fVkhtj3Hw9xjLVXVYrU9QlYWrOLXBpQ6KWjbjTDTdDkoohFzgbEY=
  tenantRef:
    name: os3ce3
  tokenLifetime: 4h
  facts:
    distribution: openshift4
    cloud: aws
    region: eu-west-1
status:
  bootstrapToken:
    token: sdaf445ftzzt
    valid: true
    validUntil: 2020-01-27 12:00
Field Description

metadata.labels.syn.tools/tenant

Autosynced from spec.tenantRef.name. Helpful for label filtering.

spec.displayName

Display name of cluster which could be different from metadata.name. Allows cluster renaming should it be needed.

spec.gitRepoURL

Git repository storing the cluster configuration catalog. If this is set, no gitRepoTemplate is needed.

spec.gitHostKeys

SSH host key of the git server. Only needed if not standard (GitHub, GitLab.com). Will be set by the git operator.

spec.gitRepoTemplate.spec

Template for managing the GitRepo object.

spec.tenantRef

Reference to Tenant object the cluster belongs to. It needs to be in the same namespace as the cluster object.

spec.facts

Key/value pairs of statically configured cluster facts which don’t change on a cluster.

spec.status.bootstrapToken

This key is used only once for Steward to register.

  • token: Token string

  • validUntil: Validity

  • valid: True or false.

When the initial registration is done, spec.bootstrapToken.valid will be changed to false.

Controller: Git Repository

  • Responsible for the GitRepo  kind

  • Manages Git repository on the configured remote (CRUD)

  • Configures deploy keys on the remote Git repository

  • It’s always referenced to the Tenant as owner

  • Admission webhook denies creation of GitRepo object when the referenced Tenant object isn’t available

apiVersion: syn.tools/v1alpha1
kind: GitRepo
metadata:
  name: ohj4iu
  namespace: syn-lieutenant
  labels:
    syn.tools/tenant: aezoo6
spec:
  apiSecretRef:
    name: my-api-secret
  deployKeys:
    steward:
      type: ssh-ed25519
      key: AAAA...
      writeAccess: true
  path: bigcorp
  repoName: commodore-config
  repoType: auto
  tenantRef:
    name: aezoo6
status:
  phase: created
  type: GitHub
  url: https://github.com/bigcorp/commodore-config.git
  hostKeys: |
    gitlab.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAfuCHKVTjquxvt6CM6tdG4SLp1Btn/nOeHHE5UOzRdf
    gitlab.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBFSMqzJeV9rUzU4kWitGjeR4PWSa29SPqJ1fVkhtj3Hw9xjLVXVYrU9QlYWrOLXBpQ6KWjbjTDTdDkoohFzgbEY=
Field Description

metadata.labels.syn.tools/tenant

Autosynced from spec.tenantRef.name. Helpful for label filtering.

spec.apiSecretRef

Reference to secret containing connection information:

  • endpoint: URL to API endpoint

  • token: API token

  • hostKeys: SSH host keys of this git server (optional)

spec.deployKeys

Optional dict of SSH deploy keys. If not set, no deploy keys will be configured.

  • <key> Name for this key. "steward" is reserved to be used by Steward.

  • <key>.type: SSH key type

  • <key>.key: SSH key

  • <key>.writeAccess: Optional. true or false. Default false

spec.path

Path to Git repository

spec.repoName

Name of Git repository

spec.repoType

Type of the repo. Can be one of auto or unmanaged, defaults to auto. With a value of unmanaged the git operator won’t manage this object and it’s only used as data storage (for example for SSH keys).

spec.tenantRef

Reference to Tenant object the Git repository belongs to. It needs to be in the same namespace as the cluster object.

status.phase

Updated by Operator with current phase. Can be:

  • created

  • failed

  • unknown

There’s no deleted phase as a deleted repo will be re-created on the next reconcile run.

status.type

Autodiscovered Git repo type. Can be:

  • GitLab

  • GitHub

  • Gitea

status.url

Computed Git repository URL

status.hostKeys

SSH host keys of the git server. Will be synced from the spec.apiSecretRef.data["hostKeys"] secret.

Controller: Proxy

tbd