SDD 0015 - Secrets Management
Author: Tobias Brunner, Simon Rüegg
Owner: SIG Security
Reviewers (SIG): Syn Kickstart Group
Date: 2019-10-28
Status: accepted
Summary
Secrets management with Vault defines how secrets are stored and retrieved in a secure and auditable way.
Motivation
Securely storing and retrieving secrets is a key part of an application runtime platform. We must ensure that secrets can’t leak and that access to secrets happens in a controlled way. Plaintext secrets must never be stored in Git.
Design Proposal
To accomplish the stated goals we will rely on Vault as the secret store. Vault is an extensive open source project covering many aspects of secrets management. An evaluation showed that it’s the only open source tool which provides such a comprehensive feature set.
Cluster Local Vault
Each Project Syn managed cluster will have its own Vault. By default it will be installed on the cluster and managed by the Platform Configuration Management. If a customer already has a Vault installation, it can be reused instead of installing one on the cluster.
The cluster local Vault can be used by applications running on the cluster and will be set up to support the various features targeted at K8s.
Bootstrapping
The Steward cluster agent will bootstrap the cluster local Vault installation and, in coordination with the Lieutenant management API, set up auto unseal, recovery keys and root tokens. An important aspect is to make sure Argo CD only applies secrets once the cluster local Vault is ready.
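To illustrate the initialization part of this bootstrap, the following sketch calls Vault’s documented /sys/init HTTP API with recovery shares (as used with auto unseal). The address, share counts and the handling of the returned material are illustrative assumptions, not the actual Steward implementation.

```python
# Minimal sketch of a bootstrap step: initialize a cluster local Vault that
# uses auto unseal (recovery keys instead of unseal keys) via Vault's HTTP API.
# Endpoint and payload follow the documented /sys/init API; the address and
# the share/threshold values are illustrative assumptions.
import requests

VAULT_ADDR = "http://vault.syn-vault.svc:8200"  # assumed in-cluster service address

def initialize_vault(recovery_shares: int = 5, recovery_threshold: int = 3) -> dict:
    """Initialize Vault and return the recovery keys and initial root token."""
    resp = requests.put(
        f"{VAULT_ADDR}/v1/sys/init",
        json={
            "recovery_shares": recovery_shares,
            "recovery_threshold": recovery_threshold,
        },
        timeout=10,
    )
    resp.raise_for_status()
    # The response contains the recovery keys and the initial root token; these
    # would be GPG encrypted and handed over for central storage, as described above.
    return resp.json()
```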
Unsealing
The cluster local Vault will be set up to auto unseal using the Central Vault transit secret engine. This ensures restarting Vault pods won’t leave the cluster in an unusable state. The recovery keys generated during setup will be GPG encrypted and centrally stored. This ensures that even in the event of an outage of the Central Vault we’re still able to unseal cluster local Vaults.
The cluster local Vault will use the service account token from Steward to authenticate against the Central Vault. An initContainer running the Vault Agent will lease a token which can be used for auto unsealing.
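The sketch below shows what this token lease conceptually looks like: the Steward service account token is exchanged for a Central Vault client token via the Kubernetes auth method. The Central Vault address, auth mount path and role name are assumptions for illustration.

```python
# Sketch of the login the Vault Agent initContainer performs conceptually:
# exchange the Steward service account token for a Central Vault token that the
# transit seal configuration can use for auto unsealing.
import requests

CENTRAL_VAULT_ADDR = "https://vault.example.com"  # assumed Central Vault URL
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"

def login_with_service_account(role: str = "steward") -> str:
    """Log in to the Central Vault's Kubernetes auth method and return a client token."""
    with open(SA_TOKEN_PATH) as f:
        jwt = f.read().strip()
    resp = requests.post(
        f"{CENTRAL_VAULT_ADDR}/v1/auth/kubernetes/login",
        json={"role": role, "jwt": jwt},
        timeout=10,
    )
    resp.raise_for_status()
    # The returned token would be written to a shared volume so the Vault
    # server's transit seal configuration can pick it up.
    return resp.json()["auth"]["client_token"]
```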
Storage Backend
The Raft storage backend will be used by the cluster local Vaults. This simplifies setup and maintenance of the Vault installations. Even though this feature is still labeled as beta, we’ll make use of it and switch to another backend if it proves to be unstable.
Backup
There are two parts of Vault which need to be backed up: the storage backend and the unseal keys. Both are needed; one without the other is unusable. The unseal keys are already stored centrally as discussed, which leaves the storage backend to be backed up. The Raft storage backend provides a snapshot functionality for this very purpose.
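A backup job could use the documented Raft snapshot endpoint (the same API behind "vault operator raft snapshot save"). The sketch below assumes an address, token source and output path purely for illustration.

```python
# Sketch of a backup step that streams a Raft snapshot of the Vault data to a file.
import requests

VAULT_ADDR = "http://vault.syn-vault.svc:8200"  # assumed cluster local Vault address

def save_raft_snapshot(vault_token: str, out_path: str = "/backup/vault.snap") -> None:
    """Stream a Raft snapshot from /v1/sys/storage/raft/snapshot to a local file."""
    resp = requests.get(
        f"{VAULT_ADDR}/v1/sys/storage/raft/snapshot",
        headers={"X-Vault-Token": vault_token},
        stream=True,
        timeout=60,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            f.write(chunk)
    # The snapshot alone is not enough for a restore: the centrally stored,
    # GPG encrypted recovery keys are needed as well, as described above.
```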
Central Vault
A Vault instance will be operated globally. This Central Vault instance might also be used for other internal use cases, like functioning as a password manager.
Unsealing
The unseal keys will be GPG encrypted and distributed. The exact setup and requirements are still to be defined.
Transit Secret
The transit secret engine is used to auto unseal cluster local Vault instances. A separate key is used per cluster, and the Kubernetes authentication method is used to authenticate the clusters. Since the Steward cluster agent already has a service account token to authenticate to the Lieutenant API, it can use the same token to authenticate to the Central Vault. The setup of these parts will be done in coordination between Lieutenant and Steward.
The Central Vault will be configured with the Kubernetes auth method against the central infrastructure cluster. Since the Steward cluster agents have a service account token from the infra cluster, they can use it to authenticate against this Vault instance as well.
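As an illustration of the per-cluster setup Lieutenant would coordinate, the following sketch creates a dedicated transit key and a Kubernetes auth role on the Central Vault via its HTTP API. All paths, role and policy names, and the bound service account are assumptions.

```python
# Sketch of the per-cluster setup on the Central Vault: a dedicated transit key
# for auto unsealing plus a Kubernetes auth role bound to the Steward service
# account, so a cluster can only use its own unseal key.
import requests

CENTRAL_VAULT_ADDR = "https://vault.example.com"  # assumed Central Vault URL

def setup_cluster(admin_token: str, cluster_id: str) -> None:
    headers = {"X-Vault-Token": admin_token}

    # Create a dedicated transit key for this cluster (the transit secrets
    # engine is assumed to be mounted at "transit/").
    requests.post(
        f"{CENTRAL_VAULT_ADDR}/v1/transit/keys/{cluster_id}",
        headers=headers, timeout=10,
    ).raise_for_status()

    # Create a Kubernetes auth role bound to the Steward service account
    # (service account name, namespace and policy name are assumptions).
    requests.post(
        f"{CENTRAL_VAULT_ADDR}/v1/auth/kubernetes/role/{cluster_id}-unseal",
        headers=headers,
        json={
            "bound_service_account_names": "steward",
            "bound_service_account_namespaces": "syn",
            "policies": f"{cluster_id}-unseal",
            "ttl": "1h",
        },
        timeout=10,
    ).raise_for_status()
```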
User Stories
Story 1: Secrets management for configuration stored in Git
As a user of Syn I can store secrets in the secret management system and the application deployment description in Git, pointing to the necessary secrets available in the secret management system. The system takes care of revealing the secret only when applying the configuration to the platform. No secrets are stored in Git, especially not unencrypted.
An example: a Kubernetes object of kind "Secret" stored in Git contains only a pointer to the secret, not the secret itself. The configuration management system then takes care of retrieving the referenced secret during the apply phase on the cluster itself, so the secret never leaves the cluster.
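The sketch below illustrates such a reveal step: placeholders in a rendered manifest are resolved against the cluster local Vault’s KV store at apply time. The reference syntax is only Kapitan-like, and the KV mount and path layout are illustrative assumptions.

```python
# Minimal sketch of a reveal step: replace reference placeholders in a rendered
# manifest with values read from the cluster local Vault (KV v2) just before
# the configuration is applied. The "?{vaultkv:path:key}" form is an assumed,
# Kapitan-like reference syntax.
import re
import requests

VAULT_ADDR = "http://vault.syn-vault.svc:8200"  # assumed cluster local Vault
REF_PATTERN = re.compile(r"\?\{vaultkv:(?P<path>[^:}]+):(?P<key>[^}]+)\}")

def reveal(manifest: str, vault_token: str) -> str:
    """Return the manifest with every ?{vaultkv:path:key} reference resolved."""
    def resolve(match: re.Match) -> str:
        resp = requests.get(
            f"{VAULT_ADDR}/v1/secret/data/{match.group('path')}",
            headers={"X-Vault-Token": vault_token},
            timeout=10,
        )
        resp.raise_for_status()
        # KV v2 responses nest the payload under data.data
        return resp.json()["data"]["data"][match.group("key")]

    return REF_PATTERN.sub(resolve, manifest)

# Usage: a Secret manifest in Git contains only the pointer
# "?{vaultkv:my-app/db:password}"; reveal() substitutes the actual value
# on the cluster during the apply phase.
```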
Story 2: Securely store and access configuration data
As a user of Syn I want to securely access secrets from my application running on the platform. Objects of the kind "Secret" aren’t automatically stored encrypted and are available in quasi-plaintext to the platform. The platform should provide means to access secrets in a more secure way.
Story 3: Automated secret management
As a user of Syn I don’t want to care about managing secrets; I want to leave this work to the platform. I want to have access to automatically generated secrets and use them for the purpose they’re meant for.
Examples:
- Connection credentials for a Syn managed DB
- Automatically generated encryption secret to be used by an application
- TLS certificate & key to be used by an application
Story 4: Rotate secrets
As a user of Syn I want the possibility to rotate secrets. If credentials get leaked or otherwise compromised it should be easy and fast to rotate the affected secrets.
Story 5: Provide an audit trail for secrets
As a user of Syn I want the ability to see an audit trail for secrets. At a minimum I want to see when and by whom a certain secret was accessed.
Story 6: Provide secrets for encrypted volumes
As a user of Syn using a CSI provider that supports encryption, I have to provide secrets (an encryption key passphrase) that the CSI provider can use for the LUKS encryption. The cloud provider must not be able to decrypt volumes (no access to the encryption keys, except in VM memory).
Storing these secrets on the cluster without encryption is therefore a no-go, as it would negate nearly all benefits of disk / volume encryption. Using the cloud provider’s KMS isn’t an option either, since the cloud provider must not have access to the encryption keys.
Risks and Mitigations
Kubernetes Secrets
The current design proposal introduces a large risk which invalidates some of Vault’s assumptions and therefore some of the provided benefits: our GitOps tooling reveals the secrets at "runtime" and stores them in native Kubernetes secrets. As of now there are different approaches to work around this issue.
Audit Trail
From Vault’s view, all secrets are accessed by the GitOps tool. Since they are then stored in Kubernetes secrets, Vault has no control over them anymore. This invalidates Vault’s audit and access control features entirely.
Plaintext etcd
These secrets will be stored in plaintext in the cluster’s etcd, unless encryption at rest is configured. As encryption at rest is currently only well supported on the major cloud providers, most clusters don’t set it up. An attacker with access to the data in etcd, be it the live data or a backup, can access all the secrets.
Reasoning
Even though we currently don’t profit from most of the features Vault brings to the table, this approach enables us to move away from native Kubernetes secrets in the future and make real use of Vault’s features. Additionally, it enables other applications on the cluster to use the Vault instance.
Auto Unseal
A clear risk of the auto unseal setup of cluster local Vaults is that the token used to authenticate to the central VSHN Vault can be used to gain access to the Vault’s master key. Therefore special care needs to be taken when handling these tokens. For now the token will be stored in a Kubernetes secret on the respective cluster. If a cluster doesn’t configure encryption at rest properly, this token will end up in clear text in the etcd database.
A possible mitigation would be to not enable auto unseal for a cluster local Vault and to only distribute the GPG encrypted unseal keys. This would require a manual unseal process for each restart of a Vault pod.
Central Vault Availability
The Central Vault has high availability requirements. Since all cluster local Vaults depend on it for auto unseal, they can no longer auto unseal during an outage of the Central Vault. This is only a problem when Vault pods are restarted or new clusters are bootstrapped. As a disaster recovery measure, the recovery keys for each cluster local Vault are stored GPG encrypted in a central location. This ensures that even in the case of an enduring disaster, cluster local Vaults can be unsealed with these recovery keys.
Alternatives
Central Vault
A possible alternative would be to use only a single central Vault installation instead of cluster local Vaults. The main drawback of this approach is the central point of risk it would create. Also, as stated in the docs, Vault shouldn’t be used in a multi-tenant environment.
References
- Kapitan secret reveal:
- Vault auto unseal using transit encryption (another Vault)
- Vault K8s auth - www.vaultproject.io/docs/auth/kubernetes.html
- KubeVault - github.com/kubevault/docs