Control Plane Testing
Deploying applications with Kubernetes is easier than ever, yet developers face increasing complexity.
Kubernetes simplifies deployment, but with it comes a labyrinth of potential issues. From resource conflicts to version incompatibilities, a failure in one component can cascade. Understanding application health through metric models like RED (Requests, Errors, Duration) and USE (Utilization, Saturation, Errors) isn't always enough. Latent errors might only surface during deployment or scaling.
For example, consider deploying a stateful PostgreSQL database via Flux on AWS. The usual pre-deployment checks each leave gaps:
- Tools like helm template and helm lint can validate chart rendering and syntax, but they don't guarantee compatibility with a specific Kubernetes version or the operators running on the cluster.
- ct install on a kind or simulated cluster can verify API compatibility and ensure all resources and operators work correctly, but only in ideal conditions.
- Deploying to a staging environment can help catch issues before they reach production, but this approach doesn't detect capacity, performance or latent errors that only surface under load.
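To check compatibility with a specific Kubernetes version, a kind cluster can pin its node image before ct install runs against it. The following is a minimal sketch: the file name kind-config.yaml and the node image tag are assumptions rather than part of this tutorial, so substitute the version you run in production.

```yaml
# kind-config.yaml (hypothetical file name)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  # Assumed image tag; pick the kindest/node tag that matches your target Kubernetes version
  - role: control-plane
    image: kindest/node:v1.29.2
  - role: worker
    image: kindest/node:v1.29.2
```

A cluster created with kind create cluster --config kind-config.yaml gives ct install a target at the chosen version, although it still only reflects ideal conditions.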
Control plane testing can help improve resilience by continuously redeploying workloads, ensuring there is enough capacity within the system and that all operators and external dependencies are working correctly.
Canary checker is a Kubernetes-native test platform that continuously runs tests using 30+ check styles against your workloads. In this tutorial, we use it to continuously verify the ability to provision and run stateful workloads in a cluster.
The kubernetesResource check creates Kubernetes resources from the provided manifests and performs checks on them. It has 5 lifecycle stages:
Lifecycle
- Apply Static Resources - Applies all staticResources that are required for all tests to pass, e.g. namespaces, secrets, etc.
- Apply Resources - Applies all the workloads defined in resources
- Wait - Using the parameters defined in waitFor, wait for the resources to be ready using is-healthy
- Run Checks - Run all the checks against the workloads
- Cleanup - Delete all the resources that were created during the test.
Tutorial
Prerequisites
To follow this tutorial, you need:
- A Kubernetes cluster
- FluxCD installed
-
Define the workload under test
Before you create a canary, start with a working example of the resource. This example uses a HelmRelease to deploy a PostgreSQL database:

apiVersion: v1
kind: Namespace
metadata:
  name: control-plane-tests
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: bitnami
  namespace: control-plane-tests
spec:
  type: oci
  interval: 1h
  url: oci://registry-1.docker.io/bitnamicharts
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: postgresql
spec:
  chart:
    spec:
      chart: postgresql
      sourceRef:
        kind: HelmRepository
        name: bitnami
        namespace: control-plane-tests
      version: "*"
  interval: 1h
  values:
    auth:
      database: my_database
      password: qwerty123
      username: admin
    primary:
      persistence:
        enabled: true
        size: 8Gi

Once you have verified the helm release is working on its own, you can begin building the control plane test using canary-checker.
-
Install the canary-checker binary

Linux (amd64)

wget https://github.com/flanksource/canary-checker/releases/latest/download/canary-checker_linux_amd64 \
  -O /usr/bin/canary-checker && \
  chmod +x /usr/bin/canary-checker

Linux (arm64)

wget https://github.com/flanksource/canary-checker/releases/latest/download/canary-checker_linux_arm64 \
  -O /usr/bin/canary-checker && \
  chmod +x /usr/bin/canary-checker

MacOSX (amd64)

wget https://github.com/flanksource/canary-checker/releases/latest/download/canary-checker_darwin_amd64 \
  -O /usr/local/bin/canary-checker && \
  chmod +x /usr/local/bin/canary-checker

MacOSX (arm64)

wget https://github.com/flanksource/canary-checker/releases/latest/download/canary-checker_darwin_arm64 \
  -O /usr/local/bin/canary-checker && \
  chmod +x /usr/local/bin/canary-checker

Makefile

OS = $(shell uname -s | tr '[:upper:]' '[:lower:]')
# Map uname output to the release asset names (Linux arm64 reports aarch64)
ARCH = $(shell uname -m | sed -e 's/x86_64/amd64/' -e 's/aarch64/arm64/')

# The target name is illustrative; recipe lines must start with a tab
install-canary-checker:
	wget -nv -nc https://github.com/flanksource/canary-checker/releases/latest/download/canary-checker_$(OS)_$(ARCH) \
	  -O /usr/local/bin/canary-checker && \
	chmod +x /usr/local/bin/canary-checker

Windows

wget -nv -nc https://github.com/flanksource/canary-checker/releases/latest/download/canary-checker.exe -O canary-checker.exe

Helm Installation
This tutorial uses the CLI for faster feedback. In production, we recommend installing canary-checker as an operator using the helm chart or as part of the full Mission Control platform.
-
Next, create a Canary custom resource using the kubernetesResource check type. The layout of the canary is as follows:

apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: control-plane-tests
  namespace: control-plane-tests
spec:
  # how often to run the test
  schedule: "@every 1h"
  kubernetesResource: # this is the type of canary we are executing, canary-checker has many more
    - name: helm-release-postgres-check
      waitFor:
        # The time to wait for the resources to be ready before considering the test a failure
        timeout: 10m
      staticResources:
        - # A list of resources that should be created once only and re-used across multiple tests
      resources:
        - # A list of resources to be created every time the check runs
      display:
        # optional Go text template to display the results of the check
        template: |+
          Helm release created: {{ .health | toYAML }}

Using the workload defined in step 1, the check definition is as follows:

apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: control-plane-tests
  namespace: control-plane-tests
spec:
  schedule: "@every 1h"
  kubernetesResource:
    - name: helm-release-postgres-check
      description: "Deploy postgresql via HelmRelease"
      waitFor:
        timeout: 1m
      display:
        template: |+
          Helm release created: {{ .health | toYAML }}
      staticResources:
        - apiVersion: source.toolkit.fluxcd.io/v1
          kind: HelmRepository
          metadata:
            name: bitnami
          spec:
            type: oci
            interval: 1h
            url: oci://registry-1.docker.io/bitnamicharts
      resources:
        - apiVersion: helm.toolkit.fluxcd.io/v2
          kind: HelmRelease
          metadata:
            name: postgresql
          spec:
            chart:
              spec:
                chart: postgresql
                sourceRef:
                  kind: HelmRepository
                  name: bitnami
            interval: 5m
            values:
              auth:
                username: admin
                password: qwerty123
                database: exampledb
              primary:
                persistence:
                  enabled: true
                  size: 8Gi
-
Run the test locally using canary-checker run basic-canary.yaml (this assumes you saved the canary from the previous step as basic-canary.yaml):

❯ canary-checker run basic-canary.yaml
18:01:52.745 INF (k8s) Using kubeconfig /Users/moshe/.kube/config
18:01:52.749 INF Checking basic-canary.yaml, 1 checks found
18:01:55.209 INF (control-plane-tests) HelmRelease/control-plane-tests/postgresql (created) +kustomized
18:02:21.072 INF (control-plane-tests.helm-release-postgres-check) PASS duration=28321 Helm release created:
control-plane-tests/HelmRelease/postgresql:
  health: healthy
  message: Helm install succeeded for release control-plane-tests/postgresql.v1 with chart postgresql@16.2.2
  ready: true
  status: InstallSucceeded
control-plane-tests/HelmRepository/bitnami:
  health: unknown
  ready: true
18:02:21.073 INF 1 passed, 0 failed in 28s

If you run kubectl get events, you should see:

❯ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
26m Normal ChartPullSucceeded helmchart/control-plane-tests-postgresql pulled 'postgresql' chart with version '16.2.2'
26m Normal Scheduled pod/postgresql-0 Successfully assigned control-plane-tests/postgresql-0 to ip-10-0-4-167.eu-west-1.compute.internal
26m Normal Pulled pod/postgresql-0 Container image "docker.io/bitnami/postgresql:17.2.0-debian-12-r0" already present on machine
26m Normal Created pod/postgresql-0 Created container postgresql
26m Normal Started pod/postgresql-0 Started container postgresql
26m Warning Unhealthy pod/postgresql-0 Readiness probe failed: 127.0.0.1:5432 - rejecting connections
26m Warning Unhealthy pod/postgresql-0 Readiness probe failed: 127.0.0.1:5432 - no response
26m Normal Killing pod/postgresql-0 Stopping container postgresql
113s Normal Scheduled pod/postgresql-0 Successfully assigned control-plane-tests/postgresql-0 to ip-10-0-4-167.eu-west-1.compute.internal
112s Normal Pulled pod/postgresql-0 Container image "docker.io/bitnami/postgresql:17.2.0-debian-12-r0" already present on machine
112s Normal Created pod/postgresql-0 Created container postgresql
112s Normal Started pod/postgresql-0 Started container postgresql
96s Normal Killing pod/postgresql-0 Stopping container postgresql
26m Normal HelmChartCreated helmrelease/postgresql Created HelmChart/control-plane-tests/control-plane-tests-postgresql with SourceRef 'HelmRepository/control-plane-tests/bitnami'
26m Normal SuccessfulCreate statefulset/postgresql create Pod postgresql-0 in StatefulSet postgresql successful
26m Normal InstallSucceeded helmrelease/postgresql Helm install succeeded for release control-plane-tests/postgresql.v1 with chart postgresql@16.2.2
26m Normal UninstallSucceeded helmrelease/postgresql Helm uninstall succeeded for release control-plane-tests/postgresql.v1 with chart postgresql@16.2.2
26m Normal HelmChartDeleted helmrelease/postgresql deleted HelmChart 'control-plane-tests/control-plane-tests-postgresql'
116s Normal HelmChartCreated helmrelease/postgresql Created HelmChart/control-plane-tests/control-plane-tests-postgresql with SourceRef 'HelmRepository/control-plane-tests/bitnami'
113s Normal SuccessfulCreate statefulset/postgresql create Pod postgresql-0 in StatefulSet postgresql successful
101s Normal InstallSucceeded helmrelease/postgresql Helm install succeeded for release control-plane-tests/postgresql.v1 with chart postgresql@16.2.2
96s Warning CalculateExpectedPodCountFailed poddisruptionbudget/postgresql Failed to calculate the number of expected pods: found no controllers for pod "postgresql-0"
96s Normal UninstallSucceeded helmrelease/postgresql Helm uninstall succeeded for release control-plane-tests/postgresql.v1 with chart postgresql@16.2.2
95s Normal HelmChartDeleted helmrelease/postgresql deleted HelmChart 'control-plane-tests/control-plane-tests-postgresql'
-
Add custom check
By default, kubernetesResource only checks whether the resource is ready. However, you can add custom checks to validate the resource further.

For example, you can validate that the PostgreSQL database is running and accepting connections with a custom postgres check:

apiVersion: canaries.flanksource.com/v1
kind: Canary
#...
spec:
  kubernetesResource:
    - #...
      checks:
        - postgres:
            - name: postgres schemas check
              url: "postgres://$(username):$(password)@postgresql.default.svc:5432/exampledb?sslmode=disable"
              username:
                value: admin
              password:
                value: qwerty123
              # Since we just want to check if database is responding,
              # a SELECT 1 query should suffice
              query: SELECT 1

Accessing variables
This example uses the $(username) and $(password) syntax to access the username and password variables hardcoded in the checks section. In a production setting, reference secrets using valueFrom.
Alternatives to custom checks
Instead of using a custom check, you can also add a standard helm test pod to your chart or define a canary inside the chart to automatically include health checks for all workloads.
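As a rough illustration of the helm test alternative, a chart you maintain could ship a test pod like the one below. This is a sketch, not part of the Bitnami chart: the file path, the pod name and the service host postgresql are assumptions, and the image tag is the one that appears in the events output earlier.

```yaml
# templates/tests/postgres-connection.yaml (hypothetical path inside your own chart)
apiVersion: v1
kind: Pod
metadata:
  name: postgresql-connection-test
  annotations:
    # Marks the pod as a Helm test hook so it runs on `helm test`
    "helm.sh/hook": test
spec:
  restartPolicy: Never
  containers:
    - name: pg-isready
      image: docker.io/bitnami/postgresql:17.2.0-debian-12-r0
      # pg_isready exits non-zero when the server is not accepting connections
      command: ["pg_isready", "-h", "postgresql", "-p", "5432"]
```

With Flux, chart tests should only run when the HelmRelease enables them (spec.test.enable), so that setting would need to be switched on for the test to take effect.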
-
The final test looks like:

apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: control-plane-tests
  namespace: control-plane-tests
spec:
  schedule: "@every 1m"
  kubernetesResource:
    - name: helm-release-postgres-check
      namespace: default
      description: "Deploy postgresql via HelmRelease"
      staticResources:
        - apiVersion: source.toolkit.fluxcd.io/v1
          kind: HelmRepository
          metadata:
            name: bitnami
          spec:
            type: oci
            interval: 1h
            url: oci://registry-1.docker.io/bitnamicharts
      resources:
        - apiVersion: helm.toolkit.fluxcd.io/v2
          kind: HelmRelease
          metadata:
            name: postgresql
            namespace: default
          spec:
            chart:
              spec:
                chart: postgresql
                sourceRef:
                  kind: HelmRepository
                  name: bitnami
                  namespace: control-plane-tests
            interval: 5m
            values:
              auth:
                username: admin
                password: qwerty123
                database: exampledb
              primary:
                persistence:
                  enabled: true
                  size: 8Gi
      checks:
        - postgres:
            - name: postgres schemas check
              url: "postgres://$(username):$(password)@postgresql.default.svc:5432/exampledb?sslmode=disable"
              username:
                value: admin
              password:
                value: qwerty123
              # Since we just want to check if database is responding,
              # a SELECT 1 query should suffice
              query: SELECT 1
      checkRetries:
        delay: 15s
        interval: 10s
        timeout: 5m
Conclusion
Continuous testing of your control plane is essential for maintaining resilient infrastructure at scale. By implementing continuous testing with tools like Canary Checker, Flux, and Helm, you can:
- Catch breaking changes early
- Validate infrastructure changes
- Ensure security compliance
- Maintain platform stability
- Reduce incident recovery time
This proactive approach helps catch issues before they impact production environments and affect your users.