Kubernetes

The Kubernetes check queries Kubernetes resources, such as Pods, and evaluates the results against health, readiness, or custom test expressions.

kubernetes.yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: kube-system-checks
spec:
  schedule: "@every 5m"
  kubernetes:
    - name: kube-system
      kind: Pod
      healthy: true
      # resource:
      #   search: labels.app=test
      #   OR
      #   labelSelector: k8s-app=kube-dns
      namespaceSelector:
        name: kube-*,!*lease
        # name: "*"
      display:
        expr: |
          dyn(results).
          map(i, i.Object).
          filter(i, !k8s.isHealthy(i)).
          map(i, "%s/%s -> %s".format([i.metadata.namespace, i.metadata.name, k8s.getHealth(i).message])).join('\n')
      test:
        expr: dyn(results).all(x, k8s.isHealthy(x))

| Field | Description | Scheme |
| --- | --- | --- |
| kind* | Kubernetes object kind | string |
| name* | Name of the check, must be unique within the canary | string |
| cnrm | CNRM connection details | CNRM |
| connection | The connection url to use, mutually exclusive with kubeconfig | Connection |
| eks | EKS connection details | EKS |
| gke | GKE connection details | GKE |
| healthy | Fail the check if any resources are unhealthy | boolean |
| ignore | Ignore the specified resources from the fetched resources. Can be a glob pattern. | []glob |
| kubeconfig | Source for kubeconfig | EnvVar |
| namespace | Failing checks are placed in this namespace, useful if you have shared namespaces. NOTE: this does not change the namespace of the resources being queried | |
| namespaceSelector | Filters namespaces by name or labels | Resource Selector |
| ready | Fail the check if any resources are not ready | boolean |
| resource | Filters resources by name, namespace, or labels | Resource Selector |
| description | Description for the check | string |
| display | Expression to change the formatting of the display | Expression |
| icon | Icon for overwriting default icon on the dashboard | Icon |
| labels | Labels for check | [map[string]string] |
| markFailOnEmpty | If a transformation or datasource returns empty results, the check should fail | boolean |
| metrics | Metrics to export from | []Metrics |
| test | Evaluate whether a check is healthy | Expression |
| transform | Transform data from a check into multiple individual checks | Expression |
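
As a rough illustration of how several of these fields fit together, the sketch below combines ignore and markFailOnEmpty on a single check. The canary name, check name, and glob patterns are hypothetical, and it assumes the ignore globs are matched against resource names, which the field descriptions do not state explicitly.

```yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: default-pod-checks # hypothetical name
spec:
  schedule: "@every 5m"
  kubernetes:
    - name: default-pods # must be unique within the canary
      kind: Pod
      description: Pods in the default namespace, excluding short-lived jobs
      healthy: true
      # Assumption: globs are matched against resource names
      ignore:
        - "*-job-*"
        - "debug-*"
      # Fail the check when the query returns no resources at all
      markFailOnEmpty: true
      namespaceSelector:
        name: default
```

Setting markFailOnEmpty here guards against a selector that silently matches nothing, which would otherwise look like a passing check.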

Resource Selector

Resource Selectors are used throughout Mission Control for:

  • Creating relationships between configs and configs/components
  • Filtering resources in playbook triggers and actions
  • Selecting targets for health checks
  • Building dynamic views and dashboards

| Field | Description | Scheme | Required |
| --- | --- | --- | --- |
| id | Select resource by ID. Supports comma-separated values and wildcards (id=abc*,def*) | string | |
| name | Select resource by name. Supports comma-separated values and wildcards (name=*-prod,*-staging) | string | |
| namespace | Select resources in this namespace only. If empty, selects from all namespaces | string | |
| types | Select resources matching any of the specified types (e.g., Kubernetes::Pod, AWS::EC2::Instance) | []string | |
| statuses | Select resources matching any of the specified statuses | []string | |
| health | Select resources matching the specified health status. Supports multiple values separated by comma (healthy,warning) and negation (!unhealthy) | string | |
| scope | Limit selection to resources belonging to a specific parent. For configs this is the scraper id, for checks it's the canary, and for components it's the topology. Can be a UUID or namespace/name | string | |
| labelSelector | Kubernetes-style label selector. Supports =, ==, != operators and set-based selectors (key in (v1,v2), key notin (v1,v2), key, !key) | LabelSelector | |
| fieldSelector | Select resources by property fields using Kubernetes field selector syntax. Supports fields like owner, topology_id, parent_id for components | FieldSelector | |
| tagSelector | Select resources by tags using the same syntax as labelSelector. Tags are key-value pairs assigned during scraping | string | |
| agent | Select resources created on a specific agent. Accepts agent UUID, agent name, or special values: local (resources without an agent), self (alias for local), all (resources from any agent). Defaults to local | string | |
| cache | Cache settings for selector results. Useful for expensive or frequently-used selectors. Values: no-cache (bypass but allow caching), no-store (bypass and don't cache), max-age=<duration> (cache for duration) | string | |
| limit | Maximum number of resources to return | int | |
| includeDeleted | Include soft-deleted resources in results. Defaults to false | bool | |
| search | Full-text search across resource name, tags, and labels using parsing expression grammar. See Search | string | |

Wildcards and Negation

The name, id, types, statuses, and health fields support:

  • Prefix matching: name=prod-* matches names starting with prod-
  • Suffix matching: name=*-backend matches names ending with -backend
  • Negation: health=!unhealthy excludes unhealthy resources
  • Multiple values: types=Kubernetes::Pod,Kubernetes::Deployment matches either type

Search

The search field provides a query language for filtering resources by field values, wildcards, dates, and nested JSON fields.

Syntax

field1=value1 field2>value2 field3=value3* field4=*value4

Multiple conditions are combined with AND logic.

Operators

| Operator | Example | Description | Types |
| --- | --- | --- | --- |
| = | status=healthy | Equals (exact match or wildcard) | string, int, json |
| != | health!=unhealthy | Not equals | string, int, json |
| =* | name=*-prod or name=api-* | Prefix or suffix match | string, int |
| > | created_at>now-24h | Greater than | datetime, int |
| < | updated_at<2025-01-01 | Less than | datetime, int |

Date Queries

  • Absolute dates: 2025-01-15, 2025-01-15T10:30:00Z
  • Relative dates: now-24h, now-7d, now+1w
  • Supported units: s (seconds), m (minutes), h (hours), d (days), w (weeks), y (years)
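
For instance, relative and absolute dates can be mixed in a single query. The selector fragment below is a sketch that uses only the operators and date formats described here; the specific dates are arbitrary.

```yaml
# Configs updated within the last 7 days that were created before 15 January 2025
search: updated_at>now-7d created_at<2025-01-15
```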

JSON Field Access

Access nested fields in labels, tags, and config using dot notation:

labels.app=nginx
tags.env=production
config.spec.replicas>3
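
Inside a Kubernetes check, the same dot notation can be used in the search field of the resource selector. The sketch below assumes Pods labelled app=nginx in the default namespace; the canary and check names are hypothetical.

```yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: nginx-pod-checks # hypothetical name
spec:
  schedule: "@every 5m"
  kubernetes:
    - name: nginx-pods # hypothetical check name
      kind: Pod
      ready: true
      resource:
        # Dot notation reaches into the labels JSON field
        search: labels.app=nginx
      namespaceSelector:
        name: default
```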

Searchable Fields

Catalog Items (Configs)

| Field | Type | Description |
| --- | --- | --- |
| name | string | Resource name |
| namespace | string | Kubernetes namespace or equivalent |
| type | string | Resource type (e.g., Kubernetes::Pod) |
| status | string | Current status |
| health | string | Health status |
| source | string | Source identifier |
| agent | string | Agent that scraped this resource |
| labels | json | Kubernetes-style labels |
| tags | json | Scraper-assigned tags |
| config | json | Full configuration data |
| created_at | datetime | Creation timestamp |
| updated_at | datetime | Last update timestamp |
| deleted_at | datetime | Soft deletion timestamp |

Config Changes

| Field | Type | Description |
| --- | --- | --- |
| id | string | Change ID |
| config_id | string | Parent config ID |
| name | string | Change name |
| type | string | Config type |
| change_type | string | Type of change (e.g., diff, event) |
| severity | string | Change severity |
| summary | string | Change summary |
| count | int | Occurrence count |
| agent_id | string | Agent ID |
| tags | json | Change tags |
| details | json | Additional details |
| created_at | datetime | Change timestamp |
| first_observed | datetime | First observation time |

Examples

Basic Selection

# Select by exact name
name: my-deployment

# Select by ID
id: 3b1a2c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d

# Select all pods in a namespace
types:
- Kubernetes::Pod
namespace: production

Using Wildcards

# Select all resources with names starting with "prod-"
name: prod-*

# Select all AWS resources
types:
- AWS::*

# Select resources ending with "-backend"
name: "*-backend"

Label and Tag Selectors

# Select by labels (Kubernetes-style)
labelSelector: app=nginx,env in (prod,staging)

# Select by tags
tagSelector: team=platform,cost-center!=shared

# Combine both
labelSelector: app=api
tagSelector: environment=production

Health and Status Filtering

# Select only healthy resources
health: healthy

# Exclude unhealthy resources
health: "!unhealthy"

# Select resources with specific statuses
statuses:
- Running
- Pending

Search Queries

# Find all Kubernetes namespaces starting with "kube"
search: type=Kubernetes::Namespace name=kube*

# Find unhealthy AWS EC2 instances
search: type=AWS::EC2::Instance health=unhealthy

# Find configs created in the last 24 hours
search: created_at>now-24h

# Find nginx pods with specific tags
search: type=Kubernetes::Pod labels.app=nginx tags.cluster=prod

# Complex query with date range
search: updated_at>2025-01-01 updated_at<2025-01-31 type=Kubernetes::Deployment

Multi-Agent Selection

# Select from a specific agent
agent: production-cluster

# Select from all agents
agent: all

# Select only local (agentless) resources
agent: local

Scoped Selection

# Select configs from a specific scraper
scope: namespace/my-scraper

# Select checks from a specific canary
scope: canary-uuid-here

catalog-pod-check.yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: k8s-checks
spec:
  schedule: '@every 30s'
  kubernetes:
    - name: notification-pod-health-check
      selector:
        - labelSelector: 'kubernetes.io/app=notification-listener'
          types:
            - Kubernetes::Pod
      test:
        expr: dyn(results).all(x, k8s.isHealthy(x))

Healthy

Using healthy: true is functionally equivalent to:

test:
  expr: dyn(results).all(x, k8s.isHealthy(x))

kubnetes-healthy.yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: kube-system-checks
spec:
  interval: 30
  kubernetes:
    - namespace: kube-system
      name: kube-system
      kind: Pod
      healthy: true
      resource:
        labelSelector: k8s-app=kube-dns
      namespaceSelector:
        name: kube-system

See the CEL function k8s.isHealthy for more details.
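
Writing the test expression out by hand is useful when the pass condition needs to be looser than "every resource is healthy". As a sketch, the following fragment tolerates a single unhealthy resource. It reuses only the filter, size, and k8s.isHealthy calls that appear elsewhere on this page; the check name and the threshold of 1 are arbitrary choices.

```yaml
kubernetes:
  - name: kube-system-tolerant # hypothetical check name
    kind: Pod
    namespaceSelector:
      name: kube-system
    test:
      # Pass as long as at most one matched resource is unhealthy
      expr: dyn(results).filter(x, !k8s.isHealthy(x)).size() <= 1
```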

Ready

Like the healthy flag, the ready flag is functionally equivalent to the following test expression:

dyn(results).all(x, k8s.isReady(x))
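
As with healthy, a display expression can list the resources that caused the failure. The sketch below mirrors the display expression from the first example on this page, swapping in k8s.isReady. It assumes k8s.isReady accepts the unwrapped object (i.Object) in the same way k8s.isHealthy does; the check name is hypothetical.

```yaml
kubernetes:
  - name: default-pods-ready # hypothetical check name
    kind: Pod
    ready: true
    namespaceSelector:
      name: default
    display:
      # List every matched resource that is not ready, one per line
      expr: |
        dyn(results).
        map(i, i.Object).
        filter(i, !k8s.isReady(i)).
        map(i, "%s/%s".format([i.metadata.namespace, i.metadata.name])).join('\n')
```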

Checking for certificate readiness

cert-manager.yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: cert-manager
spec:
  schedule: "@every 15m"
  kubernetes:
    - name: cert-manager-check
      kind: Certificate
      test:
        expr: |
          dyn(results).
          map(i, i.Object).
          filter(i, i.status.conditions[0].status != "True").size() == 0
      display:
        expr: |
          dyn(results).
          map(i, i.Object).
          filter(i, i.status.conditions[0].status != "True").
          map(i, "%s/%s -> %s".format([i.metadata.namespace, i.metadata.name, i.status.conditions[0].message])).join('\n')
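
The expressions above read conditions[0], which assumes the first condition on every Certificate is the one of interest. A possible variant, sketched below, looks the condition up by type using the standard CEL exists macro. It assumes cert-manager reports a condition with type Ready and that every matched Certificate already has a status.conditions list; a Certificate without one would likely cause an evaluation error rather than a clean failure.

```yaml
test:
  # Pass only if every Certificate has a Ready condition with status "True"
  expr: |
    dyn(results).
    map(i, i.Object).
    all(i, i.status.conditions.exists(c, c.type == "Ready" && c.status == "True"))
```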

Remote clusters

A single canary-checker instance can connect to any number of remote clusters via a custom kubeconfig. The kubeconfig can be referenced from a Kubernetes secret, supplied inline, or given as a path on the local filesystem.

Kubeconfig from Kubernetes secret

remote-cluster.yaml
---
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: pod-access-check
spec:
  schedule: "@every 5m"
  kubernetes:
    - name: pod access on aws cluster
      namespace: default
      description: "deploy httpbin"
      kubeconfig:
        valueFrom:
          secretKeyRef:
            name: aws-kubeconfig
            key: kubeconfig
      kind: Pod
      ready: true
      namespaceSelector:
        name: default

Kubeconfig inline

remote-cluster.yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: pod-access-check
spec:
  schedule: "@every 5m"
  kubernetes:
    - name: pod access on aws cluster
      namespace: default
      kubeconfig:
        value: |
          apiVersion: v1
          clusters:
            - cluster:
                certificate-authority-data: xxxxx
                server: https://xxxxx.sk1.eu-west-1.eks.amazonaws.com
              name: arn:aws:eks:eu-west-1:765618022540:cluster/aws-cluster
          contexts:
            - context:
                cluster: arn:aws:eks:eu-west-1:765618022540:cluster/aws-cluster
                namespace: mission-control
                user: arn:aws:eks:eu-west-1:765618022540:cluster/aws-cluster
              name: arn:aws:eks:eu-west-1:765618022540:cluster/aws-cluster
          current-context: arn:aws:eks:eu-west-1:765618022540:cluster/aws-cluster
          kind: Config
          preferences: {}
          users:
            - name: arn:aws:eks:eu-west-1:765618022540:cluster/aws-cluster
              user:
                exec:
                  ....
      kind: Pod
      ready: true
      namespaceSelector:
        name: default

Kubeconfig from local filesystem

remote-cluster.yaml
---
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: pod-access-check
spec:
  schedule: "@every 5m"
  kubernetes:
    - name: pod access on aws cluster
      namespace: default
      kubeconfig:
        value: /root/.kube/aws-kubeconfig
      kind: Pod
      ready: true
      namespaceSelector:
        name: default