Inhibition

When something breaks in your infrastructure, it rarely breaks alone. A crashing pod makes its ReplicaSet unhealthy, which makes its Deployment unhealthy — and one root cause turns into three notifications.

Inhibition lets you keep the notification that points closest to the root cause and automatically suppress the related notifications that follow it.

How it works

An inhibition rule has two sides:

  • from — the config type whose notification you want to keep (the inhibitor)
  • to — the related config types whose notifications you want to suppress

Once a notification is sent for a from resource, it starts inhibiting. For the length of the notification's repeatInterval, any new event for a related to resource is recorded as inhibited instead of being delivered.

Walking through the pod example:

  1. A pod crashes and a config.unhealthy notification for it is sent. The rule below lists Kubernetes::Pod in from, so this notification becomes an inhibitor.
  2. Moments later, the pod's ReplicaSet and Deployment also turn unhealthy. Their types are listed in to, so Mission Control walks the relationship graph from each of them, finds the pod that already notified, and suppresses both.
  3. You receive one notification — the pod alert — instead of three.

inhibitions:
  - direction: incoming
    from: Kubernetes::Pod
    to:
      - Kubernetes::ReplicaSet
      - Kubernetes::Deployment

Things to keep in mind

  • Inhibition requires repeatInterval on the notification — it doubles as the inhibition window. Without it, inhibition rules are ignored.
  • Both the kept and the suppressed alerts must come from the same Notification resource, so the notification's events and filter must match all the resource types involved.
  • Inhibition works on catalog (config) events such as config.unhealthy — not on check or component events.
  • Order matters: only an already-sent from notification can inhibit. If the Deployment's alert happens to arrive before the Pod's, both are sent.
  • Inhibited notifications aren't lost — they appear in the notification send history with the status inhibited.
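
The first three points can be seen together in one resource. The following is a minimal sketch, not a tested manifest: the name is hypothetical, and the filter line assumes that filter accepts a CEL expression exposing config.type, which this page does not confirm.

```yaml
apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: workload-health # hypothetical name
spec:
  # Catalog (config) events only; check and component events are not inhibited.
  events:
    - config.unhealthy
  # Assumed CEL syntax. If a filter is set, it has to let through every type
  # named in the rule: the from type and all of the to types.
  filter: >-
    config.type in ['Kubernetes::Pod', 'Kubernetes::ReplicaSet', 'Kubernetes::Deployment']
  # Required for inhibition; also sets the length of the inhibition window.
  repeatInterval: 4h
  to:
    connection: connection://mission-control/slack
  inhibitions:
    - direction: incoming
      from: Kubernetes::Pod
      to:
        - Kubernetes::ReplicaSet
        - Kubernetes::Deployment
```

A filter that matched only Kubernetes::Pod would leave nothing to inhibit, because the ReplicaSet and Deployment events would never reach this notification in the first place.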

Writing your own rule

  1. Pick the alert to keep. Choose the resource type that gives the clearest signal about the root cause — that's your from. For Kubernetes roll-up health, that's usually the Pod.
  2. List the noise. The related types whose alerts repeat the same information go in to.
  3. Choose a direction. Ask where the to resources sit relative to from in the relationship graph:
    • They're parents or owners (Pod → its ReplicaSet/Deployment): use incoming.
    • They're children or dependents (Node → its Pods): use outgoing.
    • Could be either: use all.
  4. Count the hops and set depth. Each relationship level is one hop: Pod → ReplicaSet is 1, Pod → ReplicaSet → Deployment is 2. When depth is omitted, it defaults to 5.
  5. Set soft: true for soft relationships. Ownership links like Deployment → Pod are hard relationships and match by default. Placement links like Node → Pod are soft, and are only followed when soft: true.
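
The five steps map onto the fields of a single rule. The fragment below is an illustrative sketch for a case the examples on this page do not cover: keeping the ReplicaSet alert while suppressing both its Deployment (a parent) and its Pods (children). It assumes both links are hard ownership relationships, so soft is left out.

```yaml
inhibitions:
  - from: Kubernetes::ReplicaSet   # step 1: the alert to keep
    to:                            # step 2: the alerts that repeat it
      - Kubernetes::Deployment     #   parent of the ReplicaSet
      - Kubernetes::Pod            #   child of the ReplicaSet
    direction: all                 # step 3: to types sit on both sides of from
    depth: 1                       # step 4: each to type is one hop away
    # step 5: soft omitted, on the assumption that only hard links are needed
```

Setting depth to the exact hop count rather than relying on the default of 5 keeps the rule from reaching resources further away than intended.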

Examples

Keep the Pod alert, suppress its ReplicaSet and Deployment

A pod's failure usually explains why its parents are unhealthy, so this notification keeps the pod alert and inhibits the parent alerts that follow within the 4-hour window. The direction is incoming because ReplicaSets and Deployments are parents of the pod, and depth: 2 covers the two hops from Pod up to Deployment.

deployment-with-inhibition.yaml

apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: pod-with-incoming-inhibition
spec:
  events:
    - config.unhealthy
    - config.warning
  repeatInterval: 4h
  to:
    connection: connection://mission-control/slack
  inhibitions:
    - direction: incoming
      from: Kubernetes::Pod
      to:
        - Kubernetes::Deployment
        - Kubernetes::ReplicaSet
      depth: 2

How this plays out:

| Time | Resource | Event | Action |
| --- | --- | --- | --- |
| 10:00 | Pod api-7d9f | config.unhealthy | Notification sent (becomes the inhibitor) |
| 10:01 | ReplicaSet api-7d9f | config.unhealthy | Inhibited (related pod already notified) |
| 10:02 | Deployment api | config.unhealthy | Inhibited (related pod already notified) |
| 15:30 | Deployment api | config.unhealthy | Notification sent (4h window expired) |

Keep the Node alert, suppress its Pods

When a node goes down, every pod scheduled on it raises an alert. This notification keeps the node alert and inhibits the pod alerts. The direction is outgoing because the pods sit below the node, and soft: true is required because Node-to-Pod is a soft relationship.

node-with-inhibition.yaml

apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: node-with-pod-inhibition
spec:
  events:
    - config.unhealthy
    - config.warning
  repeatInterval: 4h
  to:
    connection: connection://mission-control/slack
  inhibitions:
    - direction: outgoing
      from: Kubernetes::Node
      to:
        - Kubernetes::Pod
      soft: true
      depth: 1
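
Because inhibitions is a list, the two rules above can presumably share one Notification, which also keeps every kept and suppressed alert on the same resource. The combined manifest below is an untested sketch with a hypothetical name. This page does not describe how Mission Control treats a type (here Kubernetes::Pod) that is from in one rule and to in another, so verify the behavior in the notification send history before relying on it.

```yaml
apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: kubernetes-health-with-inhibition # hypothetical name
spec:
  events:
    - config.unhealthy
    - config.warning
  repeatInterval: 4h
  to:
    connection: connection://mission-control/slack
  inhibitions:
    # Rule 1: a Pod alert suppresses its ReplicaSet and Deployment (parents).
    - direction: incoming
      from: Kubernetes::Pod
      to:
        - Kubernetes::Deployment
        - Kubernetes::ReplicaSet
      depth: 2
    # Rule 2: a Node alert suppresses the Pods scheduled on it (soft link).
    - direction: outgoing
      from: Kubernetes::Node
      to:
        - Kubernetes::Pod
      soft: true
      depth: 1
```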

Fields

| Field | Description | Scheme |
| --- | --- | --- |
| direction* | Relationship direction from from to to. Use outgoing when to resources are downstream or child resources, incoming when to resources are upstream or parent resources, and all to check both directions. | incoming, outgoing, or all |
| from* | Config type whose sent notification can inhibit notifications for related to resources. For example, Kubernetes::Deployment. | string |
| to* | Config types that can be inhibited when they are related to a from resource that already sent this notification within the repeatInterval window. | []string |
| depth | Maximum number of relationship levels to traverse. Defaults to 5 when omitted. | integer |
| soft | When false, only hard relationships are considered. When true, both hard and soft relationships are considered. For example, Deployment to Pod is a hard relationship, but Node to Pod is a soft relationship. | boolean |