Prometheus

Mission Control integrates with Prometheus to monitor your metrics infrastructure.

Health Check

Use cases:

Execute PromQL queries against Prometheus endpoints and validate results
Alert when metric thresholds are breached or queries return unexpected values
Monitor Prometheus availability and query performance
Validate SLO/SLI metrics are within acceptable ranges
Detect anomalies by checking metric values against expected baselines

Example

Run PromQL queries against Prometheus and validate that the results meet the expected criteria.

prometheus-health-check.yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: prometheus-check
spec:
  interval: 60
  prometheus:
    - name: api-error-rate
      url: http://prometheus:9090
      query: |
        sum(rate(http_requests_total{status=~"5.."}[5m]))
        / sum(rate(http_requests_total[5m])) * 100
      test:
        expr: results[0] < 1  # Less than 1% error rate
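
The same pattern can cover the SLO/SLI use case listed above. The following is a minimal sketch, not a confirmed configuration: it assumes the service exports a latency histogram named `http_request_duration_seconds` (a hypothetical metric name), that a 500 ms p99 target is appropriate, and that `results[0]` holds the single value returned by the query, as in the example above.

```yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: prometheus-latency-slo  # hypothetical name
spec:
  interval: 60
  prometheus:
    - name: api-p99-latency
      url: http://prometheus:9090
      # p99 request latency over the last 5 minutes, in seconds.
      # http_request_duration_seconds_bucket is an assumed metric name.
      query: |
        histogram_quantile(
          0.99,
          sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
        )
      test:
        expr: results[0] < 0.5  # p99 under 500 ms (assumed target)
```

Aggregating with `sum by (le)` collapses all instances into one histogram, so the query returns a single value for the test expression to evaluate.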

Query with Labels

prometheus-labels-check.yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: prometheus-labels
spec:
  interval: 30
  prometheus:
    - name: node-memory
      url: http://prometheus:9090
      query: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100
      test:
        expr: results[0] > 20  # At least 20% memory available
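
The query above returns one series per node, so a test on `results[0]` evaluates only the first series returned. One way to narrow the query by label and reduce it to a single value is sketched below. The `job="node-exporter"` matcher and the resource names are assumptions; adjust them to match your scrape configuration.

```yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: prometheus-labels-min  # hypothetical name
spec:
  interval: 30
  prometheus:
    - name: node-memory-min
      url: http://prometheus:9090
      # Select series by label, then take the lowest value across all nodes.
      # The job label value is an assumption.
      query: |
        min(
          node_memory_MemAvailable_bytes{job="node-exporter"}
          / node_memory_MemTotal_bytes{job="node-exporter"}
          * 100
        )
      test:
        expr: results[0] > 20  # Worst node still has at least 20% memory available
```

Using `min()` means the check fails as soon as any single node drops below the threshold, rather than depending on which series happens to be listed first.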

Exposed Metrics

Mission Control exposes Prometheus metrics for monitoring health checks, notifications, and configuration scraping.

ServiceMonitor Setup

Enable Prometheus scraping through a ServiceMonitor (a Prometheus Operator resource):

helm install mission-control flanksource/mission-control \
  --set serviceMonitor=true

Health Check Metrics

Metric	Type	Description
`canary_check`	Gauge	Pass (1) or fail (0) for each check
`canary_check_success_count`	Counter	Total successful check executions
`canary_check_failed_count`	Counter	Total failed check executions
`canary_check_info`	Gauge	Check metadata (name, namespace, type)
`canary_check_duration`	Histogram	Check execution duration in seconds

Labels: canary_name, key, name, namespace, owner, severity, type
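
These metrics can drive alerting once they are scraped. The following is a minimal sketch of a Prometheus Operator `PrometheusRule`, assuming the operator is installed and the metric and label names are exposed exactly as listed above. The rule name, namespace, severity, and thresholds are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mission-control-checks  # hypothetical name
  namespace: monitoring         # assumed namespace
spec:
  groups:
    - name: canary-checks
      rules:
        - alert: CanaryCheckFailing
          # True while a check has recorded at least one failure in the
          # trailing 5 minutes; fires after that holds for 10 minutes.
          expr: |
            sum by (canary_name, name, namespace) (
              increase(canary_check_failed_count[5m])
            ) > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Check {{ $labels.name }} in canary {{ $labels.canary_name }} is failing"
```

The rule keys off the failure counter rather than the `canary_check` gauge so that it does not depend on how the gauge encodes pass and fail.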

Notification Metrics

Metric	Type	Description
`notification_sent_total`	Counter	Notifications sent by recipient
`notification_send_error_total`	Counter	Failed notification attempts
`notification_send_duration_seconds`	Histogram	Notification delivery latency
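
As an illustration of how these metrics might be used, the sketch below alerts on delivery errors and on slow delivery. It assumes the Prometheus Operator is installed and that the histogram exposes the conventional `_bucket` series. The rule names, namespace, and the 5 second threshold are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mission-control-notifications  # hypothetical name
  namespace: monitoring                # assumed namespace
spec:
  groups:
    - name: notifications
      rules:
        - alert: NotificationSendErrors
          # Any failed delivery attempts over the last 5 minutes.
          expr: sum(rate(notification_send_error_total[5m])) > 0
          for: 10m
          labels:
            severity: warning
        - alert: NotificationDeliverySlow
          # p95 delivery latency above 5 seconds (assumed threshold).
          expr: |
            histogram_quantile(
              0.95,
              sum by (le) (rate(notification_send_duration_seconds_bucket[5m]))
            ) > 5
          for: 10m
          labels:
            severity: warning
```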

Scraper Metrics

Metric	Type	Description
`scraper_added`	Counter	Config items added per scrape
`scraper_updated`	Counter	Config items updated per scrape
`scraper_deleted`	Counter	Config items deleted per scrape
`config_changes`	Counter	Changes detected by type
`temp_cache_hit`	Counter	Cache lookups found
`temp_cache_miss`	Counter	Cache lookups requiring DB query
`kubernetes_informer_events`	Counter	Kubernetes watch events received
`incremental_scrape_event`	Histogram	Incremental scrape duration

Labels: scraper_id, kind, config_type
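
The cache counters are most useful as a ratio. The following sketch records a cache hit ratio as a Prometheus recording rule, assuming the Prometheus Operator is installed and the counters are exposed under the names listed above. The rule name, namespace, and recorded series name are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mission-control-scrapers  # hypothetical name
  namespace: monitoring           # assumed namespace
spec:
  groups:
    - name: scrapers
      rules:
        # Fraction of cache lookups served without a DB query,
        # computed over the last 5 minutes.
        - record: scraper:temp_cache_hit_ratio:rate5m
          expr: |
            sum(rate(temp_cache_hit[5m]))
            /
            (sum(rate(temp_cache_hit[5m])) + sum(rate(temp_cache_miss[5m])))
```

A ratio close to 1 suggests most lookups are served from the cache. When there are no lookups in the window, the expression divides by zero and yields no meaningful value, so dashboards should treat gaps as "no traffic" rather than as a failure.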

For the complete metrics reference, see the Canary Checker metrics documentation.

Next Steps

Grafana Dashboards

Pre-built dashboards for visualizing these metrics

HTTP Health Checks

Monitor HTTP endpoints and APIs
