Prometheus
Mission Control integrates with Prometheus to monitor your metrics infrastructure.
Health Check
Use cases:
- Execute PromQL queries against Prometheus endpoints and validate results
- Alert when metric thresholds are breached or queries return unexpected values
- Monitor Prometheus availability and query performance
- Validate SLO/SLI metrics are within acceptable ranges
- Detect anomalies by checking metric values against expected baselines
Example
Run PromQL queries against Prometheus and validate the results match expected criteria.
prometheus-health-check.yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: prometheus-check
spec:
  interval: 60
  prometheus:
    - name: api-error-rate
      url: http://prometheus:9090
      query: |
        sum(rate(http_requests_total{status=~"5.."}[5m]))
        / sum(rate(http_requests_total[5m])) * 100
      test:
        expr: results[0] < 1 # Less than 1% error rate
Query with Labels
prometheus-labels-check.yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: prometheus-labels
spec:
  interval: 30
  prometheus:
    - name: node-memory
      url: http://prometheus:9090
      query: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100
      test:
        expr: results[0] > 20 # At least 20% memory available
Exposed Metrics
Mission Control exposes Prometheus metrics for monitoring health checks, notifications, and configuration scraping.
ServiceMonitor Setup
Enable Prometheus scraping via ServiceMonitor:
helm install mission-control flanksource/mission-control \
  --set serviceMonitor=true
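The same setting can be kept in a values file instead of being passed on the command line. The fragment below is a minimal sketch: the file name `values.yaml` is an assumption, and the only key shown is the one used in the command above.

```yaml
# values.yaml (hypothetical file name)
# Equivalent to passing --set serviceMonitor=true on the command line.
serviceMonitor: true
```

Pass the file to Helm with `-f values.yaml` in place of the `--set` flag.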
Health Check Metrics
| Metric | Type | Description |
|---|---|---|
| `canary_check` | Gauge | Pass (1) or fail (0) for each check |
| `canary_check_success_count` | Counter | Total successful check executions |
| `canary_check_failed_count` | Counter | Total failed check executions |
| `canary_check_info` | Gauge | Check metadata (name, namespace, type) |
| `canary_check_duration` | Histogram | Check execution duration in seconds |
Labels: canary_name, key, name, namespace, owner, severity, type
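As an illustration of how these metrics might be used, the following sketch defines a Prometheus Operator `PrometheusRule` that fires when a check reports failure. It relies only on `canary_check` and the `name` and `namespace` labels listed above; the resource name, group name, alert name, severity, and the 5-minute hold period are assumptions chosen for the example. Depending on how your Prometheus instance selects rules and relabels targets, additional metadata labels or different label names may be needed.

```yaml
# Sketch only: names, severity, and the "for" duration are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mission-control-health-checks
spec:
  groups:
    - name: canary-checks
      rules:
        - alert: CanaryCheckFailing
          # canary_check is 1 when a check passes and 0 when it fails.
          expr: canary_check == 0
          # Require the failure to persist before alerting, to ignore brief blips.
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Check {{ $labels.name }} in {{ $labels.namespace }} is failing"
```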
Notification Metrics
| Metric | Type | Description |
|---|---|---|
| `notification_sent_total` | Counter | Notifications sent by recipient |
| `notification_send_error_total` | Counter | Failed notification attempts |
| `notification_send_duration_seconds` | Histogram | Notification delivery latency |
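The notification metrics lend themselves to a latency recording rule and an error alert. The sketch below assumes the histogram is exposed with the standard Prometheus `_bucket` series; the rule names, time windows, and severity are hypothetical and should be tuned to your environment.

```yaml
# Sketch only: rule names, windows, and severity are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mission-control-notifications
spec:
  groups:
    - name: notifications
      rules:
        # 95th percentile delivery latency over the last 5 minutes.
        - record: notification:send_duration_seconds:p95
          expr: |
            histogram_quantile(
              0.95,
              sum by (le) (rate(notification_send_duration_seconds_bucket[5m]))
            )
        # Fire when any send errors were recorded in the last 15 minutes.
        - alert: NotificationSendErrors
          expr: sum(increase(notification_send_error_total[15m])) > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Mission Control is failing to deliver notifications"
```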
Scraper Metrics
| Metric | Type | Description |
|---|---|---|
| `scraper_added` | Counter | Config items added per scrape |
| `scraper_updated` | Counter | Config items updated per scrape |
| `scraper_deleted` | Counter | Config items deleted per scrape |
| `config_changes` | Counter | Changes detected by type |
| `temp_cache_hit` | Counter | Cache lookups found |
| `temp_cache_miss` | Counter | Cache lookups requiring DB query |
| `kubernetes_informer_events` | Counter | Kubernetes watch events received |
| `incremental_scrape_event` | Histogram | Incremental scrape duration |
Labels: scraper_id, kind, config_type
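One possible use of the scraper metrics is to precompute a cache hit ratio and a per-scraper deletion rate with recording rules. The sketch below uses the metric names exactly as listed above. It assumes that `scraper_deleted` carries the `scraper_id` label, which the list does not confirm per metric, and the rule names and 5-minute window are hypothetical.

```yaml
# Sketch only: rule names and windows are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mission-control-scrapers
spec:
  groups:
    - name: scrapers
      rules:
        # Fraction of cache lookups served without a database query.
        - record: scraper:temp_cache_hit:ratio_rate5m
          expr: |
            sum(rate(temp_cache_hit[5m]))
              /
            (sum(rate(temp_cache_hit[5m])) + sum(rate(temp_cache_miss[5m])))
        # Config item deletions per second, per scraper
        # (assumes scraper_deleted has a scraper_id label).
        - record: scraper:deleted:rate5m
          expr: sum by (scraper_id) (rate(scraper_deleted[5m]))
```

A sustained drop in the hit ratio or a spike in the deletion rate is a reasonable starting point for an alert, but suitable thresholds depend on the size and churn of the environment being scraped.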
For the complete metrics reference, see the Canary Checker metrics documentation.