Prometheus

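Mission Control integrates with Prometheus in two ways: health checks run PromQL queries against your Prometheus endpoints and validate the results, and Mission Control exposes its own metrics for Prometheus to scrape.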


Health Check

Use cases:

  • Execute PromQL queries against Prometheus endpoints and validate results
  • Alert when metric thresholds are breached or queries return unexpected values
  • Monitor Prometheus availability and query performance
  • Validate SLO/SLI metrics are within acceptable ranges
  • Detect anomalies by checking metric values against expected baselines

Example

Run PromQL queries against Prometheus and validate that the results match expected criteria.

prometheus-health-check.yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: prometheus-check
spec:
  interval: 60
  prometheus:
    - name: api-error-rate
      url: http://prometheus:9090
      query: |
        sum(rate(http_requests_total{status=~"5.."}[5m]))
        / sum(rate(http_requests_total[5m])) * 100
      test:
        expr: results[0] < 1 # Less than 1% error rate

Query with Labels

prometheus-labels-check.yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: prometheus-labels
spec:
  interval: 30
  prometheus:
    - name: node-memory
      url: http://prometheus:9090
      query: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100
      test:
        expr: results[0] > 20 # At least 20% memory available
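
The availability use case listed earlier can be sketched with the same fields used in the examples above. This is an illustrative sketch rather than a documented recipe: the canary and check names are hypothetical, the `up{job="prometheus"}` selector assumes Prometheus scrapes itself under a job named `prometheus`, and the test expression simply follows the `results[0]` convention of the examples above.

```yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: prometheus-availability # hypothetical name
spec:
  interval: 60
  prometheus:
    - name: prometheus-up # hypothetical name
      url: http://prometheus:9090
      # Assumption: Prometheus scrapes itself under job="prometheus"
      query: sum(up{job="prometheus"})
      test:
        expr: results[0] > 0 # At least one instance reports as up
```

If the job label differs in your setup, adjust the selector; an unreachable endpoint should cause the check to fail regardless of the expression.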

Exposed Metrics

Mission Control exposes Prometheus metrics for monitoring health checks, notifications, and configuration scraping.

ServiceMonitor Setup

Enable Prometheus scraping via ServiceMonitor:

helm install mission-control flanksource/mission-control \
  --set serviceMonitor=true
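
The same setting can be kept in a Helm values file instead of a `--set` flag. A minimal sketch, assuming a file named `values.yaml` (the file name is arbitrary):

```yaml
# values.yaml
# Equivalent to passing --set serviceMonitor=true on the command line
serviceMonitor: true
```

Pass the file with `-f values.yaml` on the `helm install` command. ServiceMonitor is a Prometheus Operator resource type, so this presumably only has an effect on clusters where the operator's CRDs are installed.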

Health Check Metrics

Metric                      Type       Description
canary_check                Gauge      Pass (1) or fail (0) for each check
canary_check_success_count  Counter    Total successful check executions
canary_check_failed_count   Counter    Total failed check executions
canary_check_info           Gauge      Check metadata (name, namespace, type)
canary_check_duration       Histogram  Check execution duration in seconds

Labels: canary_name, key, name, namespace, owner, severity, type
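
These metrics can drive alerting once Prometheus scrapes them. A minimal sketch of a Prometheus alerting rule file, assuming the metrics are exposed under exactly the names listed above; the group name, alert name, time windows, and severity value are illustrative choices, not Mission Control defaults.

```yaml
groups:
  - name: mission-control-health-checks # hypothetical group name
    rules:
      - alert: HealthCheckFailing # hypothetical alert name
        # True while a check has recorded at least one failure in the trailing 10 minutes
        expr: increase(canary_check_failed_count[10m]) > 0
        # The condition must hold for 5 minutes before the alert fires
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Check {{ $labels.name }} ({{ $labels.type }}) in {{ $labels.namespace }} is failing"
```

The rule is based on the failure counter rather than the `canary_check` gauge so that it does not depend on how the gauge encodes pass and fail. With the Prometheus Operator, the same `groups` block would typically sit under the `spec` of a PrometheusRule resource.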

Notification Metrics

Metric                              Type       Description
notification_sent_total             Counter    Notifications sent by recipient
notification_send_error_total       Counter    Failed notification attempts
notification_send_duration_seconds  Histogram  Notification delivery latency
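
As a sketch of how the notification metrics might be used, the rule group below records a latency percentile and alerts on send failures. It assumes the histogram is exposed in the standard Prometheus form, with `_bucket` series carrying an `le` label; the group, record, and alert names and the time windows are hypothetical.

```yaml
groups:
  - name: mission-control-notifications # hypothetical group name
    rules:
      # 95th percentile notification delivery latency over the last 5 minutes
      - record: notification:send_duration_seconds:p95 # hypothetical record name
        expr: histogram_quantile(0.95, sum by (le) (rate(notification_send_duration_seconds_bucket[5m])))
      - alert: NotificationSendErrors # hypothetical alert name
        # True while any send error was recorded in the trailing 15 minutes
        expr: sum(increase(notification_send_error_total[15m])) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Mission Control notifications are failing to send"
```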

Scraper Metrics

Metric                      Type       Description
scraper_added               Counter    Config items added per scrape
scraper_updated             Counter    Config items updated per scrape
scraper_deleted             Counter    Config items deleted per scrape
config_changes              Counter    Changes detected by type
temp_cache_hit              Counter    Cache lookups found
temp_cache_miss             Counter    Cache lookups requiring DB query
kubernetes_informer_events  Counter    Kubernetes watch events received
incremental_scrape_event    Histogram  Incremental scrape duration

Labels: scraper_id, kind, config_type
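
The cache counters can be combined into a hit ratio, and the change counter into a per-scraper rate. A minimal sketch as Prometheus recording rules, assuming the counters are exposed under exactly the names listed above and that `config_changes` carries the `scraper_id` and `config_type` labels; the group and record names are hypothetical.

```yaml
groups:
  - name: mission-control-scrapers # hypothetical group name
    rules:
      # Fraction of cache lookups served without a DB query, over 5 minutes
      - record: scraper:temp_cache_hit:ratio_rate5m # hypothetical record name
        expr: |
          sum(rate(temp_cache_hit[5m]))
          /
          (sum(rate(temp_cache_hit[5m])) + sum(rate(temp_cache_miss[5m])))
      # Config changes per second, per scraper and config type
      # (assumes both labels are present on this metric)
      - record: scraper:config_changes:rate5m # hypothetical record name
        expr: sum by (scraper_id, config_type) (rate(config_changes[5m]))
```

The ratio has no value while there are no lookups at all, because the denominator is zero in that window.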

For the complete metrics reference, see the Canary Checker metrics documentation.

Next Steps