Prometheus Alert Enrichment

Introduction

Robusta has special features for handling Prometheus alerts in Kubernetes clusters, including:

Enrichers: actions that enrich alerts with extra information based on the alert type
Silencers: actions that silence noisy alerts using more advanced methods than the built-in silencing feature of Prometheus/AlertManager

When trying out these features, you can leave your existing alerting Slack channel in place and add a new channel for Robusta's improved Prometheus alerts. This lets you compare Robusta's alerting with Prometheus's built-in alerting.

Each triggered action will add enrichment data to the finding. After all the triggered actions are executed, the findings and enrichments will be sent to the configured sinks.
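The side-by-side comparison described above could be set up by pointing Robusta at its own Slack channel. The following is a minimal sketch rather than a definitive configuration: it assumes the sinksConfig section of values.yaml with a slack_sink entry, and the sink name, channel name, and API key placeholder are hypothetical values to replace with your own.

```yaml
# Sketch only: sink name, channel and key below are placeholders (assumptions).
sinksConfig:
- slack_sink:
    name: robusta_comparison_sink       # hypothetical sink name
    slack_channel: robusta-alerts       # new channel, separate from the existing alerting channel
    api_key: <your Slack API key>       # placeholder, not a real key
```

Keeping the existing channel untouched means the original Prometheus notifications and the Robusta findings can be read next to each other.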

Configure Robusta

Configure Prometheus AlertManager

Before you can enrich Prometheus alerts, you must forward them to Robusta by adding a webhook receiver to AlertManager.

See Prometheus for details.
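As a rough illustration of the webhook receiver mentioned above, an AlertManager configuration fragment might look like the following. This is a sketch under stated assumptions: the receiver name is arbitrary, and the URL assumes Robusta's runner service is called robusta-runner, lives in the default namespace, and accepts alerts at /api/alerts. Check your own installation before using it.

```yaml
# Fragment of an AlertManager configuration (not a complete file).
receivers:
  - name: robusta                      # arbitrary receiver name
    webhook_configs:
      # Assumed service name, namespace and path; adjust to your installation.
      - url: http://robusta-runner.default.svc.cluster.local/api/alerts
        send_resolved: true            # also forward resolved notifications
route:
  routes:
    - receiver: robusta
      continue: true                   # keep evaluating later routes so existing receivers still fire
```

Setting continue to true is what allows the existing alerting channel to keep receiving the same alerts while Robusta is being evaluated.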

Let's look at the simplest possible configuration in values.yaml, which instructs Robusta to forward Prometheus alerts without any special enrichment:

builtinPlaybooks:
- triggers:
  - on_prometheus_alert: {}
  actions:
  - default_enricher: {}

The above configuration just forwards Prometheus alerts to the configured sinks. We haven't added any special enrichment yet. Below you can see how the default alert information looks in Slack:

Adding an Enricher

Now let's add an enricher to values.yaml which enriches the HostHighCpuLoad alert:

builtinPlaybooks:
- triggers:
  - on_prometheus_alert:
      alert_name: HostHighCpuLoad
  actions:
  - node_cpu_enricher: {}
- triggers:
  - on_prometheus_alert: {}
  actions:
  - default_enricher: {}

With the above YAML, all Prometheus alerts are forwarded to the sinks unmodified, except for the HostHighCpuLoad alert, which is enriched as you can see below.

Note that adding an enricher to a specific alert doesn't stop other enrichers from running. Enrichers run in the order they appear in the values file.

It's highly recommended to always leave the default_enricher last, so that the default information is added to all alerts.

[Image: analysis of node CPU usage, broken down by pod]

Make sure to check out the full list of enrichers to see what you can add.
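Because enrichers run in order and do not block one another, several of them can be attached to the same alert. The sketch below assumes that the "Node running pods enricher" from the enricher list is exposed as an action named node_running_pods_enricher and takes no required parameters. Verify the exact action name and parameters in the enricher reference before relying on it.

```yaml
builtinPlaybooks:
- triggers:
  - on_prometheus_alert:
      alert_name: HostHighCpuLoad
  actions:
  - node_cpu_enricher: {}
  - node_running_pods_enricher: {}   # assumed action name, taken from the enricher list
- triggers:
  - on_prometheus_alert: {}
  actions:
  - default_enricher: {}             # kept last so every alert still gets the default information
```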

If for some reason you would like to stop processing after a particular enricher, you can use the stop playbook parameter:

builtinPlaybooks:
- triggers:
  - on_prometheus_alert:
      alert_name: HostHighCpuLoad
  actions:
  - node_cpu_enricher: {}
  stop: True
- triggers:
  - on_prometheus_alert: {}
  actions:
  - default_enricher: {}

With this configuration, the HostHighCpuLoad alert will not include the default alert information, because processing stops before the default_enricher playbook runs.

Adding a Silencer

Let's silence KubePodCrashLooping alerts in the first ten minutes after a node (re)starts:

builtinPlaybooks:
- triggers:
  - on_prometheus_alert:
      alert_name: KubePodCrashLooping
  actions:
  - node_restart_silencer:
      post_restart_silence: 600 # seconds
- triggers:
  - on_prometheus_alert: {}
  actions:
  - default_enricher: {}

Full example

Here are all the above features working together:

builtinPlaybooks:
- triggers:
  - on_prometheus_alert:
      alert_name: KubePodCrashLooping
  actions:
  - node_restart_silencer:
      post_restart_silence: 600 # seconds
- triggers:
  - on_prometheus_alert:
      alert_name: HostHighCpuLoad
  actions:
  - node_cpu_enricher: {}
- triggers:
  - on_prometheus_alert: {}
  actions:
  - default_enricher: {}
