Playbook Basics

A playbook is an automation rule for detecting, investigating, or fixing problems in your cluster.

For a gentle introduction, see What are Playbooks?

Overview

Every playbook consists of a condition (trigger) and instructions (actions) defining the response.

Playbooks behave like pipelines:

  1. Events come into Robusta and are checked against triggers.

  2. When there is a match, a trigger fires.

  3. The relevant playbook runs.

  4. All playbook actions execute, receiving the event as context.

  5. If notifications were generated by the playbook, they are sent to sinks.

Defining Custom Playbooks

Using a custom playbook, we can get notified in Slack whenever a Pod's Liveness probe fails.

Use the customPlaybooks Helm value:

customPlaybooks:
- triggers:
    - on_kubernetes_warning_event_create:
        include: ["Liveness"]   # fires on failed Liveness probes
  actions:
    - create_finding:
        severity: HIGH
        title: "Failed liveness probe: $name"
    - event_resource_events: {}

Perform a Helm Upgrade to apply the custom playbook.

Next time a Liveness probe fails, you will get notified.

Run the following command to simulate a failing liveness probe:

kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/liveness_probe_fail/failing_liveness_probe.yaml

Let's explore each part of the above playbook in depth.

Modifying Default Playbooks

Robusta ships with a default set of playbooks. These create notifications for common Kubernetes issues and Prometheus alerts.

You can disable any of the default playbooks, or change the configuration of a given playbook.

To disable a default playbook, add its name to the disabledPlaybooks Helm value. The playbook name is the value of the name attribute of each playbook.

For example, to disable the ImagePullBackOff playbook, use:

disabledPlaybooks:
- ImagePullBackOff

To override the default configuration of a playbook, disable it and then add it to customPlaybooks with the new configuration:

disabledPlaybooks:
- ImagePullBackOff

customPlaybooks:
- name: "CustomImagePullBackOff"
  triggers:
  - on_image_pull_backoff:
      fire_delay: 300  # fire only if failing to pull the image for 5 min
  actions:
  - image_pull_backoff_reporter: {}

Organizing Playbooks

Using namedCustomPlaybooks, you can define playbooks by name. This is useful when you want to define a base set of playbooks for all clusters/teams and then use additional Helm values files to override some of the base playbooks or add new ones.

All named playbooks are merged into a single playbooks list. This lets you move custom playbooks out of generated_values.yaml into separate files and keep them organized.

First, add the custom playbooks as a dictionary into a file named app_a_playbooks.yaml as shown below:

namedCustomPlaybooks:
  team-a-app-a:
    - triggers:
        - on_prometheus_alert:
            namespace_prefix: "app-a"
      actions:
        - create_finding:
            aggregation_key: "This is app-a - Requires your attention"
            severity: HIGH
            title: "Check app-a out"
            description: "@monitoring.monitoring this is for you"
  team-b-app-b:
    - triggers:
        - on_prometheus_alert:
            namespace_prefix: "app-b"
      actions:
        # Actions for team-b-app-b here

Then run a Helm upgrade by passing the new file using the -f flag.

helm upgrade --install robusta robusta/robusta -f generated_values.yaml -f app_a_playbooks.yaml
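To illustrate the override case, here is a sketch of a second values file. The file name team_a_overrides.yaml and the field values are hypothetical. It relies on Helm's standard handling of multiple -f files: maps are merged key by key and later files take precedence. Redefining the team-a-app-a key should therefore replace that entry, while team-b-app-b stays as defined in the base file.

```yaml
# team_a_overrides.yaml (hypothetical file name)
namedCustomPlaybooks:
  team-a-app-a:   # same key as in app_a_playbooks.yaml, so this list replaces the base one
    - triggers:
        - on_prometheus_alert:
            namespace_prefix: "app-a"
      actions:
        - create_finding:
            aggregation_key: "AppAAlert"          # illustrative value
            severity: HIGH
            title: "app-a alert (cluster-specific override)"  # illustrative text
```

Pass this file last, with an additional -f team_a_overrides.yaml on the Helm command line, so that it takes precedence over the base file.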

Understanding Triggers

Triggers are event-driven, firing at specific moments when something occurs in your cluster. Even a Kubernetes cluster doing nothing generates a constant stream of events. Using triggers, you can find and react to the events that matter.

Going back to the above example, we saw the trigger on_kubernetes_warning_event_create. Breaking down the name, you'll notice the format on_<resource_type>_<operation>. This is a general pattern. on_kubernetes_warning_event_create fires when new Warning Events (kubectl get events --all-namespaces --field-selector type=Warning) are created.

The trigger also had an include filter, limiting which Warning Events cause the playbook to run. In this case, it matches Liveness probe events. See each trigger's documentation to learn which filters are supported.
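As a sketch of how the filter might be widened, the variant below lists two strings under include. It assumes that an event matching any one entry is enough to fire the trigger. Confirm the exact matching rules in the trigger's documentation before relying on this.

```yaml
customPlaybooks:
- triggers:
    - on_kubernetes_warning_event_create:
        # Assumption: an event matching any one of these strings fires the trigger
        include: ["Liveness", "Readiness"]
  actions:
    - create_finding:
        severity: HIGH
        title: "Failed probe: $name"   # illustrative title
    - event_resource_events: {}
```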

Common Triggers

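Triggers that appear on this page include on_kubernetes_warning_event_create, on_image_pull_backoff, on_prometheus_alert, and on_pod_update.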

All triggers can be found under Triggers Reference.

Understanding Actions

Actions perform tasks in response to triggers, such as collecting information, investigating issues, or fixing problems.

In the above example, there were two actions. When playbooks contain multiple actions, they are executed in order:

  • create_finding - this generates the notification message

  • event_resource_events - this is a specific action for on_kubernetes_warning_event_create which attaches relevant events to the notification

The latter action has a funny name, which reflects that it takes a Kubernetes Warning Event as input, finds the related Kubernetes resource (e.g. a Pod), and then fetches all the related Kubernetes Warning Events for that resource.

Actions, Enrichers, and Silencers

Many actions in Robusta were written for a specific purpose, like enriching alerts or silencing them.

By convention, these actions are called enrichers and silencers.

Under the hood, enrichers and silencers are plain old actions, nothing more.

Common Actions

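Actions that appear on this page include create_finding, event_resource_events, image_pull_backoff_reporter, and logs_enricher.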

All actions can be found under Actions Reference.

Understanding Notifications

In Robusta, notifications are called Findings, as they represent something the playbook discovered.

In the above example, a Finding was generated by the create_finding action. Refer to Creating Notifications for more details.
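For reference, the sketch below combines the create_finding fields that appear in the examples on this page (aggregation_key, severity, title, description) in a single playbook. The aggregation_key and description values are illustrative, not taken from a real configuration.

```yaml
customPlaybooks:
- triggers:
    - on_kubernetes_warning_event_create:
        include: ["Liveness"]
  actions:
    - create_finding:
        aggregation_key: "FailedLivenessProbe"   # illustrative value
        severity: HIGH
        title: "Failed liveness probe: $name"
        description: "A liveness probe failed. See the attached events."  # illustrative text
    - event_resource_events: {}   # attaches related events to the Finding created above
```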

Matching Actions to Triggers

Triggers output typed events when they fire. For example:

  • The on_prometheus_alert trigger outputs a PrometheusAlert event

  • The on_pod_update trigger outputs a PodChangeEvent event

Each action is compatible with a subset of event types.

For instance, logs_enricher requires an event with a Pod object, such as PrometheusAlert, PodEvent, or PodChangeEvent.

Refer to the documentation for each action to see which events it supports.
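A compatible pairing based on the examples above might look like the sketch below: on_pod_update outputs a PodChangeEvent, which carries the Pod object that logs_enricher needs. It shows type compatibility only. The empty parameter maps are an assumption, and without filters the trigger would fire on every Pod update, so a real playbook would narrow it down.

```yaml
customPlaybooks:
- triggers:
    - on_pod_update: {}    # outputs a PodChangeEvent (no filters set; assumption for brevity)
  actions:
    - logs_enricher: {}    # requires an event that carries a Pod object
```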