Playbook Basics¶

A playbook is an automation rule for detecting, investigating, or fixing problems in your cluster.

For a gentle introduction, see What are Playbooks?

Overview¶

Every playbook consists of a condition (trigger) and instructions (actions) defining the response.

Playbooks behave like pipelines:

Events come into Robusta and are checked against triggers.
When there is a match, a trigger fires.
The relevant playbook runs.
All playbook actions execute, receiving the event as context.
If notifications were generated by the playbook, they are sent to sinks.
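
The flow above can be traced through a single playbook definition. The sketch below is illustrative only: it assumes the on_pod_crash_loop trigger with a restart_reason filter and the report_crash_loop action, neither of which is introduced in this guide, so check the Triggers Reference and Actions Reference before relying on them.

```yaml
customPlaybooks:
- triggers:
    # Steps 1-2: incoming events are checked against this trigger,
    # which fires when a Pod is restarting in a crash loop.
    - on_pod_crash_loop:
        restart_reason: "CrashLoopBackOff"   # assumed filter, verify in the trigger docs
  actions:
    # Steps 3-4: the playbook runs and the action receives the Pod event as context.
    - report_crash_loop: {}
# Step 5: any notification the action generates is sent to your configured sinks.
```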

Defining Custom Playbooks¶

Using a custom playbook, we can get notified in Slack whenever a Pod's Liveness probe fails.

Use the customPlaybooks Helm value:

customPlaybooks:
- triggers:
    - on_kubernetes_warning_event_create:
        include: ["Liveness"]   # fires on failed Liveness probes
  actions:
    - create_finding:
        severity: HIGH
        title: "Failed liveness probe: $name"
    - event_resource_events: {}

Perform a Helm Upgrade to apply the custom playbook.

Next time a Liveness probe fails, you will get notified.

Run the following command to simulate a failing liveness probe:

kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/liveness_probe_fail/failing_liveness_probe.yaml

Let's explore each part of the above playbook in depth.

Modifying Default Playbooks¶

Robusta ships with a default set of playbooks. These create notifications for common Kubernetes issues and Prometheus alerts.

You can disable any of the default playbooks, or change the configuration of a given playbook.

To disable a default playbook, add its name to the disabledPlaybooks Helm value. The playbook name is the name attribute of each playbook.

For example, to disable the ImagePullBackOff playbook, use:

disabledPlaybooks:
- ImagePullBackOff

To override the default configuration of the same playbook, disable it and then add it back under customPlaybooks with the new configuration:

disabledPlaybooks:
- ImagePullBackOff

customPlaybooks:
- name: "CustomImagePullBackOff"
  triggers:
  - on_image_pull_backoff:
      fire_delay: 300  # fire only if failing to pull the image for 5 min
  actions:
  - image_pull_backoff_reporter: {}

Organizing Playbooks¶

Using namedCustomPlaybooks, you can define playbooks by name. This is useful when you want to define a base set of playbooks for all clusters/teams and then use additional Helm values files to override some of the base playbooks or add new ones.

All named playbooks, from every values file, are merged into a single playbooks list. This lets you move custom playbooks out of generated_values.yaml into separate files and keep them organized.

First, add the custom playbooks as a dictionary into a file named app_a_playbooks.yaml as shown below:

namedCustomPlaybooks:
  team-a-app-a:
    - triggers:
        - on_prometheus_alert:
            namespace_prefix: "app-a"
      actions:
        - create_finding:
            aggregation_key: "This is app-a - Requires your attention"
            severity: HIGH
            title: "Check app-a out"
            description: "@monitoring.monitoring this is for you"
  team-b-app-b:
    - triggers:
        - on_prometheus_alert:
            namespace_prefix: "app-b"
      actions:
        # Actions for team-b-app-b here

Then run a Helm upgrade, passing the new file with an additional -f flag:

helm upgrade --install robusta robusta/robusta -f generated_values.yaml -f app_a_playbooks.yaml

Understanding Triggers¶

Triggers are event-driven, firing at specific moments when something occurs in your cluster. Even a Kubernetes cluster doing nothing generates a constant stream of events. Using triggers, you can find and react to the events that matter.

Going back to the above example, we saw the trigger on_kubernetes_warning_event_create. Breaking down the name, you'll notice the format on_<resource_type>_<operation>. This is a general pattern. on_kubernetes_warning_event_create fires when new Warning Events (kubectl get events --all-namespaces --field-selector type=Warning) are created.

The trigger also had an include filter, limiting which Warning Events cause the playbook to run. In this case, it matches Liveness probe events. See each trigger's documentation to learn which filters are supported.
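
Filters can be combined to narrow a trigger further. As a hedged sketch, the variation below keeps the include filter from the earlier example and adds a namespace_prefix filter. This guide only shows namespace_prefix on on_prometheus_alert, so its availability on this trigger is an assumption, and the "prod" value is a placeholder.

```yaml
customPlaybooks:
- triggers:
    - on_kubernetes_warning_event_create:
        include: ["Liveness"]      # only Warning Events that mention Liveness
        namespace_prefix: "prod"   # assumed filter and placeholder value, verify in the trigger docs
  actions:
    - create_finding:
        severity: HIGH
        title: "Failed liveness probe: $name"
    - event_resource_events: {}
```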

Common Triggers¶

Popular triggers include:

All triggers can be found under Triggers Reference.

Understanding Actions¶

Actions perform tasks in response to triggers, such as collecting information, investigating issues, or fixing problems.

In the above example, there were two actions. When playbooks contain multiple actions, they are executed in order:

create_finding - this generates the notification message
event_resource_events - this is a specific action for on_kubernetes_warning_event_create which attaches relevant events to the notification

The latter action has a funny name, which reflects that it takes a Kubernetes Warning Event as input, finds the related Kubernetes resource (e.g. a Pod), and then fetches all the related Kubernetes Warning Events for that resource.

Actions, Enrichers, and Silencers

Many actions in Robusta were written for a specific purpose, like enriching alerts or silencing them.

These actions are called enrichers and silencers, but the names are only a convention.

Under the hood, enrichers and silencers are plain old actions, nothing more.

Common Actions¶

Popular actions include:

logs_enricher - fetch a Pod's logs
node_bash_enricher - run a bash command on a Node
pod_bash_enricher - run a bash command on a Pod
pod_graph_enricher - attach a graph of Pod memory/CPU/disk usage

All actions can be found under Actions Reference.
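
As a rough example of one of these actions in use, the sketch below runs a command on the affected Node when a CPU alert fires. It assumes your Prometheus setup defines an alert named HostHighCpuLoad and that node_bash_enricher takes a bash_command parameter. Treat both as assumptions to confirm against the Actions Reference.

```yaml
customPlaybooks:
- triggers:
    - on_prometheus_alert:
        alert_name: HostHighCpuLoad   # assumed alert name, replace with one from your cluster
  actions:
    # Attach the output of the command to the alert notification.
    - node_bash_enricher:
        bash_command: ps aux          # assumed parameter name, verify in the action docs
```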

Understanding Notifications¶

In Robusta, notifications are called Findings, as they represent something the playbook discovered.

In the above example, a Finding was generated by the create_finding action. Refer to Creating Notifications for more details.

Matching Actions to Triggers¶

Triggers output typed events when they fire. For example:

The on_prometheus_alert trigger outputs a PrometheusAlert event
The on_pod_update trigger outputs a PodChangeEvent event

Each action is compatible with a subset of event types.

For instance, logs_enricher requires an event with a Pod object, such as PrometheusAlert, PodEvent, or PodChangeEvent.

Refer to the documentation for each action to see which events it supports.
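
To make the pairing concrete, here is a minimal sketch of a compatible trigger and action. The on_prometheus_alert trigger outputs a PrometheusAlert event, which logs_enricher accepts when the alert relates to a Pod. The alert_name filter and the KubePodCrashLooping value are assumptions for illustration.

```yaml
customPlaybooks:
- triggers:
    # Outputs a PrometheusAlert event when it fires.
    - on_prometheus_alert:
        alert_name: KubePodCrashLooping   # assumed alert name, must be an alert about a Pod
  actions:
    # Compatible: logs_enricher needs an event that carries a Pod object.
    - logs_enricher: {}
```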