Playbook Basics
A playbook is an automation rule for detecting, investigating, or fixing problems in your cluster.
For a gentle introduction, see What are Playbooks?
Overview
Every playbook consists of a condition (trigger) and instructions (actions) defining the response.
Playbooks behave like pipelines:
Events come into Robusta and are checked against triggers.
When there is a match, the trigger fires.
The relevant playbook runs.
All playbook actions execute, receiving the event as context.
If notifications were generated by the playbook, they are sent to sinks.
Defining Custom Playbooks
Using a custom playbook, we can get notified in Slack whenever a Pod's Liveness probe fails.
Use the customPlaybooks Helm value:
customPlaybooks:
- triggers:
  - on_kubernetes_warning_event_create:
      include: ["Liveness"]   # fires on failed Liveness probes
  actions:
  - create_finding:
      severity: HIGH
      title: "Failed liveness probe: $name"
  - event_resource_events: {}
Perform a Helm Upgrade to apply the custom playbook.
Next time a Liveness probe fails, you will get notified.
Run the following command to simulate a failing liveness probe:
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/liveness_probe_fail/failing_liveness_probe.yaml
Let's explore each part of the above playbook in depth.
Modifying Default Playbooks
Robusta ships with a default set of playbooks. These create notifications for all common Kubernetes issues and Prometheus alerts.
You can disable any of the default playbooks, or change the configuration of a given playbook.
To disable a default playbook, add its name to the disabledPlaybooks Helm value. The playbook name is in the name attribute of each playbook.
For example, to disable the ImagePullBackOff playbook, use:
disabledPlaybooks:
- ImagePullBackOff
To override the default configuration of the same playbook, disable it and then add it back under customPlaybooks with the new configuration:
disabledPlaybooks:
- ImagePullBackOff
customPlaybooks:
- name: "CustomImagePullBackOff"
  triggers:
  - on_image_pull_backoff:
      fire_delay: 300  # fire only if failing to pull the image for 5 min
  actions:
  - image_pull_backoff_reporter: {}
Organizing Playbooks
Using namedCustomPlaybooks, you can define playbooks by name. This is useful when you want to define a base set of playbooks for all clusters/teams and then use additional Helm values files to override some of the base playbooks or add new ones.
They are all merged together into a single playbooks list. This allows you to split the custom playbooks out of generated_values.yaml into separate files and organize your playbooks.
First, add the custom playbooks as a dictionary to a file named app_a_playbooks.yaml, as shown below:
namedCustomPlaybooks:
  team-a-app-a:
  - triggers:
    - on_prometheus_alert:
        namespace_prefix: "app-a"
    actions:
    - create_finding:
        aggregation_key: "This is app-a - Requires your attention"
        severity: HIGH
        title: "Check app-a out"
        description: "@monitoring.monitoring this is for you"
  team-b-app-b:
  - triggers:
    - on_prometheus_alert:
        namespace_prefix: "app-b"
    actions:
    # Actions for team-b-app-b here
Then run a Helm upgrade, passing the new file with the -f flag:
helm upgrade --install robusta robusta/robusta -f generated_values.yaml -f app_a_playbooks.yaml
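To illustrate the override case mentioned above, a further values file can redefine a single named entry. This is only a sketch under assumptions: the file name team_a_override.yaml is hypothetical, and it relies on Helm's usual merging of values files, where maps from later -f files are merged key by key and a list value is replaced as a whole.

```yaml
# team_a_override.yaml (hypothetical file name)
# Redefines only the team-a-app-a entry; other named entries are not mentioned here.
namedCustomPlaybooks:
  team-a-app-a:
  - triggers:
    - on_prometheus_alert:
        namespace_prefix: "app-a"
    actions:
    - create_finding:
        aggregation_key: "This is app-a - Requires your attention"
        severity: LOW   # lowered from HIGH for this cluster
        title: "Check app-a out"
```

If this file is passed with an additional -f flag after app_a_playbooks.yaml, the team-a-app-a playbooks should come from this file, while team-b-app-b stays as defined in the base file.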
Understanding Triggers
Triggers are event-driven, firing at specific moments when something occurs in your cluster. Even a Kubernetes cluster doing nothing generates a constant stream of events. Using triggers, you can find and react to the events that matter.
Going back to the above example, we saw the trigger on_kubernetes_warning_event_create.
Breaking down the name, you'll notice the format on_<resource_type>_<operation>. This is a general pattern.
on_kubernetes_warning_event_create fires when new Warning Events (kubectl get events --all-namespaces --field-selector type=Warning) are created.
The trigger also had an include filter, limiting which Warning Events cause the playbook to run. In this case, only Liveness probe events match. See each trigger's documentation to learn which filters are supported.
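As a sketch of how filters narrow a trigger, the variation below keeps the include filter from the example above and also skips some of the matching events. The exclude filter and the aggregation_key value are assumptions made for illustration; check the trigger's documentation before relying on them.

```yaml
customPlaybooks:
- triggers:
  - on_kubernetes_warning_event_create:
      include: ["Liveness"]     # only Warning Events mentioning Liveness
      exclude: ["Readiness"]    # assumed filter: skip events that also mention Readiness
  actions:
  - create_finding:
      aggregation_key: "FailedLivenessProbe"   # hypothetical key
      severity: HIGH
      title: "Failed liveness probe: $name"
```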
Common Triggers
Popular triggers include:
All triggers can be found under Triggers Reference.
Understanding Actions
Actions perform tasks in response to triggers, such as collecting information, investigating issues, or fixing problems.
In the above example, there were two actions. When playbooks contain multiple actions, they are executed in order:
create_finding - this generates the notification message
event_resource_events - this is a specific action for on_kubernetes_warning_event_create which attaches relevant events to the notification
The latter action has a funny name, which reflects that it takes a Kubernetes Warning Event as input, finds the related Kubernetes resource (e.g. a Pod), and then fetches all the related Kubernetes Warning Events for that resource.
Actions, Enrichers, and Silencers
Many actions in Robusta were written for a specific purpose, like enriching alerts or silencing them.
These actions are called enrichers and silencers, but the names are only a convention.
Under the hood, enrichers and silencers are plain old actions, nothing more.
Common Actions
Popular actions include:
logs_enricher - fetch a Pod's logs
node_bash_enricher - run a bash command on a Node
pod_bash_enricher - run a bash command on a Pod
pod_graph_enricher - attach a graph of Pod memory/CPU/disk usage
All actions can be found under Actions Reference.
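A minimal sketch of one of these actions in a playbook, assuming a Prometheus alert named HostHighCpuLoad exists in your cluster and that the alert carries a Node. The alert_name filter and the bash_command parameter are not shown elsewhere on this page, so treat them as assumptions to confirm against the reference pages.

```yaml
customPlaybooks:
- triggers:
  - on_prometheus_alert:
      alert_name: HostHighCpuLoad   # assumed alert name and filter
  actions:
  - node_bash_enricher:
      bash_command: ps aux          # assumed parameter: command to run on the Node
```

The command output would be attached to the alert's notification, which can show what was running on the Node when the alert fired.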
Understanding Notifications
In Robusta, notifications are called Findings, as they represent something the playbook discovered.
In the above example, a Finding was generated by the create_finding action. Refer to Creating Notifications for more details.
Matching Actions to Triggers
Triggers output typed events when they fire. For example:
The on_prometheus_alert trigger outputs a PrometheusAlert event
The on_pod_update trigger outputs a PodChangeEvent event
Each action is compatible with a subset of event types.
For instance, logs_enricher requires an event with a Pod object, such as PrometheusAlert, PodEvent, or PodChangeEvent.
Refer to the documentation for each action to see which events it supports.
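A compatible pairing might look like the sketch below. It assumes an alert named KubePodCrashLooping that carries a Pod, and an alert_name filter on the trigger; both are illustrative assumptions rather than something defined on this page.

```yaml
customPlaybooks:
- triggers:
  - on_prometheus_alert:
      alert_name: KubePodCrashLooping   # assumed alert name; the alert should refer to a Pod
  actions:
  - logs_enricher: {}                   # needs a Pod on the event to fetch logs from
```

An alert that refers only to a Node would have no Pod on its event, so logs_enricher would be a poor match for it.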