List of built-in playbooks

Application Visibility and Troubleshooting

Restart loop reporter

Playbook Action

When a pod is in a restart loop, debug the issue, fetch the logs, and send useful information about the restart.

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- restart_loop_reporter: {}
triggers:
- on_pod_all_changes: {}

The above is an example. Try customizing the trigger and parameters.

optional:
rate_limit (int) = 3600

Rate limit the execution of this action (in seconds).

restart_reason (str)

Limit restart loops for this specific reason. If omitted, all restart reasons will be included.

Supported triggers:

  • on_pod_create

  • on_pod_all_changes

  • on_pod_delete

  • on_prometheus_alert

  • on_pod_update

This action can be manually triggered using the Robusta CLI:

robusta playbooks trigger restart_loop_reporter name=POD_NAME namespace=POD_NAMESPACE 
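The example configuration above can be narrowed down. In the sketch below, the reporter runs only for one restart reason, and rate_limit is raised from the default 3600 to 7200 seconds. The reason value CrashLoopBackOff is an assumption for illustration; use the reason Kubernetes reports for your containers, or omit restart_reason to include all reasons.

```yaml
actions:
- restart_loop_reporter:
    # Rate limit, in seconds (default 3600)
    rate_limit: 7200
    # Assumed reason value; omit this line to include all restart reasons
    restart_reason: CrashLoopBackOff
triggers:
# Any of the supported pod triggers can be used here
- on_pod_update: {}
```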

Pod ps

Playbook Action

Fetch the list of running processes in a pod.

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- pod_ps: {}
triggers:
- on_pod_all_changes: {}

The above is an example. Try customizing the trigger and parameters.

No action parameters

Supported triggers:

  • on_pod_create

  • on_pod_all_changes

  • on_pod_delete

  • on_prometheus_alert

  • on_pod_update

This action can be manually triggered using the Robusta CLI:

robusta playbooks trigger pod_ps name=POD_NAME namespace=POD_NAMESPACE 

Kubernetes Error Handling

Node health watcher

Playbook Action

Notify when a node becomes unhealthy.

Add useful information regarding the node's health status.

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- node_health_watcher: {}
triggers:
- on_node_all_changes: {}

The above is an example. Try customizing the trigger and parameters.

No action parameters

Supported triggers:

  • on_node_delete

  • on_node_all_changes

  • on_node_update

  • on_node_create

Alert on hpa reached limit

Playbook Action

Notify when the HPA reaches its maximum replicas, and allow fixing it.

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- alert_on_hpa_reached_limit: {}
triggers:
- on_horizontalpodautoscaler_delete: {}

The above is an example. Try customizing the trigger and parameters.

optional:
increase_pct (int) = 20

Increase the HPA max_replicas by this percentage.

Supported triggers:

  • on_horizontalpodautoscaler_delete

  • on_horizontalpodautoscaler_all_changes

  • on_horizontalpodautoscaler_create

  • on_horizontalpodautoscaler_update
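Because the HPA limit is reached while the HPA is live, a variation of the example above attaches the action to HPA updates rather than deletions and suggests a larger increase. The 50% value is illustrative only.

```yaml
actions:
- alert_on_hpa_reached_limit:
    # Percentage by which to increase max_replicas (default 20)
    increase_pct: 50
triggers:
- on_horizontalpodautoscaler_update: {}
```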

Scale hpa callback

Playbook Action

Update the max_replicas of this HPA to the specified value.

Usually used as a callback action, when the HPA reaches the max_replicas limit.

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- scale_hpa_callback:
    max_replicas: 1
triggers:
- on_horizontalpodautoscaler_delete: {}

The above is an example. Try customizing the trigger and parameters.

required:
max_replicas (int)

New max_replicas to set this HPA to.

Supported triggers:

  • on_horizontalpodautoscaler_delete

  • on_horizontalpodautoscaler_all_changes

  • on_horizontalpodautoscaler_create

  • on_horizontalpodautoscaler_update

This action can be manually triggered using the Robusta CLI:

robusta playbooks trigger scale_hpa_callback name=HORIZONTALPODAUTOSCALER_NAME namespace=HORIZONTALPODAUTOSCALER_NAMESPACE  max_replicas=MAX_REPLICAS

Kubernetes Events

Event report

Playbook Action

Create a finding based on the Kubernetes event.

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- event_report: {}
triggers:
- on_kubernetes_warning_event: {}

The above is an example. Try customizing the trigger and parameters.

optional:
rate_limit (int) = 3600

Rate limit the execution of this action (in seconds).

finding_key (str) = DEFAULT

Specify the finding identifier so that other actions can reference it.

Supported triggers:

  • on_kubernetes_warning_event

  • on_event_all_changes

  • on_event_create

  • on_event_delete

  • on_event_update

Event resource events

Playbook Action

Enrich the finding with the Kubernetes events of the involved resource specified in the event.

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- event_resource_events: {}
triggers:
- on_kubernetes_warning_event: {}

The above is an example. Try customizing the trigger and parameters.

optional:
finding_key (str) = DEFAULT

Specify the finding identifier so that other actions can reference it.

Supported triggers:

  • on_kubernetes_warning_event

  • on_event_all_changes

  • on_event_create

  • on_event_delete

  • on_event_update
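Because both event actions accept finding_key, they can plausibly be combined in one playbook so that event_resource_events enriches the finding created by event_report. This sketch assumes that actions in the same playbook run in the order listed and that a shared finding_key value links them; the key name kube_event is hypothetical.

```yaml
actions:
# Creates the finding, identified by finding_key (hypothetical key name)
- event_report:
    finding_key: kube_event
    rate_limit: 3600
# Adds the involved resource's events to the same finding
- event_resource_events:
    finding_key: kube_event
triggers:
- on_kubernetes_warning_event: {}
```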

Kubernetes Monitoring

Git change audit

Playbook Action

Audit Kubernetes resources from the cluster to Git as YAML files (cluster/namespace/resources hierarchy). Monitor resource changes and save them to a dedicated Git repository.

Using this audit repository, you can easily detect unplanned changes on your clusters.

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- git_change_audit:
    cluster_name: string
    git_key: '********'
    git_url: git@github.com:arikalon1/robusta-audit.git
triggers:
- on_service_delete: {}

The above is an example. Try customizing the trigger and parameters.

required:
cluster_name (str)

The name of this cluster. Changes are audited under this cluster name.

git_url (str)

URL of the audit Git repository.

git_key (str)

Git repository deployment key with write access. To set this up, generate a private/public key pair for GitHub.

optional:
ignored_changes (str list)

List of changes that shouldn't be audited.

Supported triggers:

  • on_kubernetes_any_resource_all_changes

  • on_kubernetes_any_resource_create

  • on_kubernetes_any_resource_update

  • on_kubernetes_any_resource_delete

Or any other inheriting trigger (see Triggers for details), for example:

  • on_pod_all_changes

  • on_job_delete

  • on_statefulset_all_changes

  • ...
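To audit changes to every resource kind, the action can be attached to one of the on_kubernetes_any_resource_* triggers listed above instead of a single-kind trigger. In the sketch below, the cluster name and repository URL are hypothetical, and git_key is assumed to take the private half of the key pair, pasted inline as a YAML block scalar. ignored_changes is left out because the format of its entries is not described here.

```yaml
actions:
- git_change_audit:
    # Changes are stored under this name in the repository (hypothetical)
    cluster_name: prod-eu-1
    # Hypothetical repository; the deployment key needs write access to it
    git_url: git@github.com:my-org/robusta-audit.git
    # Assumed: private key of the deployment key pair, pasted inline
    git_key: |
      -----BEGIN OPENSSH PRIVATE KEY-----
      <private key contents>
      -----END OPENSSH PRIVATE KEY-----
triggers:
- on_kubernetes_any_resource_all_changes: {}
```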

Deployment status report

Playbook Action

Collect screenshots of predefined Grafana panels after a deployment change. The report is generated at the intervals configured in the 'delays' parameter, and each report is sent to the configured sinks when it is ready.

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- deployment_status_report:
    delays:
    - 1
    - 1
    grafana_api_key: '********'
    reports_panel_urls:
    - http://MY_GRAFANA/d-solo/SOME_OTHER_DASHBOARD/.../?orgId=1&from=now-1h&to=now&panelId=3
triggers:
- on_deployment_delete: {}

The above is an example. Try customizing the trigger and parameters.

required:
grafana_api_key (str)

Grafana API key.

delays (int list)

List of intervals, in seconds, at which to generate this report. Each interval is counted from the previous one, so specifying [60, 60] generates the report twice: 60 seconds and 120 seconds after the change.

reports_panel_urls (str list)

List of panel URLs to include in this report. It's highly recommended to use relative time arguments rather than absolute ones, e.g. from=now-1h&to=now.

optional:
report_name (str) = Deployment change report

The name of the report.

fields_to_monitor (str list) = ['image']

List of yaml attributes to monitor. Any field that contains one of these strings will match.

Supported triggers:

  • on_deployment_all_changes

  • on_deployment_delete

  • on_deployment_update

  • on_deployment_create
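Since the report is meant to follow a deployment change, the sketch below attaches it to on_deployment_update instead of on_deployment_delete. With delays set to [60, 60], reports are generated 60 and 120 seconds after the change, per the delays description above. The API key and panel URL are the placeholders from the example above, and monitoring 'resources' in addition to the default 'image' is an illustrative choice.

```yaml
actions:
- deployment_status_report:
    grafana_api_key: '********'
    # Two reports: 60 s and 120 s after the change
    delays:
    - 60
    - 60
    # Relative time range (from=now-1h&to=now), as recommended above
    reports_panel_urls:
    - http://MY_GRAFANA/d-solo/SOME_OTHER_DASHBOARD/.../?orgId=1&from=now-1h&to=now&panelId=3
    report_name: Deployment change report
    # Fields containing any of these strings are monitored
    fields_to_monitor:
    - image
    - resources
triggers:
- on_deployment_update: {}
```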

Resource babysitter

Playbook Action

Track changes to a Kubernetes resource, and send the diff as a finding.

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- resource_babysitter: {}
triggers:
- on_service_delete: {}

The above is an example. Try customizing the trigger and parameters.

optional:
fields_to_monitor (str list) = ['spec']

List of yaml attributes to monitor. Any field that contains one of these strings will match.

omitted_fields (str list) = ['status', 'metadata.generation', 'metadata.resourceVersion', 'metadata.managedFields', 'spec.replicas']

List of YAML attribute changes to ignore.

Supported triggers:

  • on_kubernetes_any_resource_all_changes

  • on_kubernetes_any_resource_create

  • on_kubernetes_any_resource_update

  • on_kubernetes_any_resource_delete

Or any other inheriting trigger (see Triggers for details), for example:

  • on_pod_all_changes

  • on_job_delete

  • on_statefulset_all_changes

  • ...
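As a sketch, the configuration below watches Deployments, using on_deployment_update (one of the deployment triggers listed earlier in this section), and reports only changes to fields whose path contains 'image' or 'resources'. It assumes that setting omitted_fields replaces the default list rather than extending it, so the defaults are repeated explicitly.

```yaml
actions:
- resource_babysitter:
    # Any field whose path contains one of these strings is tracked
    fields_to_monitor:
    - image
    - resources
    # Defaults repeated, assuming a custom list replaces them
    omitted_fields:
    - status
    - metadata.generation
    - metadata.resourceVersion
    - metadata.managedFields
    - spec.replicas
triggers:
- on_deployment_update: {}
```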

Incluster ping

Playbook Action

Check network connectivity in your cluster by pinging a hostname from within the cluster.

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- incluster_ping:
    hostname: string
triggers:
- on_pod_create: {}

The above is an example. Try customizing the trigger and parameters.

required:
hostname (str)

Hostname to ping.

Supported triggers:

  • Any trigger

This action can be manually triggered using the Robusta CLI:

robusta playbooks trigger incluster_ping  hostname=HOSTNAME
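Because incluster_ping accepts any trigger, it can be tied to cluster events instead of being run only from the CLI. The sketch below pings a host whenever a new node is created. The hostname is hypothetical; pick a host your workloads depend on and that answers ICMP.

```yaml
actions:
- incluster_ping:
    # Hypothetical target host
    hostname: my-database.example.com
triggers:
# Check connectivity whenever a node joins the cluster
- on_node_create: {}
```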

Integrations

Argo app sync

Playbook Action

Sync the specified Argo CD application, and send a finding notifying that the sync was performed.

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- argo_app_sync:
    argo_app_name: string
    argo_token: '********'
    argo_url: https://my-argo-cd.com
triggers:
- on_pod_create: {}

The above is an example. Try customizing the trigger and parameters.

required:
argo_url (str)

HTTP(S) URL of the Argo CD server.

argo_token (str)

Argo CD authentication token.

argo_app_name (str)

Argo CD application that needs syncing.

optional:
argo_verify_server_cert (bool) = True

Verify the Argo CD server certificate.

rate_limit_seconds (int) = 1800

Rate limit the execution of this action (in seconds).

Supported triggers:

  • Any trigger

This action can be manually triggered using the Robusta CLI:

robusta playbooks trigger argo_app_sync  argo_url=ARGO_URL argo_token=ARGO_TOKEN argo_app_name=ARGO_APP_NAME
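A hedged sketch of one possible automatic use: syncing an application whenever a Deployment is modified in the cluster, which would re-apply the state stored in Git after a manual change. The application name is hypothetical, disabling certificate verification is shown only for servers with self-signed certificates, and the rate limit bounds how often the sync can repeat.

```yaml
actions:
- argo_app_sync:
    argo_url: https://my-argo-cd.com
    argo_token: '********'
    # Hypothetical Argo CD application name
    argo_app_name: my-app
    # Only for servers with self-signed certificates (default True)
    argo_verify_server_cert: false
    # Rate limit, in seconds (default 1800)
    rate_limit_seconds: 600
triggers:
- on_deployment_update: {}
```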

Kubernetes Optimization

Config ab testing

Playbook Action

Apply YAML configurations to Kubernetes resources for limited periods of time.

Adds Grafana annotations showing when each configuration was applied.

The execution schedule (every X seconds) is defined by the playbook trigger.

Commonly used for:

  • Troubleshooting: finding the first version in which a production bug appeared, by iterating over image tags.

  • Cost/performance optimization: comparing the cost or performance of different deployment configurations.

Note:

Only attributes that already exist in the active configuration can be changed.

For example, you can change resources.requests.cpu only if that attribute already exists in the deployment.

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- config_ab_testing:
    configuration_sets:
    - config_items:
        "spec.template.spec.containers[0].resources.requests.cpu": 250m
        "spec.template.spec.containers[0].resources.requests.memory": 128Mi
      config_set_name: string
    - config_items:
        "spec.template.spec.containers[0].resources.requests.cpu": 250m
        "spec.template.spec.containers[0].resources.requests.memory": 128Mi
      config_set_name: string
    grafana_api_key: '********'
    grafana_dashboard_uid: 09ec8aa1e996d6ffcd6817bbaff4db1b
    grafana_url: http://grafana.namespace.svc
    kind: string
    name: string
triggers:
- on_schedule: {}

The above is an example. Try customizing the trigger and parameters.

required:
grafana_api_key (str)

Grafana API key with write permissions.

grafana_dashboard_uid (str)

Dashboard UID, as it appears in the dashboard's URL.

kind (str)

The kind of the tested resource, e.g. 'Deployment' or 'StatefulSet'.

name (str)

The name of the tested resource.

configuration_sets (complex list)

List of test configurations.

each entry contains:

required:
config_set_name (str)

The name of this configuration set.

optional:
config_items (str dict)

The yaml attributes values for this configuration set.

optional:
grafana_url (str)

HTTP(S) URL of Grafana, or None to autodetect an in-cluster Grafana.

api_version (str) = v1

The API version of the tested resource.

namespace (str) = default

The namespace of the tested resource.
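A sketch of the troubleshooting use case described above: switching a Deployment between two image tags on the trigger's schedule. The resource name, namespace, image tags, and dashboard UID are hypothetical, grafana_url is omitted so that an in-cluster Grafana is autodetected, and the fixed_delay_repeat block under on_schedule is an assumption about the schedule trigger's parameters; check the Triggers documentation for the exact fields. As noted above, the image attribute must already exist in the Deployment.

```yaml
actions:
- config_ab_testing:
    grafana_api_key: '********'
    # Hypothetical dashboard UID, taken from the dashboard's URL
    grafana_dashboard_uid: my-dashboard-uid
    kind: Deployment
    # Hypothetical resource name and namespace
    name: my-app
    namespace: my-namespace
    configuration_sets:
    - config_set_name: release-1.4
      config_items:
        # Hypothetical image tags; the attribute must already exist
        "spec.template.spec.containers[0].image": my-registry/my-app:1.4
    - config_set_name: release-1.5
      config_items:
        "spec.template.spec.containers[0].image": my-registry/my-app:1.5
triggers:
# Assumed on_schedule parameters: run every 1800 seconds, 10 times
- on_schedule:
    fixed_delay_repeat:
      repeat: 10
      seconds_delay: 1800
```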

Supported triggers:

  • on_schedule

Disk benchmark

Playbook Action

Run a disk benchmark in your cluster. The benchmark creates a PVC using the configured storage class and runs the benchmark with fio. For more details on fio, see https://fio.readthedocs.io/en/latest/

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- disk_benchmark:
    storage_class_name: string
triggers:
- on_pod_create: {}

The above is an example. Try customizing the trigger and parameters.

required:
storage_class_name (str)

PVC storage class, chosen from the storage classes available in the cluster, e.g. standard or fast.

optional:
pvc_name (str) = robusta-disk-benchmark

Name of the PVC created for the benchmark.

test_seconds (int) = 20

The benchmark duration, in seconds.

namespace (str) = robusta

Namespace used for the benchmark.

disk_size (str) = 10Gi

Size of the PVC used for the benchmark.

Supported triggers:

  • Any trigger

This action can be manually triggered using the Robusta CLI:

robusta playbooks trigger disk_benchmark  storage_class_name=STORAGE_CLASS_NAME

Stress Testing and Chaos Engineering

Generate high cpu

Playbook Action

Create a pod with high CPU on the cluster for 60 seconds. Can be used to simulate alerts or other high CPU load scenarios.

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- generate_high_cpu: {}
triggers:
- on_pod_create: {}

The above is an example. Try customizing the trigger and parameters.

No action parameters

Supported triggers:

  • Any trigger

This action can be manually triggered using the Robusta CLI:

robusta playbooks trigger generate_high_cpu 

Http stress test

Playbook Action

Run an HTTP stress test and send the results.

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- http_stress_test:
    url: string
triggers:
- on_pod_create: {}

The above is an example. Try customizing the trigger and parameters.

required:
url (str)

In-cluster target URL.

optional:
n (int) = 1000

Number of requests to run.

Supported triggers:

  • Any trigger

This action can be manually triggered using the Robusta CLI:

robusta playbooks trigger http_stress_test  url=URL
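Because http_stress_test accepts any trigger, it can, for example, load-test a service after a Deployment changes. In the sketch below, the service URL is hypothetical and the request count is raised from the default 1000 purely for illustration.

```yaml
actions:
- http_stress_test:
    # Hypothetical in-cluster service URL
    url: http://my-service.my-namespace.svc.cluster.local:8080/
    # Number of requests to run (default 1000)
    n: 5000
triggers:
- on_deployment_update: {}
```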

Prometheus alert

Playbook Action

Simulate a Prometheus alert sent to the Robusta runner. Useful for testing actions that are triggered by Prometheus alerts.

This action can be run automatically.

Add this to your Robusta configuration (values.yaml when installing with Helm):

actions:
- prometheus_alert:
    alert_name: string
    pod_name: string
triggers:
- on_pod_create: {}

The above is an example. Try customizing the trigger and parameters.

required:
alert_name (str)

Simulated alert name.

pod_name (str)

Pod name, for a simulated pod alert.

optional:
namespace (str) = default

Pod namespace, for a simulated pod alert.

status (str) = firing

Simulated alert status: firing or resolved.

severity (str) = error

Simulated alert severity.

description (str) = simulated prometheus alert

Simulated alert description.

generator_url (str)

Prometheus generator_url. Some enrichers use this attribute to query Prometheus.

Supported triggers:

  • Any trigger

This action can be manually triggered using the Robusta CLI:

robusta playbooks trigger prometheus_alert  alert_name=ALERT_NAME pod_name=POD_NAME