List of built-in playbooks¶

Warning

This page contains out-of-date information. It is currently being updated to reflect Robusta’s new configuration format.

Stress Testing and Chaos Engineering¶

generate_high_cpu¶

Playbook

Description

What it does: Causes high CPU usage in the cluster.

When it runs: Manually triggered.

More documentation coming soon

http_stress_test¶

Playbook

Description

What it does: Sends many HTTP requests to a given URL.

When it runs: When you trigger it manually

Configuration

playbooks:
  - name: "http_stress_test"

Manual Trigger

robusta playbooks trigger http_stress_test url=http://grafana.default.svc:3000 n=1000
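Manual triggers on this page all follow the same shape: the playbook name followed by key=value parameters. As a minimal sketch, the command line can be assembled from Python like this. The helper name build_trigger_command is hypothetical and is not part of the Robusta CLI; it only builds the string and does not run anything.

```python
import shlex


def build_trigger_command(playbook: str, **params: object) -> str:
    """Return a shell-safe 'robusta playbooks trigger' command line.

    Hypothetical helper for illustration; only the command shape comes
    from the examples on this page.
    """
    args = ["robusta", "playbooks", "trigger", playbook]
    # Each parameter is passed as a single key=value argument.
    args += [f"{key}={value}" for key, value in params.items()]
    return shlex.join(args)


command = build_trigger_command(
    "http_stress_test", url="http://grafana.default.svc:3000", n=1000
)
print(command)
```

Using shlex.join keeps values that contain spaces or shell metacharacters quoted as a single argument.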

Kubernetes Monitoring¶

incluster_ping¶

Playbook

Description

What it does: Pings a hostname from within the cluster.

When it runs: When you trigger it manually.

Configuration

playbooks:
  - name: "incluster_ping"

Manual Trigger

robusta playbooks trigger incluster_ping hostname=grafana.default.svc

resource_babysitter¶

Playbook

Description

What it does: Sends notifications to Slack describing changes to deployments.

When it runs: When a deployment is created, modified, or deleted.

Configuration

playbooks:
  - name: "deployment_babysitter"
    action_params:
      fields_to_monitor: ["spec.replicas"]
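To illustrate what monitoring a field such as spec.replicas could mean, here is a minimal sketch that compares two versions of a resource on a list of dotted field paths. This is an assumption about the general idea, not Robusta's actual implementation; get_field and changed_fields are hypothetical names, and the sample objects are made up.

```python
from typing import Any, Dict, List, Tuple


def get_field(obj: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path like 'spec.replicas'; return None if it is missing."""
    current: Any = obj
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def changed_fields(
    old: Dict[str, Any], new: Dict[str, Any], fields_to_monitor: List[str]
) -> Dict[str, Tuple[Any, Any]]:
    """Map each monitored path whose value differs to its (before, after) pair."""
    changes = {}
    for path in fields_to_monitor:
        before, after = get_field(old, path), get_field(new, path)
        if before != after:
            changes[path] = (before, after)
    return changes


# Made-up deployment snapshots: two fields change, only one is monitored.
old = {"spec": {"replicas": 2, "paused": False}}
new = {"spec": {"replicas": 5, "paused": True}}
changes = changed_fields(old, new, ["spec.replicas"])
print(changes)
```

Only the monitored path is reported, so the change to spec.paused in this sample is ignored.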

deployment_status_report¶

Playbook

Description

What it does: Sends screenshots of Grafana panels.

When it runs: After a deployment is updated, at the configured delays.

Configuration

playbooks:
  - name: "deployment_status_report"
    trigger_params:
      name_prefix: "MY_MONITORED_DEPLOYMENT"
    action_params:
      report_name: "MY REPORT NAME"
      on_image_change_only: true
      delays:
      - 60       # 60 seconds after a deployment change
      - 600      # 10 minutes after the previous run, i.e. 11 minutes after the deployment change
      - 1200     # 20 minutes after the previous run, i.e. 31 minutes after the deployment change
      reports_panel_urls:
      - "http://MY_GRAFANA/d-solo/200ac8fdbfbb74b39aff88118e4d1c2c/kubernetes-compute-resources-node-pods?orgId=1&from=now-1h&to=now&panelId=3"
      - "http://MY_GRAFANA/d-solo/SOME_OTHER_DASHBOARD/.../?orgId=1&from=now-1h&to=now&panelId=3"
      - "http://MY_GRAFANA/d-solo/SOME_OTHER_DASHBOARD/.../?orgId=1&from=now-1h&to=now&panelId=3"

reports_panel_urls: It is highly recommended to use relative time arguments rather than absolute ones, e.g. from=now-1h&to=now

on_image_change_only: Defaults to true and can be omitted.

Omitting name_prefix or setting on_image_change_only: false may result in a noisy channel.
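As the comments in the configuration indicate, each entry in delays is measured from the previous run, so the time since the deployment change is a running sum. A short sketch of that arithmetic, using the values from the example above:

```python
from itertools import accumulate

# Values taken from the example configuration, in seconds.
delays = [60, 600, 1200]

# Each delay counts from the previous run, so offsets from the
# deployment change are the cumulative sums.
offsets = list(accumulate(delays))

for delay, offset in zip(delays, offsets):
    print(f"wait {delay:>4}s -> report at {offset}s ({offset / 60:g} min) after the change")
```

With these values the reports are sent 1, 11, and 31 minutes after the deployment change.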

Kubernetes Optimization¶

config_ab_testing¶

Playbook

Description

What it does: Applies YAML configurations to Kubernetes resources for limited periods of time. Adds Grafana annotations showing when each configuration was applied.

When it runs: Periodically, at the interval defined in the playbook configuration.

Example use cases:

Troubleshooting - Finding the first version a production bug appeared by iterating over image tags
Cost/performance optimization - Comparing the cost or performance of different deployment configurations

Configuration

playbooks:
  - name: "config_ab_testing"
    trigger_params:
      seconds_delay: 1200 # 20 min
    action_params:
      grafana_dashboard_uid: "uid_from_url"
      grafana_api_key: "grafana_api_key_with_editor_role"
      grafana_url: "https://mygrafana.mycompany.com"
      kind: "deployment"
      name: "demo-deployment"
      namespace: "robusta"
      configuration_sets:
      - config_set_name: "low cpu high mem"
        config_items:
          "spec.template.spec.containers[0].resources.requests.cpu": 250m
          "spec.template.spec.containers[0].resources.requests.memory": 128Mi
      - config_set_name: "high cpu low mem"
        config_items:
          "spec.template.spec.containers[0].resources.requests.cpu": 750m
          "spec.template.spec.containers[0].resources.requests.memory": 64Mi

Only attributes that already exist in the active configuration can be changed. For example, you can change resources.requests.cpu if that attribute already exists in the deployment.
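The restriction can be pictured with a small sketch that applies a config_items path to a nested structure only when every step of the path, including the final attribute, is already present. This is an illustration under stated assumptions, not Robusta's own code: parse_path and set_existing are hypothetical names, and the deployment dict is a made-up stand-in for the real Kubernetes object.

```python
import re
from typing import Any, List, Union

# Matches either a plain key (no dots or brackets) or a [N] list index.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_path(path: str) -> List[Union[str, int]]:
    """Split 'a.b[0].c' into ['a', 'b', 0, 'c']."""
    return [int(index) if index else name for name, index in _TOKEN.findall(path)]


def set_existing(obj: Any, path: str, value: Any) -> bool:
    """Set value at path only if the attribute already exists; report success."""
    keys = parse_path(path)
    current = obj
    try:
        for key in keys[:-1]:
            current = current[key]
        current[keys[-1]]  # probe: raises if the final attribute is absent
    except (KeyError, IndexError, TypeError):
        return False
    current[keys[-1]] = value
    return True


# Made-up deployment that defines a CPU request but no memory request.
deployment = {
    "spec": {"template": {"spec": {"containers": [
        {"resources": {"requests": {"cpu": "500m"}}}
    ]}}}
}
prefix = "spec.template.spec.containers[0].resources.requests"
cpu_applied = set_existing(deployment, prefix + ".cpu", "250m")
memory_applied = set_existing(deployment, prefix + ".memory", "128Mi")
print(cpu_applied, memory_applied)
```

In this sample the CPU request is updated, while the memory request is skipped because the attribute was not present beforehand.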

disk_benchmark¶

Playbook

Description

What it does: Automatically creates a persistent volume and runs a disk performance benchmark on it.

When it runs: When manually triggered

Configuration

playbooks:
  - name: "disk_benchmark"

Manual Trigger

robusta playbooks trigger disk_benchmark storage_class_name=fast disk_size=200Gi test_seconds=60

When the benchmark is done, all the resources used for it will be deleted.

storage_class_name should be one of the StorageClasses available on your cluster.

Kubernetes Error Handling¶

HPA max replicas¶

Playbook

Description

What it does: Sends a Slack notification and lets you increase the HPA max replicas limit.

When it runs: When an HPA object reaches the max replicas limit

Configuration

playbooks:
  - name: "alert_on_hpa_reached_limit"
    action_params:
      increase_pct: 20   # Increase factor (%)
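To show what a percentage increase means for the max replicas limit, here is a minimal sketch of the arithmetic. It assumes the result is rounded up to a whole replica; that rounding rule and the function name suggested_max_replicas are assumptions for illustration, not confirmed behavior of the playbook.

```python
def suggested_max_replicas(current_max: int, increase_pct: int) -> int:
    """Increase current_max by increase_pct percent, rounding up (assumed)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (current_max * (100 + increase_pct) + 99) // 100


for current in (3, 5, 10):
    print(current, "->", suggested_max_replicas(current, 20))
```

With increase_pct: 20, a limit of 10 replicas would become 12 under this assumption.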

Alert Enrichment¶

This is a special playbook that has out-of-the-box knowledge about specific Prometheus alerts. See Prometheus Alert Enrichment for details.
