Remediation

Robusta includes actions that modify Kubernetes resources in your cluster.

Alert handling job

Playbook Action: alert_handling_job

Creates a Kubernetes job with the specified parameters.

In addition, the job pod receives the following alert parameters as environment variables:

ALERT_NAME

ALERT_STATUS

ALERT_OBJ_KIND - one of pod, deployment, node, job, or daemonset, or None if the kind is unknown

ALERT_OBJ_NAME

ALERT_OBJ_NAMESPACE (If present)

ALERT_OBJ_NODE (If present)

ALERT_LABEL_{LABEL_NAME} for every label on the alert. For example, a label named foo becomes ALERT_LABEL_FOO.
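As an illustration of how a job's command might consume these variables, here is a minimal Python sketch. The function names are hypothetical, and the label naming assumes plain upper-casing, which is all the example above (foo becomes ALERT_LABEL_FOO) confirms. How other characters in label names are handled is not covered here.

```python
import os


def alert_label_env_name(label_name):
    """Return the environment variable name for an alert label.

    Assumption: the label name is only upper-cased, as in foo -> ALERT_LABEL_FOO.
    """
    return f"ALERT_LABEL_{label_name.upper()}"


def read_alert_context(environ=os.environ):
    """Collect the alert parameters passed to the job pod into a dict."""
    context = {
        "name": environ.get("ALERT_NAME"),
        "status": environ.get("ALERT_STATUS"),
        "obj_kind": environ.get("ALERT_OBJ_KIND"),
        "obj_name": environ.get("ALERT_OBJ_NAME"),
        # These two are only set when present on the alert, so they may be None.
        "obj_namespace": environ.get("ALERT_OBJ_NAMESPACE"),
        "obj_node": environ.get("ALERT_OBJ_NODE"),
    }
    prefix = "ALERT_LABEL_"
    # Keep every ALERT_LABEL_* variable, keyed by the part after the prefix.
    context["labels"] = {
        key[len(prefix):]: value
        for key, value in environ.items()
        if key.startswith(prefix)
    }
    return context


if __name__ == "__main__":
    print(read_alert_context())
```

Passing a plain dict instead of os.environ makes the helper easy to try outside a cluster.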

Add this to your Robusta configuration (Helm values.yaml):

customPlaybooks:
- actions:
  - alert_handling_job:
      command:
      - perl
      - -Mbignum=bpi
      - -wle
      - print bpi(2000)
      image: string
  triggers:
  - on_prometheus_alert: {}

The above is an example. Try customizing the trigger and parameters.

required:
image (str)

The job image.

command (str list)

The job command, as an array of strings.

optional:
name (str) = robusta-action-job

Custom name for the job and job container.

namespace (str) = default

The namespace in which the job is created.

service_account (str)

Service account for the job pod. If omitted, the default service account is used.

restart_policy (str) = OnFailure

Restart policy for the job container.

job_ttl_after_finished (int) = 120

Time to live (TTL), in seconds, of a finished job before it is deleted. If omitted, jobs will not be deleted automatically.

notify (bool)

Send a notification when the job is created.

wait_for_completion (bool) = True

Wait for the job to complete and attach its output. Only relevant when notify=true.

completion_timeout (int) = 300

Maximum number of seconds to wait for the job to complete. Only relevant when wait_for_completion=true.

backoff_limit (int)

Specifies the number of retries before marking this job as failed. Defaults to 6.

active_deadline_seconds (int)

Specifies the duration in seconds, relative to the startTime, that the job may be active before the system tries to terminate it. The value must be a positive integer.

env (envvar list)

Inject environment variables and secrets just like you do with a Kubernetes Job.
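The notify, wait_for_completion, and completion_timeout parameters depend on one another, which can be easy to miss in a flat list. The sketch below restates that dependency in Python. The function name is hypothetical, and the default of notify=False is an assumption, since no default is listed for it above. This is not Robusta's implementation.

```python
from typing import Optional


def seconds_to_wait_for_job(
    notify: bool = False,  # assumption: no default is documented for notify
    wait_for_completion: bool = True,
    completion_timeout: int = 300,
) -> Optional[int]:
    """Return how long to wait for the job, or None if no waiting applies."""
    if not notify:
        # wait_for_completion is only relevant when notify is true.
        return None
    if not wait_for_completion:
        # completion_timeout is only relevant when wait_for_completion is true.
        return None
    return completion_timeout
```

Under these assumptions, setting completion_timeout without also setting notify to true has no effect.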

Delete pod

Playbook Action: delete_pod

Deletes a pod.

Add this to your Robusta configuration (Helm values.yaml):

customPlaybooks:
- actions:
  - delete_pod: {}
  triggers:
  - on_pod_oom_killed: {}

The above is an example. Try customizing the trigger and parameters.

No action parameters

This action can be manually triggered using the Robusta CLI:

robusta playbooks trigger delete_pod name=POD_NAME namespace=POD_NAMESPACE 
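If you trigger actions from scripts, it may help to assemble the command line programmatically. The following Python sketch only mirrors the `robusta playbooks trigger ACTION key=value` shape shown above. The helper name is hypothetical, and the sketch does not run the command, which would require an installed and configured Robusta CLI.

```python
import shlex
from typing import List


def build_trigger_command(action: str, **params: str) -> List[str]:
    """Build the argv list for a manual playbook trigger."""
    argv = ["robusta", "playbooks", "trigger", action]
    # Each action parameter is passed as a key=value argument, in the order given.
    argv += [f"{key}={value}" for key, value in params.items()]
    return argv


if __name__ == "__main__":
    argv = build_trigger_command(
        "delete_pod", name="POD_NAME", namespace="POD_NAMESPACE"
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(argv))
```

Keeping the command as a list until the last moment avoids shell quoting problems if the list is later passed to subprocess.run.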

Delete job

Playbook Action: delete_job

Deletes the job from the cluster.

Add this to your Robusta configuration (Helm values.yaml):

customPlaybooks:
- actions:
  - delete_job: {}
  triggers:
  - on_job_failure: {}

The above is an example. Try customizing the trigger and parameters.

No action parameters

This action can be manually triggered using the Robusta CLI:

robusta playbooks trigger delete_job name=JOB_NAME namespace=JOB_NAMESPACE 

Alert on hpa reached limit

Playbook Action: alert_on_hpa_reached_limit

Notifies when an HPA reaches its maximum replica count and allows raising that limit.

Add this to your Robusta configuration (Helm values.yaml):

customPlaybooks:
- actions:
  - alert_on_hpa_reached_limit: {}
  triggers:
  - on_horizontalpodautoscaler_update: {}

The above is an example. Try customizing the trigger and parameters.

optional:
increase_pct (int) = 20

Increase the HPA max_replicas by this percentage.
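To make the percentage concrete, here is a small worked example in Python. The function name is hypothetical, and rounding fractional results up is an assumption. The description above does not say how non-integer results are handled.

```python
def increased_max_replicas(current_max: int, increase_pct: int = 20) -> int:
    """Return max_replicas after raising it by increase_pct percent.

    Assumption: a fractional result is rounded up to the next whole replica.
    """
    # Integer ceiling division avoids floating point rounding surprises.
    return (current_max * (100 + increase_pct) + 99) // 100


if __name__ == "__main__":
    print(increased_max_replicas(10))     # 10 replicas raised by the default 20%
    print(increased_max_replicas(5, 50))  # 5 replicas raised by 50%
```

With the default of 20, an HPA capped at 10 replicas would move to 12. Under the round-up assumption, a cap of 4 would move to 5.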

Rollout restart

Playbook Action: rollout_restart

Performs a rollout restart on a Kubernetes workload. Supports events related to deployments, deploymentconfigs, daemonsets, and statefulsets.

Add this to your Robusta configuration (Helm values.yaml):

customPlaybooks:
- actions:
  - rollout_restart: {}
  triggers:
  - on_prometheus_alert: {}

The above is an example. Try customizing the trigger and parameters.

No action parameters

This action can be manually triggered using the Robusta CLI:

robusta playbooks trigger rollout_restart kind=RESOURCE_KIND name=RESOURCE_NAME 

Node

Cordon

Playbook Action: cordon

Cordons a node, marking it as unschedulable.

Add this to your Robusta configuration (Helm values.yaml):

customPlaybooks:
- actions:
  - cordon: {}
  triggers:
  - on_node_create: {}

The above is an example. Try customizing the trigger and parameters.

No action parameters

This action can be manually triggered using the Robusta CLI:

robusta playbooks trigger cordon name=NODE_NAME 

Uncordon

Playbook Action: uncordon

Uncordons a node, marking it as schedulable again.

Add this to your Robusta configuration (Helm values.yaml):

customPlaybooks:
- actions:
  - uncordon: {}
  triggers:
  - on_node_create: {}

The above is an example. Try customizing the trigger and parameters.

No action parameters

This action can be manually triggered using the Robusta CLI:

robusta playbooks trigger uncordon name=NODE_NAME 

Drain

Playbook Action: drain

Drains a node: marks it as unschedulable and evicts all pods from it. DaemonSet pods are skipped, as they tolerate unschedulable nodes by default.

Add this to your Robusta configuration (Helm values.yaml):

customPlaybooks:
- actions:
  - drain: {}
  triggers:
  - on_node_create: {}

The above is an example. Try customizing the trigger and parameters.

No action parameters

This action can be manually triggered using the Robusta CLI:

robusta playbooks trigger drain name=NODE_NAME
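
The rule that drain skips DaemonSet pods can be pictured as a simple filter over the pods on the node. The Python sketch below is an illustration only, not Robusta's implementation. It assumes pods are plain dicts shaped like Kubernetes pod manifests, and it identifies DaemonSet pods by their ownerReferences. Both function names are hypothetical.

```python
from typing import Dict, List


def is_daemonset_pod(pod: Dict) -> bool:
    """Return True if any owner reference of the pod is a DaemonSet."""
    owners = pod.get("metadata", {}).get("ownerReferences") or []
    return any(owner.get("kind") == "DaemonSet" for owner in owners)


def pods_to_evict(pods: List[Dict]) -> List[str]:
    """Return the names of pods a drain would evict, skipping DaemonSet pods."""
    return [
        pod["metadata"]["name"] for pod in pods if not is_daemonset_pod(pod)
    ]


if __name__ == "__main__":
    pods = [
        {"metadata": {"name": "web-0", "ownerReferences": [{"kind": "StatefulSet"}]}},
        {"metadata": {"name": "log-agent-x", "ownerReferences": [{"kind": "DaemonSet"}]}},
        {"metadata": {"name": "standalone"}},
    ]
    print(pods_to_evict(pods))
```

In this sample, only the pod owned by a DaemonSet is left in place.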