Track Failed Kubernetes Jobs

Get notified about failed Kubernetes Jobs in Slack, MS Teams, or other sinks.

Failing Kubernetes jobs notification on Slack

Avoid Duplicate Alerts

If you installed Robusta with the embedded Prometheus stack, you don't need to configure this playbook. It's configured by default.

Defining a Playbook to Track Failed Jobs

Add the following YAML to the customPlaybooks Helm value:

customPlaybooks:
- triggers:
  - on_job_failure: {} # (1)
  actions:
  - create_finding: # (2)
      title: "Job Failed"
      aggregation_key: "JobFailure"
  - job_info_enricher: {} # (3)
  - job_events_enricher: {} # (4)
  - job_pod_enricher: {} # (5)

(1) on_job_failure fires once for each failed Kubernetes Job.
(2) create_finding generates a notification message.
(3) job_info_enricher fetches the Job's status and information.
(4) job_events_enricher runs kubectl get events, finds Events related to the Job, and attaches them.
(5) job_pod_enricher finds Pods that were part of the Job and attaches Pod-level information such as Pod logs.

Then run a Helm upgrade to apply the new playbook.
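As a rough sketch, the upgrade might look like the following. The release name `robusta`, the chart reference `robusta/robusta`, and the values file `generated_values.yaml` are assumptions based on a typical Robusta installation; substitute the names used in your own setup, and replace the cluster name placeholder.

```shell
# Assumed names; adjust to match your installation.
RELEASE_NAME="robusta"                 # Helm release name (assumption)
VALUES_FILE="generated_values.yaml"    # values file containing customPlaybooks (assumption)
CLUSTER_NAME="<YOUR_CLUSTER_NAME>"     # placeholder, replace with your cluster's name

# Re-deploy the release so the customPlaybooks change takes effect.
helm upgrade "$RELEASE_NAME" robusta/robusta \
  --values "$VALUES_FILE" \
  --set clusterName="$CLUSTER_NAME"
```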

Testing Your Playbook

Deploy a failing job. The job will fail after 60 seconds, then attempt to run again. After two attempts, it will fail for good.

kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/job_failure/job_crash.yaml
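While waiting for the notification, it can help to confirm from the command line that the Job is actually failing. The commands below are one possible way to do that with standard kubectl; they search all namespaces so that no assumption is made about where the demo manifest places the Job. The `DEMO_URL` variable is introduced here only for convenience.

```shell
# Same manifest URL as in the apply command above.
DEMO_URL="https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/job_failure/job_crash.yaml"

# List Jobs in every namespace; the demo Job should never reach a completed state.
kubectl get jobs --all-namespaces

# Show Events whose involved object is a Job (for example, backoff-limit failures).
kubectl get events --all-namespaces --field-selector involvedObject.kind=Job

# Remove the demo Job once you have seen the notification.
kubectl delete -f "$DEMO_URL"
```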

Tips and Tricks

Route failed Jobs to specific Slack channels

Refer to the docs on notification routing.