Track Failed Kubernetes Jobs
Send notifications about failed Kubernetes Jobs to Slack, MS Teams, Datadog, or other sinks.
Avoid Duplicate Alerts
If you installed Robusta with the embedded Prometheus stack, this playbook is already configured by default. Don't add it again, or you will receive duplicate alerts.
Defining a Playbook
Add the following YAML to the customPlaybooks Helm value:
customPlaybooks:
- triggers:
  - on_job_failure: {} # (1)
  actions:
  - create_finding: # (2)
      title: "Job Failed"
      aggregation_key: "JobFailure"
  - job_info_enricher: {} # (3)
  - job_events_enricher: {} # (4)
  - job_pod_enricher: {} # (5)
(1) on_job_failure fires once for each failed Kubernetes Job.
(2) create_finding generates a notification message.
(3) job_info_enricher fetches the Job's status and information.
(4) job_events_enricher runs kubectl get events, finds Events related to the Job, and attaches them.
(5) job_pod_enricher finds Pods that were part of the Job and attaches Pod-level information, such as Pod logs.
Then perform a Helm upgrade to apply the change.
Testing Your Playbook
Deploy a failing Job. The Job fails after 60 seconds and is then retried. After two failed attempts, it fails for good, which should trigger the playbook.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/job_failure/job_crash.yaml
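If you prefer not to apply a manifest from a remote URL, a locally written Job with similar behavior can serve the same purpose. The manifest below is a minimal sketch and is not the contents of job_crash.yaml; the name job-failure-demo, the busybox image, and the exact command are assumptions. With restartPolicy: Never and backoffLimit: 1, Kubernetes runs one Pod, retries once with a new Pod, and then marks the Job as failed.

```yaml
# Hypothetical failing Job for testing the playbook (not the upstream job_crash.yaml).
apiVersion: batch/v1
kind: Job
metadata:
  name: job-failure-demo       # hypothetical name
spec:
  backoffLimit: 1              # one retry, so two attempts in total
  template:
    spec:
      restartPolicy: Never     # each failure creates a new Pod instead of restarting the container
      containers:
      - name: crasher
        image: busybox
        # Sleep for 60 seconds, then exit non-zero to simulate a crash.
        command: ["sh", "-c", "sleep 60; exit 1"]
```

Save it to a file (for example job-failure-demo.yaml, a name chosen here for illustration), apply it with kubectl apply -f job-failure-demo.yaml, and clean up afterwards with kubectl delete job job-failure-demo.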
Tips and Tricks
Route failed Jobs to specific Slack channels
Refer to the docs on notification routing.
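As a rough sketch, a Slack sink restricted to this playbook's findings might look like the following. It assumes that sinks accept a scope filter and that a finding's aggregation_key (JobFailure above) is matched through the identifier field; the sink name, channel, and API key placeholder are hypothetical. Check the notification routing docs for the exact syntax your Robusta version supports.

```yaml
sinksConfig:
- slack_sink:
    name: job_failures_slack_sink   # hypothetical sink name
    slack_channel: job-failures     # hypothetical channel
    api_key: <YOUR_SLACK_API_KEY>   # placeholder, replace with your key
    scope:
      include:
      # Assumption: "identifier" matches the finding's aggregation_key.
      - identifier: [JobFailure]
```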