Track Failed Kubernetes Jobs

Send notifications about failed Kubernetes Jobs to Slack, MS Teams, Datadog, or other sinks.

(Screenshot: a failed Kubernetes Job notification in Slack)

Avoid Duplicate Alerts

If you installed Robusta with the embedded Prometheus stack, this playbook is already configured by default. Do not add it again, or you will receive duplicate alerts.

Defining a Playbook

Add the following YAML to the customPlaybooks Helm value:

```yaml
customPlaybooks:
- triggers:
  - on_job_failure: {} # (1)
  actions:
  - create_finding: # (2)
      title: "Job Failed"
      aggregation_key: "JobFailure"
  - job_info_enricher: {} # (3)
  - job_events_enricher: {} # (4)
  - job_pod_enricher: {} # (5)
```
1. on_job_failure fires once for each failed Kubernetes Job.
2. create_finding generates the notification message.
3. job_info_enricher fetches the Job's status and information.
4. job_events_enricher runs kubectl get events, finds the Events related to the Job, and attaches them.
5. job_pod_enricher finds the Pods that were part of the Job and attaches Pod-level information, such as Pod logs.

Then run a Helm upgrade to apply the new configuration.

Testing Your Playbook

Deploy a Job that fails on purpose. Each attempt fails after 60 seconds, and Kubernetes then retries the Job. After two failed attempts, the Job is marked as failed for good, and the playbook should send a notification.

```shell
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/job_failure/job_crash.yaml
```
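The contents of job_crash.yaml are not reproduced here. If you prefer to test with a local manifest, a Job with roughly the behavior described above might look like the sketch below. It is a hypothetical stand-in, not the demo file itself: the name, image, and command are assumptions. Setting `backoffLimit: 1` allows one retry after the first failure, which gives two attempts in total. `restartPolicy: Never` makes Kubernetes create a new Pod for each attempt.

```yaml
# Hypothetical stand-in for job_crash.yaml (not the actual demo file).
apiVersion: batch/v1
kind: Job
metadata:
  name: job-crash-example        # hypothetical name
spec:
  backoffLimit: 1                # one retry after the first failure = two attempts in total
  template:
    spec:
      restartPolicy: Never       # each attempt runs in a fresh Pod instead of restarting in place
      containers:
      - name: crash
        image: busybox           # assumed image; any image with a shell works
        command: ["sh", "-c", "sleep 60; exit 1"]   # wait 60 seconds, then exit with an error
```

Save it to a file and apply it with kubectl apply -f as above. After the second Pod fails, the Job is marked as failed, which should trigger on_job_failure.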

Tips and Tricks

Route failed Jobs to specific Slack channels

Refer to the Robusta documentation on notification routing.
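As a rough sketch, a dedicated Slack sink could be scoped to the findings created by this playbook by matching on the aggregation_key ("JobFailure") set in create_finding. This assumes the sink `scope` / `include` syntax with an `identifier` field, as described in Robusta's routing documentation. Verify the field names against your Robusta version. The sink name, channel, and API key placeholder are hypothetical.

```yaml
sinksConfig:
- slack_sink:
    name: job_failures_sink            # hypothetical sink name
    slack_channel: job-failures        # hypothetical channel
    api_key: <YOUR_SLACK_API_KEY>      # placeholder
    scope:
      include:
        - identifier: [JobFailure]     # assumed to match the aggregation_key from create_finding
```

Other sinks without a scope will still receive these findings. Restrict them as well if failed Jobs should go only to this channel.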