Welcome to Robusta

Robusta is an open source platform for Kubernetes troubleshooting. It sits on top of your monitoring stack (Prometheus, Elasticsearch, etc.) and tells you why alerts occurred and how to fix them.

Robusta has three main parts, all open source:

  1. An automations engine for Kubernetes

  2. Built-in automations to enrich and fix common alerts

  3. Manual troubleshooting tools for everything else

There are additional optional components:

  1. An all-in-one bundle with Robusta, the Prometheus Operator, and default Kubernetes alerts [1]

  2. A web UI to see all alerts, changes, and events in your cluster [2]

Example Use Cases

Monitor crashing pods and send their logs to Slack

Show application updates in Grafana to correlate them with error spikes

Temporarily increase the HPA maximum so you can go back to sleep

Attach the VS Code debugger to a running Python pod without tearing your hair out

robusta playbooks trigger python_debugger name=podname namespace=default

See the Python debugger documentation for more details.
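Raising an HPA's maximum, as in the use case above, comes down to changing the HPA's `spec.maxReplicas` field. The following is a minimal sketch, not Robusta's implementation: the helper name `hpa_max_replicas_patch` and its 20% default step are hypothetical. It only builds a JSON merge patch, which you could apply yourself, for example with `kubectl patch hpa <name> --type merge -p '<patch>'`.

```python
import json


def hpa_max_replicas_patch(current_max: int, increase_percent: int = 20) -> dict:
    """Build a JSON merge patch that raises an HPA's spec.maxReplicas.

    Hypothetical helper for illustration; not part of Robusta's API.
    The new maximum is rounded up and is always at least one replica higher.
    """
    if current_max < 1:
        raise ValueError("maxReplicas must be at least 1")
    if increase_percent <= 0:
        raise ValueError("increase_percent must be positive")
    scaled = current_max * (100 + increase_percent)
    new_max = (scaled + 99) // 100  # integer ceiling of scaled / 100
    new_max = max(new_max, current_max + 1)
    return {"spec": {"maxReplicas": new_max}}


if __name__ == "__main__":
    # prints {"spec": {"maxReplicas": 12}}
    print(json.dumps(hpa_max_replicas_patch(10)))
```

Integer arithmetic is used for the rounding so that small HPAs (for example `maxReplicas: 1`) still grow by at least one replica without floating-point surprises.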

How it works

Robusta automates everything that happens after you deploy your application.

It is somewhat like Zapier or IFTTT for DevOps, with an emphasis on prebuilt automations rather than only "build your own".

For example, the following automation sends logs to Slack when an alert fires for crashing pods:

triggers:
  - on_prometheus_alert:
      alert_name: KubePodCrashLooping
actions:
  - logs_enricher: {}
sinks:
  - slack

Every automation has three parts:

  1. Triggers: when to run (on alerts, logs, changes, etc.)

  2. Actions: what to do (over 50 built-in actions)

  3. Sinks: where to send the result (Slack, etc.)
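To make the trigger/action/sink split concrete, here is a toy model in plain Python. It is a hypothetical illustration only: the `Automation` class, its fields, and the dict-based event are invented for this sketch and are not Robusta's classes. The example wiring loosely mirrors the `KubePodCrashLooping` automation shown earlier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical event shape for this sketch: a flat dict of alert labels.
Event = Dict[str, str]


@dataclass
class Automation:
    """Toy model of an automation (not Robusta's real API)."""
    trigger: Callable[[Event], bool]          # when to run
    actions: List[Callable[[Event], str]]     # what to do; each returns a finding
    sinks: List[Callable[[List[str]], None]]  # where to send the findings

    def handle(self, event: Event) -> bool:
        """Run the actions and notify the sinks if the trigger matches."""
        if not self.trigger(event):
            return False
        findings = [action(event) for action in self.actions]
        for sink in self.sinks:
            sink(findings)
        return True


# Stand-in for a sink such as Slack: it just records what it receives.
sent: List[List[str]] = []

crashloop_logs = Automation(
    trigger=lambda e: e.get("alertname") == "KubePodCrashLooping",
    actions=[lambda e: f"logs for pod {e.get('pod', '?')}"],
    sinks=[sent.append],
)
```

Calling `crashloop_logs.handle({"alertname": "KubePodCrashLooping", "pod": "api-0"})` returns `True` and appends `["logs for pod api-0"]` to `sent`; an event with any other `alertname` returns `False` and sends nothing.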

Automations run via webhook, so if one fails it won't bring down your environment.
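To illustrate the isolation idea, here is a minimal sketch of a webhook receiver's dispatch step, assuming alerts arrive in Alertmanager's webhook JSON format (a top-level `alerts` list whose entries carry `labels.alertname`). The function name and the handler mapping are hypothetical and do not describe Robusta's internals. Each handler runs inside `try`/`except`, so a failing automation is logged instead of crashing the receiver.

```python
import json
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("webhook-sketch")

# Hypothetical handler type: receives one alert dict from the payload.
Handler = Callable[[dict], None]


def dispatch_alertmanager_webhook(body: bytes, handlers: Dict[str, List[Handler]]) -> Dict[str, int]:
    """Route each alert in an Alertmanager webhook payload to its handlers.

    Hypothetical sketch: a handler that raises is logged and counted, but the
    exception never propagates, so one broken automation cannot stop the
    others or the process that received the webhook.
    """
    payload = json.loads(body)
    stats = {"ok": 0, "failed": 0}
    for alert in payload.get("alerts", []):
        name = alert.get("labels", {}).get("alertname")
        for handler in handlers.get(name, []):
            try:
                handler(alert)
                stats["ok"] += 1
            except Exception:
                logger.exception("handler for %s failed", name)
                stats["failed"] += 1
    return stats
```

Returning counts rather than re-raising lets the caller still answer the HTTP request normally even when an automation fails.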

Writing your own automations

Many automations are included, but you can also write your own in Python.

Example action (Python):
from robusta.api import *

# this runs on the Prometheus alerts you specify in the YAML
@action
def my_enricher(event: PrometheusKubernetesAlert):
    # we have full access to the pod on which the alert fired
    pod = event.get_pod()
    if pod is None:
        # some alerts are not tied to a pod, so there is nothing to enrich
        return
    pod_name = pod.metadata.name
    pod_logs = pod.get_logs()
    pod_processes = pod.exec("ps aux")

    # this is how you send data to Slack or other destinations
    event.add_enrichment([
        MarkdownBlock("*Oh no!* An alert occurred on " + pod_name),
        FileBlock("crashing-pod.log", pod_logs),
        FileBlock("processes.txt", pod_processes),
    ])

Next Steps

Ready to install Robusta? Get started!

Star us on GitHub to receive updates.

Footnotes

[1] These alerts should cause no noise on a healthy cluster. If they're noisy in your environment, let us know and we'll fix it.

[2] This is the only component that isn't open source, and it's completely optional. An on-prem version is in development too.