AI Analysis

Why use HolmesGPT?

Robusta can integrate with HolmesGPT to analyze health issues on your cluster and to run AI-based root cause analysis for alerts.

This requires a Robusta SaaS account with the Robusta UI sink enabled. (We plan to support HolmesGPT in a pure OSS mode in the near future. Stay tuned!)

When available, AI-based investigations can be launched in one of two ways:

Click the Ask HolmesGPT button in Slack. The AI investigation will be sent back as a new message.

In the Robusta UI, click the Root Cause tab on an alert.

Configuring HolmesGPT

Add enableHolmesGPT: true to the Robusta Helm values, and then follow these steps:

Choose an AI model - we highly recommend using GPT-4o to get the most accurate results! Other models may work, but are not officially supported.
Configure your AI provider with the chosen model.
Configure HolmesGPT Access to SaaS Data.

Choosing and configuring an AI provider

Choose an AI provider below and follow the instructions:

Robusta AI

Robusta AI is a premium AI service hosted by Robusta. To use Robusta AI, you must:

Have a Robusta account and enable the Robusta UI sink in Robusta's Helm values.
Add the following to your Helm values (generated_values.yaml file), then run a Helm Upgrade:

enableHolmesGPT: true
holmes:
  additionalEnvVars:
  - name: ROBUSTA_AI
    value: "true"

If you store the Robusta UI token in a Kubernetes secret, follow the instructions in Configuring HolmesGPT Access to SaaS Data.
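For example, if your Robusta UI token is stored in a Kubernetes secret, the Robusta AI flag and the token reference can go in the same additionalEnvVars list. This is a sketch that assumes a secret named my-robusta-secrets with the key ui-token. Those are the example names used in Configuring HolmesGPT Access to SaaS Data, so substitute your own.

```yaml
enableHolmesGPT: true
holmes:
  additionalEnvVars:
  # Use Robusta's hosted AI service
  - name: ROBUSTA_AI
    value: "true"
  # Reuse the secret that already holds the Robusta UI token
  # (secret and key names are examples; use your own)
  - name: ROBUSTA_UI_TOKEN
    valueFrom:
      secretKeyRef:
        name: my-robusta-secrets
        key: ui-token
```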

OpenAI

Create a secret with your OpenAI API key:

kubectl create secret generic holmes-secrets --from-literal=openAiKey='<API_KEY_GOES_HERE>'

Then add the following to your Helm values (generated_values.yaml file):

enableHolmesGPT: true
holmes:
  additionalEnvVars:
  - name: MODEL
    value: gpt-4o
  - name: OPENAI_API_KEY
    valueFrom:
      secretKeyRef:
        name: holmes-secrets
        key: openAiKey

Run a Helm Upgrade to apply the configuration.

Azure AI

Go to your Azure portal, raise the default rate limit to the maximum, and note the following parameters:

API_VERSION
DEPLOYMENT_NAME
ENDPOINT
API_KEY

Step-by-Step Instructions for the Azure Portal

The following steps cover how to obtain the correct AZURE_API_VERSION value and how to increase the token limit to prevent rate limiting.

Go to your Azure portal and choose Azure OpenAI

Click your AI service

Click Go to Azure OpenAI Studio

Choose Deployments

Select your Deployment - note the DEPLOYMENT_NAME! Include 'gpt-4o' in the deployment name if you are using that model.

Click Open in Playground

Go to View Code

Choose Python and scroll to find the ENDPOINT, API_KEY, and API_VERSION. Copy them! You will need them for Robusta's Helm values.

Go back to Deployments, and click Edit Deployment

MANDATORY: Increase the deployment's token rate limit. Set it to at least 450K tokens per minute for Holmes to work properly, and preferably to the highest value available. (Holmes queries Azure AI infrequently but in bursts. The overall cost of using Holmes with Azure AI is therefore very low, but you must raise the quota to avoid being rate-limited on a single burst of requests.)

Create a secret with the Azure API key you found above:

kubectl create secret generic holmes-secrets --from-literal=azureOpenAiKey='<AZURE_API_KEY_GOES_HERE>'

Update your Helm values (generated_values.yaml file) with the following configuration:

enableHolmesGPT: true
holmes:
  additionalEnvVars:
  - name: MODEL
    value: azure/<DEPLOYMENT_NAME>  # replace with deployment name from the portal (e.g. avi-deployment), leave "azure/" prefix
  - name: MODEL_TYPE
    value: gpt-4o                   # your azure deployment model type
  - name: AZURE_API_VERSION
    value: <API_VERSION>            # replace with API version you found in the Azure portal
  - name: AZURE_API_BASE
    value: <AZURE_ENDPOINT>         # fill in the base endpoint url of your azure deployment - e.g. https://my-org.openai.azure.com/
  - name: AZURE_API_KEY
    valueFrom:
      secretKeyRef:
        name: holmes-secrets
        key: azureOpenAiKey
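To show how the portal values map onto these variables, here is the same configuration filled in with hypothetical values. It assumes a deployment named my-gpt-4o-deployment, API version 2024-06-01, and the endpoint https://my-org.openai.azure.com/. Replace each one with the value you copied from the portal.

```yaml
enableHolmesGPT: true
holmes:
  additionalEnvVars:
  - name: MODEL
    value: azure/my-gpt-4o-deployment        # "azure/" + DEPLOYMENT_NAME (hypothetical name)
  - name: MODEL_TYPE
    value: gpt-4o                            # model type behind the deployment
  - name: AZURE_API_VERSION
    value: "2024-06-01"                      # example API_VERSION, quoted so it stays a string
  - name: AZURE_API_BASE
    value: https://my-org.openai.azure.com/  # ENDPOINT from the portal (hypothetical)
  - name: AZURE_API_KEY
    valueFrom:
      secretKeyRef:
        name: holmes-secrets
        key: azureOpenAiKey
```

The API version is quoted so that YAML treats it as a string rather than a date.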

Run a Helm Upgrade to apply the configuration.

AWS Bedrock

You will need the following AWS parameters:

BEDROCK_MODEL_NAME
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY

Create a secret with your AWS credentials:

kubectl create secret generic holmes-secrets --from-literal=awsAccessKeyId='<YOUR_AWS_ACCESS_KEY_ID>' --from-literal=awsSecretAccessKey='<YOUR_AWS_SECRET_ACCESS_KEY>'
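If you manage secrets declaratively (for example, with GitOps), that command corresponds roughly to the Secret manifest below. In this sketch, stringData takes plain-text values and Kubernetes stores them base64-encoded. The namespace is a placeholder: use the namespace Robusta runs in, because a secretKeyRef can only read secrets from the pod's own namespace. Do not commit real credentials to version control.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: holmes-secrets
  namespace: '<ROBUSTA_NAMESPACE>'   # placeholder: the namespace where Robusta is installed
type: Opaque
stringData:
  # Key names must match the secretKeyRef keys in the Helm values
  awsAccessKeyId: '<YOUR_AWS_ACCESS_KEY_ID>'
  awsSecretAccessKey: '<YOUR_AWS_SECRET_ACCESS_KEY>'
```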

Update your Helm values (generated_values.yaml file) with the following configuration:

enableHolmesGPT: true
holmes:
  enablePostProcessing: true
  additionalEnvVars:
  - name: MODEL
    value: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0  # your bedrock model - replace with your own exact model name
  - name: AWS_REGION_NAME
    value: us-east-1  # replace with the AWS region where your Bedrock model is available
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: holmes-secrets
        key: awsAccessKeyId
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: holmes-secrets
        key: awsSecretAccessKey

Run a Helm Upgrade to apply the configuration.

Multiple providers

Starting from version 0.22.1, Robusta supports an alternative way to configure AI models: using a YAML dictionary in your Helm values file.

This method allows you to configure multiple models at once, each with its own parameters.

Update your Helm values (generated_values.yaml file) with the following configuration.

When multiple models are defined, the Robusta UI will allow users to choose a specific model when initiating an AI-based investigation.

Model info

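When using multiple providers, the keys differ slightly from the single-provider case. Each model entry takes lowercase parameters such as model, api_key, api_base, api_version, and aws_region_name, instead of environment variables such as MODEL, AZURE_API_BASE, or AWS_REGION_NAME.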

enableHolmesGPT: true

holmes:
  modelList: # sample configuration.
    openai:
      model: openai/gpt-4o
      api_key: "{{ env.API_KEY }}"
    azure-low-budget:
      model: azure/team-low-budget
      api_base: <your-api-base> # fill in the base endpoint url of your azure deployment - e.g. https://my-org.openai.azure.com/
      api_version: "2024-06-01"
      api_key: "{{ env.AZURE_API_KEY }}" # you can load the values from an environment variable as well.
      temperature: 0
    bedrock-devops:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0 # your bedrock model.
      aws_region_name: us-east-1
      aws_access_key_id: "{{ env.AWS_ACCESS_KEY_ID }}" # you can load the values from an environment variable as well.
      aws_secret_access_key: <your-aws-secret-access-key>
      thinking: {"type": "enabled", "budget_tokens": 1024}
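Each {{ env.NAME }} placeholder in modelList needs a matching environment variable. The following is a minimal sketch under three assumptions. First, the placeholders are resolved from the Holmes container's environment. Second, one secret named holmes-secrets holds all the keys, using the key names from the provider sections above (openAiKey, azureOpenAiKey, awsAccessKeyId). Third, you adjust everything to your own secret layout.

```yaml
holmes:
  # Place these entries under the same holmes: key as modelList, not in a second holmes: block
  additionalEnvVars:
  # Each name must match a {{ env.NAME }} placeholder in modelList
  - name: API_KEY
    valueFrom:
      secretKeyRef:
        name: holmes-secrets
        key: openAiKey
  - name: AZURE_API_KEY
    valueFrom:
      secretKeyRef:
        name: holmes-secrets
        key: azureOpenAiKey
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: holmes-secrets
        key: awsAccessKeyId
```

A single secret is assumed here because running kubectl create secret generic holmes-secrets a second time fails if the secret already exists.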

Run a Helm Upgrade to apply the configuration.

Configuring HolmesGPT Access to SaaS Data

To use HolmesGPT with the Robusta UI, one further step may be necessary, depending on how Robusta is configured.

If you define the Robusta UI token directly in your Helm values, HolmesGPT can read the token automatically and no further setup is necessary.
If you store the Robusta UI token in a Kubernetes secret, follow the instructions below.

Note: the same Robusta UI token is used for the Robusta UI sink and for HolmesGPT.

Reading the Robusta UI Token from a secret in HolmesGPT

Review your existing Robusta Helm values - you should have an existing section similar to this, which reads the Robusta UI token from a secret:

runner:
  additional_env_vars:
  - name: UI_SINK_TOKEN
    valueFrom:
      secretKeyRef:
        name: my-robusta-secrets
        key: ui-token

sinksConfig:
- robusta_sink:
    name: robusta_ui_sink
    token: "{{ env.UI_SINK_TOKEN }}"

Add the following to your Helm values, directing HolmesGPT to use the same secret, passed as an environment variable named ROBUSTA_UI_TOKEN:

holmes:
  additionalEnvVars:
  # ... keep your existing environment variables (e.g. MODEL and API keys) here ...
  - name: ROBUSTA_UI_TOKEN
    valueFrom:
      secretKeyRef:
        name: my-robusta-secrets
        key: ui-token

Run a Helm Upgrade to apply the configuration.

Enable Holmes in Slack via the Robusta Platform

Go to https://platform.robusta.dev/
Navigate to: Settings → AI Assistant

Enable Holmes using the toggle.
Click Connect Slack Workspace to authorize Holmes in your Slack workspace.
Use Holmes in Slack

In any Slack channel or thread, tag Holmes with @holmes, for example:
```
@holmes can you look into this
```
Or ask natural language questions about a specific cluster. Examples:

Test Holmes Integration

In this section we will see Holmes in action by deploying a crashing pod and analyzing the alert with AI.

Before we proceed, you must follow the instructions above and configure Holmes.

Once everything is set up:

Deploy a crashing pod to simulate an issue.

kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/broken.yaml
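If you would rather not apply a manifest from the internet, a minimal pod that exits immediately should also go into CrashLoopBackOff. This is a hypothetical stand-in, not the contents of broken.yaml. The pod name and the busybox image are our own choices.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: holmes-demo-crashpod   # hypothetical name
spec:
  restartPolicy: Always        # kubelet keeps restarting the container with growing back-off
  containers:
  - name: crash
    image: busybox
    # Exit with a non-zero status right away so every restart fails
    command: ["sh", "-c", "echo 'simulated failure'; exit 1"]
```

Save it to a file and apply it with kubectl apply -f. Delete the pod once you have finished testing.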

Go to the Timeline in platform.robusta.dev and click the CrashLoopBackOff alert.

Click the "Root Cause" tab on the top. This gives you the result of an investigation done by HolmesGPT based on the alert.

Additionally, your alerts in Slack will have an "Ask Holmes" button that sends an analysis back to Slack.

Warning

Due to technical limitations with Slack, alerts analyzed from Slack are sent to the AI without their alert labels.

This means sometimes the AI won't know the namespace, pod name, or other metadata and the results may be less accurate.

For the most accurate results, it is best to use the Robusta UI.

Adding data sources to HolmesGPT

HolmesGPT's toolsets are fundamental to its ability to investigate and diagnose Kubernetes cluster issues effectively. Each toolset provides specialized capabilities for gathering and analyzing specific aspects of cluster health, performance, and configuration.

The more toolsets available to HolmesGPT, the more comprehensive and nuanced its investigation process becomes, enabling it to identify complex issues through multiple perspectives and provide more accurate, actionable recommendations for resolution.

Built-in toolsets

Follow this link to learn how to configure built-in toolsets.

Built-in toolsets cover essential areas like pod status inspection, node health analysis, application diagnostics, and resource utilization monitoring. These toolsets include access to Kubernetes events and logs, AWS, Grafana, OpenSearch, etc. See the full list here.
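As a rough illustration of the Helm values shape, built-in toolsets are toggled by name under holmes.toolsets. Both the toolset names below and the enabled key are assumptions based on common HolmesGPT setups. Check the full list of built-in toolsets for the exact names your version supports.

```yaml
holmes:
  toolsets:
    # Toolset names are assumptions; copy exact names from the built-in toolset list
    kubernetes/core:
      enabled: true
    kubernetes/logs:
      enabled: true
    # Setting enabled: false turns a toolset off
```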

Custom toolsets

By adding custom toolsets, users can extend HolmesGPT's investigation capabilities to address unique use cases, specific infrastructure setups, or organization-specific requirements.

For example, custom toolsets might include specialized log analysis patterns or integration with external monitoring systems.

Custom toolsets are created through your Helm values file and you can find instructions to write your own toolsets here.
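Roughly speaking, a custom toolset is a named group of tools, and each tool wraps a command that HolmesGPT can run during an investigation. This sketch assumes a holmes.toolsets layout with description, tools, name, and command keys, plus {{ parameter }} templating. The toolset and tool names are hypothetical, so follow the linked instructions for the authoritative schema.

```yaml
holmes:
  toolsets:
    my-ingress-tools:                # hypothetical toolset name
      description: "Inspect ingress objects in the cluster"
      tools:
        - name: "get_ingresses"      # hypothetical tool name
          description: "List ingresses in a namespace"
          # {{ namespace }} is assumed to be filled in by the model when it calls the tool
          command: "kubectl get ingress -n {{ namespace }} -o wide"
```

These commands presumably run inside the Holmes pod. They are therefore limited to the binaries and RBAC permissions available there.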

Remote MCP servers

Warning

Remote MCP servers are in Tech Preview stage.

Remote MCP server connections are configured through your Helm values file. For detailed instructions, refer to the Connecting to Remote MCP Servers guide.
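For orientation only, a remote MCP server entry might look like the following. The feature is in Tech Preview, so the mcp_servers key, its fields, the server name, and the URL are all assumptions here. The Connecting to Remote MCP Servers guide is authoritative.

```yaml
holmes:
  mcp_servers:                                     # key name is an assumption; see the guide
    my-remote-mcp:                                 # hypothetical server name
      description: "Internal runbook search"       # helps HolmesGPT decide when to use this server
      url: "http://my-mcp-server.internal:8000/sse"  # hypothetical endpoint
```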