PrometheusΒΆ

By enabling this toolset, HolmesGPT will be able to generate graphs from prometheus metrics as well as help you write and validate prometheus queries.

There is also an option for Holmes to analyze prometheus metrics. When enabled, HolmesGPT can detect memory leak patterns, CPU throttling, high latency for your APIs, etc. The configuration field to enable prometheus metrics analysis is tool_calls_return_data.

ConfigurationΒΆ

holmes:
    toolsets:
        prometheus/metrics:
            enabled: true
            config:
                prometheus_url: ...
                metrics_labels_time_window_hrs: 48 # default value
                metrics_labels_cache_duration_hrs: 12 # default value
                fetch_labels_with_labels_api: false # default value
                fetch_metadata_with_series_api: false # default value
                tool_calls_return_data: false # default value
                headers:
                    Authorization: "Basic <base_64_encoded_string>"

Update your Helm values (generated_values.yaml) with the above configuration and run a Helm upgrade:

helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>

Add the following to ~/.holmes/config.yaml, creating the file if it doesn't exist:

toolsets:
    prometheus/metrics:
        enabled: true
        config:
            prometheus_url: ...
            metrics_labels_time_window_hrs: 48 # default value
            metrics_labels_cache_duration_hrs: 12 # default value
            fetch_labels_with_labels_api: false # default value
            fetch_metadata_with_series_api: false # default value
            tool_calls_return_data: false # default value
            headers:
                Authorization: "Basic <base_64_encoded_string>"

It is also possible to set the PROMETHEUS_URL environment variable instead of the above prometheus_url config key.

Prior to generating a PromQL query, HolmesQPT tends to list the available metrics. This is done to ensure the metrics used in PromQL are actually available.

Below is the full list of options for this toolset:

  • metrics_labels_time_window_hrs Represents the time window, in hours, over which labels are fetched. This avoids fetching obsolete labels. Set it to null to let HolmesGPT fetch labels regardless of when they were generated.

  • metrics_labels_cache_duration_hrs How long are labels cached, in hours. Set it to null to disable caching.

  • fetch_labels_with_labels_api Uses prometheus labels API to fetch labels instead of the series API. In some cases setting to True can improve the performance of the toolset, however there will be an increased number of HTTP calls to prometheus. You can experiment with both as they are functionally identical.

  • fetch_metadata_with_series_api Uses the series API instead of the metadata API. You should only set this value to true if the metadata API is disabled or not working. HolmesGPT's ability to select the right metric will be negatively impacted because the series API does not return key metadata like the metrics/series description or their type (gauge, histogram, etc.).

  • tool_calls_return_data Experimental. If true, the prometheus data will be available to HolmesGPT. In some cases, HolmesGPT will be able to detect memory leaks or other anomalies. This is disabled by default to reduce the likelyhood of reaching the input token limit.

  • headers Extra headers to pass to all prometheus http requests. Use this to pass authentication. Prometheus supports basic authentication.

CapabilitiesΒΆ

The table below describes the specific capabilities provided by this toolset. HolmesGPT can decide to invoke any of these capabilities when answering questions or investigating issues.

Tool Name

Description

list_available_metrics

List all the available metrics to query from prometheus, including their types (counter, gauge, histogram, summary) and available labels.

execute_prometheus_instant_query

Execute an instant PromQL query

execute_prometheus_range_query

Execute a PromQL range query