Datadog logs

When this toolset is enabled, HolmesGPT fetches pod logs from Datadog.

Enable this toolset in place of the default kubernetes/logs toolset if all your Kubernetes pod logs are consolidated in Datadog. It makes it easier for HolmesGPT to fetch incident logs, including past logs from a precise time range.

HolmesGPT provides several out-of-the-box alternatives for log access; this toolset is one of them.

Configuration

holmes:
  toolsets:
    datadog/logs:
      enabled: true
      config:
        dd_api_key: <your-datadog-api-key> # Required. Your Datadog API key
        dd_app_key: <your-datadog-app-key> # Required. Your Datadog Application key
        site_api_url: https://api.datadoghq.com # Required. Your Datadog site URL (e.g. https://api.us3.datadoghq.com for US3)
        indexes: ["*"] # Optional. List of Datadog indexes to search. Default: ["*"]
        storage_tiers: ["indexes"] # Optional. Ordered list of storage tiers to query (fallback mechanism). Options: "indexes", "online-archives", "flex". Default: ["indexes"]
        labels: # Optional. Map Datadog labels to Kubernetes resources
          pod: "pod_name"
          namespace: "kube_namespace"
        page_size: 300 # Optional. Number of logs per API page. Default: 300
        default_limit: 1000 # Optional. Default maximum logs to fetch when limit not specified by the LLM. Default: 1000
        request_timeout: 60 # Optional. API request timeout in seconds. Default: 60

    kubernetes/logs:
      enabled: false # HolmesGPT's default logging mechanism MUST be disabled

Update your Helm values (generated_values.yaml) with the above configuration and run a Helm upgrade:

helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>

If you run the Holmes CLI instead, add the following to ~/.holmes/config.yaml, creating the file if it doesn't exist:

toolsets:
  datadog/logs:
    enabled: true
    config:
      dd_api_key: <your-datadog-api-key> # Required. Your Datadog API key
      dd_app_key: <your-datadog-app-key> # Required. Your Datadog Application key
      site_api_url: https://api.datadoghq.com # Required. Your Datadog site URL (e.g. https://api.us3.datadoghq.com for US3)
      indexes: ["*"] # Optional. List of Datadog indexes to search. Default: ["*"]
      storage_tiers: ["indexes"] # Optional. Ordered list of storage tiers to query (fallback mechanism). Options: "indexes", "online-archives", "flex". Default: ["indexes"]
      labels: # Optional. Map Datadog labels to Kubernetes resources
        pod: "pod_name"
        namespace: "kube_namespace"
      page_size: 300 # Optional. Number of logs per API page. Default: 300
      default_limit: 1000 # Optional. Default maximum logs to fetch when limit not specified by the LLM. Default: 1000
      request_timeout: 60 # Optional. API request timeout in seconds. Default: 60

  kubernetes/logs:
    enabled: false # HolmesGPT's default logging mechanism MUST be disabled

Datadog Kubernetes cluster label

In multi-cluster environments, HolmesGPT routes alert investigations to the Kubernetes cluster on which the alert is firing.

Add the cluster label to your Datadog monitoring agent.

Getting API and Application Keys

To use this toolset, you need both a Datadog API key and Application key:

  1. API Key: Go to Organization Settings > API Keys in your Datadog console

    • The API key must have the logs_read_data permission scope

    • When creating a new key, ensure this permission is enabled

  2. Application Key: Go to Organization Settings > Application Keys in your Datadog console

For more information, see the Datadog API documentation.

Configuring Site URL

The site_api_url must match your Datadog site. Common values include:

  • https://api.datadoghq.com - US1

  • https://api.us3.datadoghq.com - US3

  • https://api.us5.datadoghq.com - US5

  • https://api.datadoghq.eu - EU

  • https://api.ap1.datadoghq.com - AP1

For a complete list of site URLs, see the Datadog site documentation.
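
As an illustration, an organization on the EU site would set site_api_url as in the fragment below. This is a partial sketch for ~/.holmes/config.yaml: the required dd_api_key and dd_app_key entries are omitted for brevity, and under the Robusta Helm chart the same block would sit beneath the top-level holmes: key.

```yaml
toolsets:
  datadog/logs:
    enabled: true
    config:
      # EU value taken from the list above; replace it with the URL for your own site.
      site_api_url: https://api.datadoghq.eu
```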

Configuring Storage Tiers

Datadog offers different storage tiers for logs with varying retention and costs:

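| Storage Tier    | Description                           | Use Case                        |
|-----------------|---------------------------------------|---------------------------------|
| indexes         | Hot storage for recent logs (default) | Real-time analysis and alerting |
| online-archives | Warm storage for older logs           | Historical log analysis         |
| flex            | Cost-effective storage                | Long-term retention             |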

The toolset uses storage tiers as a fallback mechanism: each subsequent tier is queried only if the previous tier yielded no results. For example, if the toolset is configured with storage_tiers: ["indexes", "online-archives"], then:

  • Holmes first runs the query against the indexes tier

  • If that query returns no results at all, Holmes then queries online-archives
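
The two-tier example above corresponds to a configuration along these lines. It is a partial fragment for ~/.holmes/config.yaml that shows only the storage_tiers key; the required credentials and site URL are left out. The order of the list is the order in which the tiers are tried.

```yaml
toolsets:
  datadog/logs:
    enabled: true
    config:
      # Queried in order: online-archives is consulted only if indexes returns nothing.
      storage_tiers: ["indexes", "online-archives"]
```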

Handling Rate Limits

If you encounter rate limiting issues with Datadog (visible as warning messages in Holmes logs), you can adjust the following parameters:

  • page_size: Reduce this value to fetch fewer logs per API request. This helps avoid hitting rate limits on individual requests.

  • default_limit: Lower this value to reduce the total number of logs fetched when no explicit limit is specified.

Example configuration for rate-limited environments:

toolsets:
  datadog/logs:
    enabled: true
    config:
      page_size: 100        # Reduced from default 300
      default_limit: 500    # Reduced from default 1000

When rate limiting occurs, Holmes will automatically retry with exponential backoff. You'll see warnings like: "DataDog logs toolset is rate limited/throttled. Waiting X.Xs until reset time"

Configuring Labels

You can customize the labels used by the toolset to identify Kubernetes resources. This is optional and only needed if your Datadog logs use different field names than the defaults.

toolsets:
  datadog/logs:
    enabled: true
    config:
      labels:
        pod: "pod_name"        # The field name for Kubernetes pod name in your Datadog logs
        namespace: "kube_namespace"  # The field name for Kubernetes namespace in your Datadog logs

To find the correct field names in your Datadog logs:

  1. Go to Logs > Search in your Datadog console

  2. View a sample log entry

  3. Identify the field names used for pod name and namespace

  4. Update the labels configuration accordingly

Disabling the Default Logging Toolset

The default HolmesGPT logging toolset must be disabled if you use a different datasource for logs. If kubernetes/logs is left enabled, HolmesGPT may still use kubectl to fetch logs and never call your datasource. To disable the default logging toolset, add the following to your Holmes configuration:

holmes:
  toolsets:
    kubernetes/logs:
      enabled: false

Update your Helm values (generated_values.yaml) with the above configuration and run a Helm upgrade:

helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>

If you run the Holmes CLI instead, add the following to ~/.holmes/config.yaml, creating the file if it doesn't exist:

toolsets:
  kubernetes/logs:
    enabled: false

Capabilities

The table below describes the specific capabilities provided by this toolset. HolmesGPT can decide to invoke any of these capabilities when answering questions or investigating issues.

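| Tool Name      | Description                                                                                     |
|----------------|-------------------------------------------------------------------------------------------------|
| fetch_pod_logs | Retrieve logs from Datadog with support for filtering, time ranges, and multiple storage tiers |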