Consul

Plugin: go.d.plugin Module: consul

Overview

This collector monitors key metrics of Consul Agents: transaction timings, leadership changes, memory usage and more.

It periodically sends HTTP requests to the Consul REST API.
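
To quickly verify that the agent's HTTP API is reachable from the Netdata host, you can query it manually. This is just a sanity check; adjust the address if your agent does not listen on the default http://localhost:8500.

```bash
# Returns JSON describing the local agent.
# If ACLs are enabled, add a token header: -H "X-Consul-Token: <your-token>".
curl http://localhost:8500/v1/agent/self
```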

This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.

Default Behavior

Auto-Detection

This collector discovers instances running on the local host that provide metrics on port 8500.

On startup, it tries to collect metrics from:

  • http://localhost:8500
  • http://127.0.0.1:8500
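
If your agent's HTTP API listens on a different address or port, auto-detection will not find it; point the collector at it explicitly in go.d/consul.conf. A minimal sketch (the port here is purely illustrative):

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:18500
```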

Limits

The default configuration for this integration does not impose any limits on data collection.

Performance Impact

The default configuration for this integration is not expected to impose a significant performance impact on the system.

Setup

You can configure the consul collector in two ways:

| Method | Best for | How to |
|--------|----------|--------|
| UI | Fast setup without editing files | Go to Nodes → Configure this node → Collectors → Jobs, search for consul, then click + to add a job. |
| File | If you prefer configuring via file, or need to automate deployments (e.g., with Ansible) | Edit go.d/consul.conf and add a job. |

:::important

UI configuration requires a paid Netdata Cloud plan.

:::

Prerequisites

Enable Prometheus telemetry

Enable telemetry on your Consul Agent by setting prometheus_retention_time in its telemetry configuration to a value greater than 0 (its default).
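
One way to do this on a typical Linux install is sketched below. The configuration path, retention value, and systemd service name are assumptions; adapt them to your deployment.

```bash
# Assumes Consul loads configuration from /etc/consul.d and runs under systemd.
cat <<'EOF' | sudo tee /etc/consul.d/telemetry.hcl
telemetry {
  prometheus_retention_time = "360h"
}
EOF
sudo systemctl restart consul
```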

Add required ACLs to Token

Required only if authentication is enabled.

| ACL | Endpoint |
|-----|----------|
| operator:read | autopilot health status |
| node:read | checks |
| agent:read | configuration, metrics, and lan coordinates |
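
If ACLs are enabled, one way to provision a suitable token is to create a read-only policy with these rules and attach it to a token used only by Netdata. The policy file, policy name, and token description below are examples, not requirements.

```bash
# Policy granting the ACLs listed above (operator:read, node:read, agent:read).
cat <<'EOF' > netdata-policy.hcl
operator = "read"
node_prefix "" {
  policy = "read"
}
agent_prefix "" {
  policy = "read"
}
EOF

consul acl policy create -name "netdata-metrics" -rules @netdata-policy.hcl
# Use the SecretID printed by the next command as acl_token in go.d/consul.conf.
consul acl token create -description "Netdata consul collector" -policy-name "netdata-metrics"
```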

Configuration

Options

The following options can be defined globally: update_every, autodetection_retry.

| Group | Option | Description | Default | Required |
|-------|--------|-------------|---------|----------|
| Collection | update_every | Data collection interval (seconds). | 1 | no |
| | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no |
| Target | url | Consul HTTP API URL. | http://localhost:8500 | yes |
| | timeout | HTTP request timeout (seconds). | 1 | no |
| HTTP Auth | acl_token | Consul ACL token sent with every request (X-Consul-Token header). | | no |
| | username | Username for Basic HTTP authentication. | | no |
| | password | Password for Basic HTTP authentication. | | no |
| | bearer_token_file | Path to a file containing a bearer token (used for Authorization: Bearer). | | no |
| TLS | tls_skip_verify | Skip TLS certificate and hostname verification (insecure). | no | no |
| | tls_ca | Path to CA bundle used to validate the server certificate. | | no |
| | tls_cert | Path to client TLS certificate (for mTLS). | | no |
| | tls_key | Path to client TLS private key (for mTLS). | | no |
| Proxy | proxy_url | HTTP proxy URL. | | no |
| | proxy_username | Username for proxy Basic HTTP authentication. | | no |
| | proxy_password | Password for proxy Basic HTTP authentication. | | no |
| Request | method | HTTP method to use. | GET | no |
| | body | Request body (e.g., for POST/PUT). | | no |
| | headers | Additional HTTP headers (one per line as key: value). | | no |
| | not_follow_redirects | Do not follow HTTP redirects. | no | no |
| | force_http2 | Force HTTP/2 (including h2c over TCP). | no | no |
| Virtual Node | vnode | Associates this data collection job with a Virtual Node. | | no |
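
For example, a job that connects to Consul over HTTPS with a longer timeout might look like the following. The URL, token, and CA path are placeholders; substitute your own values.

```yaml
jobs:
  - name: local_https
    url: https://127.0.0.1:8501
    acl_token: "REPLACE_WITH_YOUR_TOKEN"
    timeout: 2
    tls_ca: /etc/consul.d/tls/ca.pem
```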

via UI

Configure the consul collector from the Netdata web interface:

  1. Go to Nodes.
  2. Select the node where you want the consul data-collection job to run, then click the :gear: (Configure this node).
  3. The Collectors → Jobs view opens by default.
  4. In the Search box, type consul (or scroll the list) to locate the consul collector.
  5. Click the + next to the consul collector to add a new job.
  6. Fill in the job fields, then click Test to verify the configuration and Submit to save.
    • Test runs the job with the provided settings and shows whether data can be collected.
    • If it fails, an error message appears with details (for example, connection refused, timeout, or command execution errors), so you can adjust and retest.

via File

The configuration file name for this integration is go.d/consul.conf.

The file format is YAML. Generally, the structure is:

update_every: 1
autodetection_retry: 0
jobs:
  - name: some_name1
  - name: some_name2

You can edit the configuration file using the edit-config script from the Netdata config directory.

cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/consul.conf

Examples

Basic

An example configuration.

jobs:
  - name: local
    url: http://127.0.0.1:8500
    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"

Basic HTTP auth

Local server with basic HTTP authentication.

jobs:
  - name: local
    url: http://127.0.0.1:8500
    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"
    username: foo
    password: bar

Multi-instance

Note: When you define multiple jobs, their names must be unique.

Collecting metrics from local and remote instances.

jobs:
  - name: local
    url: http://127.0.0.1:8500
    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"

  - name: remote
    url: http://203.0.113.10:8500
    acl_token: "ada7f751-f654-8872-7f93-498e799158b6"

Metrics

Metrics grouped by scope.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.

The set of metrics depends on the Consul Agent mode (leader, follower, or client).

Per Consul instance

These metrics refer to the entire monitored application.

This scope has no labels.

Metrics:

| Metric | Dimensions | Unit |
|--------|------------|------|
| consul.client_rpc_requests_rate | rpc | requests/s |
| consul.client_rpc_requests_exceeded_rate | exceeded | requests/s |
| consul.client_rpc_requests_failed_rate | failed | requests/s |
| consul.memory_allocated | allocated | bytes |
| consul.memory_sys | sys | bytes |
| consul.gc_pause_time | gc_pause | seconds |
| consul.kvs_apply_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms |
| consul.kvs_apply_operations_rate | kvs_apply | ops/s |
| consul.txn_apply_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms |
| consul.txn_apply_operations_rate | txn_apply | ops/s |
| consul.autopilot_health_status | healthy, unhealthy | status |
| consul.autopilot_failure_tolerance | failure_tolerance | servers |
| consul.autopilot_server_health_status | healthy, unhealthy | status |
| consul.autopilot_server_stable_time | stable | seconds |
| consul.autopilot_server_serf_status | active, failed, left, none | status |
| consul.autopilot_server_voter_status | voter, not_voter | status |
| consul.network_lan_rtt | min, max, avg | ms |
| consul.raft_commit_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms |
| consul.raft_commits_rate | commits | commits/s |
| consul.raft_leader_last_contact_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms |
| consul.raft_leader_oldest_log_age | oldest_log_age | seconds |
| consul.raft_follower_last_contact_leader_time | leader_last_contact | ms |
| consul.raft_rpc_install_snapshot_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms |
| consul.raft_leader_elections_rate | leader | elections/s |
| consul.raft_leadership_transitions_rate | leadership | transitions/s |
| consul.server_leadership_status | leader, not_leader | status |
| consul.raft_thread_main_saturation_perc | quantile_0.5, quantile_0.9, quantile_0.99 | percentage |
| consul.raft_thread_fsm_saturation_perc | quantile_0.5, quantile_0.9, quantile_0.99 | percentage |
| consul.raft_fsm_last_restore_duration | last_restore_duration | ms |
| consul.raft_boltdb_freelist_bytes | freelist | bytes |
| consul.raft_boltdb_logs_per_batch_rate | written | logs/s |
| consul.raft_boltdb_store_logs_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms |
| consul.license_expiration_time | license_expiration | seconds |

Per node check

Metrics about checks at the node level.

Labels:

| Label | Description |
|-------|-------------|
| datacenter | Datacenter Identifier |
| node_name | The node’s name |
| check_name | The check’s name |

Metrics:

| Metric | Dimensions | Unit |
|--------|------------|------|
| consul.node_health_check_status | passing, maintenance, warning, critical | status |

Per service check

Metrics about checks at the service level.

Labels:

| Label | Description |
|-------|-------------|
| datacenter | Datacenter Identifier |
| node_name | The node’s name |
| check_name | The check’s name |
| service_name | The service’s name |

Metrics:

| Metric | Dimensions | Unit |
|--------|------------|------|
| consul.service_health_check_status | passing, maintenance, warning, critical | status |

Alerts

The following alerts are available:

| Alert name | On metric | Description |
|------------|-----------|-------------|
| consul_node_health_check_status | consul.node_health_check_status | node health check ${label:check_name} has failed on server ${label:node_name} datacenter ${label:datacenter} |
| consul_service_health_check_status | consul.service_health_check_status | service health check ${label:check_name} for service ${label:service_name} has failed on server ${label:node_name} datacenter ${label:datacenter} |
| consul_client_rpc_requests_exceeded | consul.client_rpc_requests_exceeded_rate | number of rate-limited RPC requests made by server ${label:node_name} datacenter ${label:datacenter} |
| consul_client_rpc_requests_failed | consul.client_rpc_requests_failed_rate | number of failed RPC requests made by server ${label:node_name} datacenter ${label:datacenter} |
| consul_gc_pause_time | consul.gc_pause_time | time spent in stop-the-world garbage collection pauses on server ${label:node_name} datacenter ${label:datacenter} |
| consul_autopilot_health_status | consul.autopilot_health_status | datacenter ${label:datacenter} cluster is unhealthy as reported by server ${label:node_name} |
| consul_autopilot_server_health_status | consul.autopilot_server_health_status | server ${label:node_name} from datacenter ${label:datacenter} is unhealthy |
| consul_raft_leader_last_contact_time | consul.raft_leader_last_contact_time | median time elapsed since leader server ${label:node_name} datacenter ${label:datacenter} was last able to contact the follower nodes |
| consul_raft_leadership_transitions | consul.raft_leadership_transitions_rate | there has been a leadership change and server ${label:node_name} datacenter ${label:datacenter} has become the leader |
| consul_raft_thread_main_saturation | consul.raft_thread_main_saturation_perc | average saturation of the main Raft goroutine on server ${label:node_name} datacenter ${label:datacenter} |
| consul_raft_thread_fsm_saturation | consul.raft_thread_fsm_saturation_perc | average saturation of the FSM Raft goroutine on server ${label:node_name} datacenter ${label:datacenter} |
| consul_license_expiration_time | consul.license_expiration_time | Consul Enterprise license expiration time on node ${label:node_name} datacenter ${label:datacenter} |

Troubleshooting

Debug Mode

Important: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature.

To troubleshoot issues with the consul collector, run the go.d.plugin with the debug option enabled. The output should give you clues as to why the collector isn’t working.

  • Navigate to the plugins.d directory, usually at /usr/libexec/netdata/plugins.d/. If that’s not the case on your system, open netdata.conf and look for the plugins setting under [directories].

    cd /usr/libexec/netdata/plugins.d/
    
  • Switch to the netdata user.

    sudo -u netdata -s
    
  • Run the go.d.plugin to debug the collector:

    ./go.d.plugin -d -m consul
    

    To debug a specific job:

    ./go.d.plugin -d -m consul -j jobName
    

Getting Logs

If you’re encountering problems with the consul collector, follow these steps to retrieve logs and identify potential issues:

  • Run the command specific to your system (systemd, non-systemd, or Docker container).
  • Examine the output for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.

System with systemd

Use the following command to view logs generated since the last Netdata service restart:

journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep consul

System without systemd

Locate the collector log file, typically at /var/log/netdata/collector.log, and use grep to filter for the collector’s name:

grep consul /var/log/netdata/collector.log

Note: This method shows logs from all restarts. Focus on the latest entries for troubleshooting current issues.

Docker Container

If your Netdata runs in a Docker container named “netdata” (replace if different), use this command:

docker logs netdata 2>&1 | grep consul
