Ceph

Plugin: go.d.plugin
Module: ceph

Overview

This collector monitors the overall health and performance of your Ceph clusters. It gathers key metrics for the entire cluster, for individual pools, and for OSDs.

It collects metrics by periodically issuing HTTP GET requests to the Ceph Manager REST API.

This collector is only supported on the following platforms:

  • Linux

This collector supports collecting metrics from multiple instances of this integration, including remote instances.

Default Behavior

Auto-Detection

The collector can automatically detect Ceph Manager instances that are:

  • running on localhost and listening on port 8443
  • running within Docker containers

Note that the Ceph REST API requires a username and password. While Netdata can automatically detect Ceph Manager instances and create data collection jobs, these jobs will fail unless you provide the necessary credentials.

Limits

The default configuration for this integration does not impose any limits on data collection.

Performance Impact

The default configuration for this integration is not expected to impose a significant performance impact on the system.

Setup

You can configure the ceph collector in two ways:

| Method | Best for | How to |
|--------|----------|--------|
| UI | Fast setup without editing files | Go to Nodes → Configure this node → Collectors → Jobs, search for ceph, then click + to add a job. |
| File | If you prefer configuring via file, or need to automate deployments (e.g., with Ansible) | Edit go.d/ceph.conf and add a job. |

:::important

UI configuration requires a paid Netdata Cloud plan.

:::

Prerequisites

No action required.

Configuration

Options

The following options can be defined globally: update_every.

| Group | Option | Description | Default | Required |
|-------|--------|-------------|---------|----------|
| Collection | update_every | Data collection interval (seconds). | 1 | no |
| | autodetection_retry | Autodetection retry interval (seconds). Set 0 to disable. | 0 | no |
| Target | url | The URL of the Ceph Manager API. | https://127.0.0.1:8443 | yes |
| | timeout | HTTP request timeout (seconds). | 2 | no |
| HTTP Auth | username | Username for Basic HTTP authentication. | | yes |
| | password | Password for Basic HTTP authentication. | | yes |
| | bearer_token_file | Path to a file containing a bearer token (used for Authorization: Bearer). | | no |
| TLS | tls_skip_verify | Skip TLS certificate and hostname verification (insecure). | yes | no |
| | tls_ca | Path to CA bundle used to validate the server certificate. | | no |
| | tls_cert | Path to client TLS certificate (for mTLS). | | no |
| | tls_key | Path to client TLS private key (for mTLS). | | no |
| Proxy | proxy_url | HTTP proxy URL. | | no |
| | proxy_username | Username for proxy Basic HTTP authentication. | | no |
| | proxy_password | Password for proxy Basic HTTP authentication. | | no |
| Request | method | HTTP method to use. | GET | no |
| | body | Request body (e.g., for POST/PUT). | | no |
| | headers | Additional HTTP headers (one per line as key: value). | | no |
| | not_follow_redirects | Do not follow HTTP redirects. | no | no |
| | force_http2 | Force HTTP/2 (including h2c over TCP). | no | no |
| Virtual Node | vnode | Associates this data collection job with a Virtual Node. | | no |

via UI

Configure the ceph collector from the Netdata web interface:

  1. Go to Nodes.
  2. Select the node where you want the ceph data-collection job to run and click the gear icon (Configure this node). That node will run the data collection.
  3. The Collectors → Jobs view opens by default.
  4. In the Search box, type ceph (or scroll the list) to locate the ceph collector.
  5. Click the + next to the ceph collector to add a new job.
  6. Fill in the job fields, then click Test to verify the configuration and Submit to save.
    • Test runs the job with the provided settings and shows whether data can be collected.
    • If it fails, an error message appears with details (for example, connection refused, timeout, or command execution errors), so you can adjust and retest.

via File

The configuration file name for this integration is go.d/ceph.conf.

The file format is YAML. Generally, the structure is:

```yaml
update_every: 1
autodetection_retry: 0
jobs:
  - name: some_name1
  - name: some_name2
```
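In this structure, top-level keys such as update_every apply to every job, and each entry under jobs describes one job. The sketch below resolves a job's effective interval. It assumes that a value set inside a job takes precedence over the global one, and that the documented default of 1 applies when neither is set. The `job` type and `effectiveUpdateEvery` function are hypothetical.

```go
package main

// job is a hypothetical in-memory form of one entry under jobs:.
// A zero UpdateEvery means the job does not set its own interval.
type job struct {
	Name        string
	UpdateEvery int
}

// effectiveUpdateEvery returns the job's own interval when set, otherwise
// the global update_every, otherwise the documented default of 1 second.
func effectiveUpdateEvery(global int, j job) int {
	if j.UpdateEvery > 0 {
		return j.UpdateEvery
	}
	if global > 0 {
		return global
	}
	return 1
}
```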

You can edit the configuration file using the edit-config script from the Netdata config directory.

```bash
cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/ceph.conf
```

Examples
Basic

A basic example configuration.

```yaml
jobs:
  - name: local
    url: https://127.0.0.1:8443
    username: user
    password: pass
```

Multi-instance

Note: When you define multiple jobs, their names must be unique.

Collecting metrics from local and remote instances.

```yaml
jobs:
  - name: local
    url: https://127.0.0.1:8443
    username: user
    password: pass

  - name: remote
    url: https://192.0.2.1:8443
    username: user
    password: pass
```
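The unique-name rule can be checked before deploying a configuration, for example in an automation pipeline. The sketch below is a hypothetical pre-deployment check and not part of Netdata. It reports the first repeated job name. This document does not specify what Netdata itself does with duplicates.

```go
package main

import "fmt"

// checkUniqueNames returns an error naming the first duplicated job name,
// or nil when all names are distinct.
func checkUniqueNames(names []string) error {
	seen := make(map[string]bool, len(names))
	for _, n := range names {
		if seen[n] {
			return fmt.Errorf("duplicate job name %q", n)
		}
		seen[n] = true
	}
	return nil
}
```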

Metrics

Metrics grouped by scope.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
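Because an instance is identified by its full label set, the order in which labels are listed should not matter. The sketch below builds an order-independent key from a label map to illustrate that idea. It is a hypothetical helper and not Netdata's internal identity scheme. It also does not escape values that contain "," or "=".

```go
package main

import (
	"sort"
	"strings"
)

// instanceKey builds a deterministic key from a label set: label names are
// sorted so that equal label sets always produce the same key.
// Values containing ',' or '=' are not escaped in this sketch.
func instanceKey(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	for i, k := range keys {
		if i > 0 {
			b.WriteByte(',')
		}
		b.WriteString(k)
		b.WriteByte('=')
		b.WriteString(labels[k])
	}
	return b.String()
}
```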

Per cluster

These metrics refer to the entire Ceph cluster.

Labels:

| Label | Description |
|-------|-------------|
| fsid | A unique identifier of the cluster. |

Metrics:

| Metric | Dimensions | Unit |
|--------|------------|------|
| ceph.cluster_status | ok, err, warn | status |
| ceph.cluster_hosts_count | hosts | hosts |
| ceph.cluster_monitors_count | monitors | monitors |
| ceph.cluster_osds_count | osds | osds |
| ceph.cluster_osds_by_status_count | up, down, in, out | status |
| ceph.cluster_managers_count | active, standby | managers |
| ceph.cluster_object_gateways_count | object | gateways |
| ceph.cluster_iscsi_gateways_count | iscsi | gateways |
| ceph.cluster_iscsi_gateways_by_status_count | up, down | gateways |
| ceph.cluster_physical_capacity_utilization | utilization | percent |
| ceph.cluster_physical_capacity_usage | avail, used | bytes |
| ceph.cluster_objects_count | objects | objects |
| ceph.cluster_objects_by_status_distribution | healthy, misplaced, degraded, unfound | percent |
| ceph.cluster_pools_count | pools | pools |
| ceph.cluster_pgs_count | pgs | pgs |
| ceph.cluster_pgs_by_status_count | clean, working, warning, unknown | pgs |
| ceph.cluster_pgs_per_osd_count | per_osd | pgs |
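Two of these charts can be read as simple ratios. The sketch below assumes that utilization percent equals used / (used + avail) × 100, consistent with the avail and used dimensions of the usage chart, and that the per-OSD PG count is the total PG count divided by the OSD count. Both formulas are assumptions for illustration, not the collector's confirmed calculations. The function names are hypothetical.

```go
package main

// utilizationPercent returns used / (used + avail) * 100, or 0 when the
// total capacity is zero. Assumed formula, for illustration only.
func utilizationPercent(used, avail uint64) float64 {
	total := used + avail
	if total == 0 {
		return 0
	}
	return float64(used) / float64(total) * 100
}

// pgsPerOSD returns the average number of placement groups per OSD,
// or 0 when there are no OSDs. Assumed formula, for illustration only.
func pgsPerOSD(pgs, osds int) float64 {
	if osds == 0 {
		return 0
	}
	return float64(pgs) / float64(osds)
}
```

For example, 75 bytes used and 25 bytes available gives 75 percent utilization, and 256 PGs spread across 8 OSDs gives 32 PGs per OSD.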

Per osd

These metrics refer to the Object Storage Daemon (OSD).

Labels:

| Label | Description |
|-------|-------------|
| fsid | A unique identifier of the cluster. |
| osd_uuid | OSD UUID. |
| osd_name | OSD name. |
| device_class | OSD device class. |

Metrics:

| Metric | Dimensions | Unit |
|--------|------------|------|
| ceph.osd_status | up, down, in, out | status |
| ceph.osd_space_usage | avail, used | bytes |
| ceph.osd_io | read, written | bytes/s |
| ceph.osd_iops | read, write | ops/s |
| ceph.osd_latency | commit, apply | milliseconds |

Per pool

These metrics refer to individual pools.

Labels:

| Label | Description |
|-------|-------------|
| fsid | A unique identifier of the cluster. |
| pool_name | Pool name. |

Metrics:

| Metric | Dimensions | Unit |
|--------|------------|------|
| ceph.pool_space_utilization | utilization | percent |
| ceph.pool_space_usage | avail, used | bytes |
| ceph.pool_objects_count | object | objects |
| ceph.pool_io | read, written | bytes/s |
| ceph.pool_iops | read, write | ops/s |

Alerts

The following alerts are available:

| Alert name | On metric | Description |
|------------|-----------|-------------|
| ceph_cluster_physical_capacity_utilization | ceph.cluster_physical_capacity_utilization | Ceph cluster ${label:fsid} disk space utilization |
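This document does not list the alert's thresholds. The sketch below therefore passes them in as parameters and shows only the general idea of classifying a utilization value into Netdata's CLEAR, WARNING and CRITICAL states. The function name and any threshold values you pass are hypothetical. Real Netdata alerts can also apply hysteresis and delays, which this sketch does not model.

```go
package main

// severity classifies a utilization percentage against caller-supplied
// warning and critical thresholds (hypothetical, not the alert's real values).
func severity(utilization, warn, crit float64) string {
	switch {
	case utilization >= crit:
		return "CRITICAL"
	case utilization >= warn:
		return "WARNING"
	default:
		return "CLEAR"
	}
}
```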

Troubleshooting

Debug Mode

Important: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature.

To troubleshoot issues with the ceph collector, run the go.d.plugin with the debug option enabled. The output should give you clues as to why the collector isn’t working.

  • Navigate to the plugins.d directory, usually at /usr/libexec/netdata/plugins.d/. If that’s not the case on your system, open netdata.conf and look for the plugins setting under [directories].

    ```bash
    cd /usr/libexec/netdata/plugins.d/
    ```

  • Switch to the netdata user.

    ```bash
    sudo -u netdata -s
    ```

  • Run the go.d.plugin to debug the collector:

    ```bash
    ./go.d.plugin -d -m ceph
    ```

    To debug a specific job:

    ```bash
    ./go.d.plugin -d -m ceph -j jobName
    ```

Getting Logs

If you’re encountering problems with the ceph collector, follow these steps to retrieve logs and identify potential issues:

  • Run the command specific to your system (systemd, non-systemd, or Docker container).
  • Examine the output for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.

System with systemd

Use the following command to view logs generated since the last Netdata service restart:

```bash
journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep ceph
```

System without systemd

Locate the collector log file, typically at /var/log/netdata/collector.log, and use grep to filter for the collector’s name:

```bash
grep ceph /var/log/netdata/collector.log
```

Note: This method shows logs from all restarts. Focus on the latest entries for troubleshooting current issues.

Docker Container

If your Netdata runs in a Docker container named “netdata” (replace if different), use this command:

```bash
docker logs netdata 2>&1 | grep ceph
```
