Ceph

Plugin: go.d.plugin Module: ceph

Overview

This collector monitors the overall health status and performance of your Ceph clusters. It gathers key metrics for the entire cluster, individual pools, and OSDs.

It collects metrics by periodically issuing HTTP GET requests to the Ceph Manager REST API:

/api/monitor (queried only once, to obtain the Ceph cluster ID, fsid)
/api/health/minimal
/api/osd
/api/pool?stats=true
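
As a rough illustration of the request pattern above, the following Python sketch builds the full URL for each endpoint from a base URL. The helper name `ceph_endpoints` and the trailing-slash handling are assumptions made for this example; it is not the collector's own code.

```python
# Hypothetical helper: lists the Ceph Manager REST API URLs named above.
API_PATHS = [
    "/api/monitor",          # queried once to obtain the cluster fsid
    "/api/health/minimal",
    "/api/osd",
    "/api/pool?stats=true",
]


def ceph_endpoints(base_url):
    """Join a base URL such as https://127.0.0.1:8443 with each API path."""
    base = base_url.rstrip("/")  # avoid a double slash when joining
    return [base + path for path in API_PATHS]


if __name__ == "__main__":
    for url in ceph_endpoints("https://127.0.0.1:8443"):
        print(url)
```

Each printed URL is what a single GET request in one collection cycle would target, assuming the default port shown in this document.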

This collector is only supported on the following platforms:

Linux

This collector supports collecting metrics from multiple instances of this integration, including remote instances.

Default Behavior

Auto-Detection

The collector can automatically detect Ceph Manager instances running on:

localhost that are listening on port 8443
within Docker containers

Note that the Ceph REST API requires a username and password. While Netdata can automatically detect Ceph Manager instances and create data collection jobs, these jobs will fail unless you provide the necessary credentials.

Limits

The default configuration for this integration does not impose any limits on data collection.

Performance Impact

The default configuration for this integration is not expected to impose a significant performance impact on the system.

Setup

You can configure the ceph collector in two ways:

Method	Best for	How to
UI	Fast setup without editing files	Go to Nodes → Configure this node → Collectors → Jobs, search for ceph, then click + to add a job.
File	If you prefer configuring via file, or need to automate deployments (e.g., with Ansible)	Edit `go.d/ceph.conf` and add a job.

:::important

UI configuration requires a paid Netdata Cloud plan.

:::

Prerequisites

No action required.

Configuration

Options

The following options can be defined globally: update_every.

Group	Option	Description	Default	Required
Collection	update_every	Data collection interval (seconds).	1	no
	autodetection_retry	Autodetection retry interval (seconds). Set 0 to disable.	0	no
Target	url	The URL of the Ceph Manager API.	https://127.0.0.1:8443	yes
	timeout	HTTP request timeout (seconds).	2	no
HTTP Auth	username	Username for Basic HTTP authentication.		yes
	password	Password for Basic HTTP authentication.		yes
	bearer_token_file	Path to a file containing a bearer token (used for `Authorization: Bearer`).		no
TLS	tls_skip_verify	Skip TLS certificate and hostname verification (insecure).	yes	no
	tls_ca	Path to CA bundle used to validate the server certificate.		no
	tls_cert	Path to client TLS certificate (for mTLS).		no
	tls_key	Path to client TLS private key (for mTLS).		no
Proxy	proxy_url	HTTP proxy URL.		no
	proxy_username	Username for proxy Basic HTTP authentication.		no
	proxy_password	Password for proxy Basic HTTP authentication.		no
Request	method	HTTP method to use.	GET	no
	body	Request body (e.g., for POST/PUT).		no
	headers	Additional HTTP headers (one per line as key: value).		no
	not_follow_redirects	Do not follow HTTP redirects.	no	no
	force_http2	Force HTTP/2 (including h2c over TCP).	no	no
Virtual Node	vnode	Associates this data collection job with a Virtual Node.		no
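
To make the Default and Required columns concrete, here is a minimal Python sketch that fills in the documented defaults for a job and reports which required options are still missing. The function `resolve_job` and the way defaults are merged are assumptions for illustration; the collector may resolve options differently.

```python
# Defaults copied from the options table above (only a subset is shown).
DEFAULTS = {
    "update_every": 1,
    "autodetection_retry": 0,
    "url": "https://127.0.0.1:8443",
    "timeout": 2,
}

# Options marked "yes" in the Required column.
REQUIRED = ("url", "username", "password")


def resolve_job(job):
    """Return (merged options, list of required options left empty)."""
    merged = {**DEFAULTS, **job}  # job values override the defaults
    missing = [key for key in REQUIRED if not merged.get(key)]
    return merged, missing


if __name__ == "__main__":
    merged, missing = resolve_job({"name": "local"})
    print(merged["url"])
    print(missing)
```

With only a name set, the URL falls back to its default while `username` and `password` are reported as missing, which matches the note that auto-detected jobs fail until credentials are provided.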

via UI

Configure the ceph collector from the Netdata web interface:

1. Go to Nodes.
2. Select the node where you want the ceph data-collection job to run and click the gear icon (Configure this node). That node will run the data collection.
3. The Collectors → Jobs view opens by default.
4. In the Search box, type ceph (or scroll the list) to locate the ceph collector.
5. Click the + next to the ceph collector to add a new job.
6. Fill in the job fields, then click Test to verify the configuration and Submit to save.
   - Test runs the job with the provided settings and shows whether data can be collected.
   - If it fails, an error message appears with details (for example, connection refused, timeout, or command execution errors), so you can adjust and retest.

via File

The configuration file name for this integration is go.d/ceph.conf.

The file format is YAML. Generally, the structure is:

update_every: 1
autodetection_retry: 0
jobs:
  - name: some_name1
  - name: some_name2

You can edit the configuration file using the edit-config script from the Netdata config directory.

cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/ceph.conf

Examples

Basic

A basic example configuration.

jobs:
  - name: local
    url: https://127.0.0.1:8443
    username: user
    password: pass

Multi-instance

Note: When you define multiple jobs, their names must be unique.

Collecting metrics from local and remote instances.

jobs:
  - name: local
    url: https://127.0.0.1:8443
    username: user
    password: pass

  - name: remote
    url: https://192.0.2.1:8443
    username: user
    password: pass
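
The uniqueness rule for job names can be checked before deploying a configuration. The sketch below assumes the `jobs` list has already been parsed from YAML into Python dictionaries; the function name `duplicate_job_names` and the third job's URL are hypothetical.

```python
from collections import Counter


def duplicate_job_names(jobs):
    """Return job names that appear more than once, in first-seen order."""
    counts = Counter(job.get("name") for job in jobs)
    return [name for name, count in counts.items() if count > 1]


# Example data mirroring the jobs above, plus one deliberately duplicated name.
jobs = [
    {"name": "local", "url": "https://127.0.0.1:8443"},
    {"name": "remote", "url": "https://192.0.2.1:8443"},
    {"name": "local", "url": "https://192.0.2.2:8443"},  # duplicate name
]

print(duplicate_job_names(jobs))  # ['local']
```

An empty result means every job name is unique.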

Metrics

Metrics grouped by scope.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.

Per cluster

These metrics refer to the entire Ceph cluster.

Labels:

Label	Description
fsid	A unique identifier of the cluster.

Metrics:

Metric	Dimensions	Unit
ceph.cluster_status	ok, err, warn	status
ceph.cluster_hosts_count	hosts	hosts
ceph.cluster_monitors_count	monitors	monitors
ceph.cluster_osds_count	osds	osds
ceph.cluster_osds_by_status_count	up, down, in, out	status
ceph.cluster_managers_count	active, standby	managers
ceph.cluster_object_gateways_count	object	gateways
ceph.cluster_iscsi_gateways_count	iscsi	gateways
ceph.cluster_iscsi_gateways_by_status_count	up, down	gateways
ceph.cluster_physical_capacity_utilization	utilization	percent
ceph.cluster_physical_capacity_usage	avail, used	bytes
ceph.cluster_objects_count	objects	objects
ceph.cluster_objects_by_status_distribution	healthy, misplaced, degraded, unfound	percent
ceph.cluster_pools_count	pools	pools
ceph.cluster_pgs_count	pgs	pgs
ceph.cluster_pgs_by_status_count	clean, working, warning, unknown	pgs
ceph.cluster_pgs_per_osd_count	per_osd	pgs
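
The capacity charts are related: a utilization percentage can be derived from the avail and used byte counts. The sketch below assumes utilization is used / (used + avail); this document does not state the formula the collector applies, so treat it as an approximation for reasoning about the charts.

```python
def utilization_percent(used_bytes, avail_bytes):
    """Share of total capacity in use, assuming total = used + avail."""
    total = used_bytes + avail_bytes
    if total == 0:
        return 0.0  # avoid division by zero for an empty cluster
    return 100.0 * used_bytes / total


if __name__ == "__main__":
    # 25 bytes used and 75 bytes available gives 25% utilization.
    print(utilization_percent(25, 75))
```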

Per osd

These metrics refer to the Object Storage Daemon (OSD).

Labels:

Label	Description
fsid	A unique identifier of the cluster.
osd_uuid	OSD UUID.
osd_name	OSD name.
device_class	OSD device class.

Metrics:

Metric	Dimensions	Unit
ceph.osd_status	up, down, in, out	status
ceph.osd_space_usage	avail, used	bytes
ceph.osd_io	read, written	bytes/s
ceph.osd_iops	read, write	ops/s
ceph.osd_latency	commit, apply	milliseconds

Per pool

These metrics refer to an individual pool.

Labels:

Label	Description
fsid	A unique identifier of the cluster.
pool_name	Pool name.

Metrics:

Metric	Dimensions	Unit
ceph.pool_space_utilization	utilization	percent
ceph.pool_space_usage	avail, used	bytes
ceph.pool_objects_count	object	objects
ceph.pool_io	read, written	bytes/s
ceph.pool_iops	read, write	ops/s

Alerts

The following alerts are available:

Alert name	On metric	Description
ceph_cluster_physical_capacity_utilization	ceph.cluster_physical_capacity_utilization	Ceph cluster ${label:fsid} disk space utilization

Troubleshooting

Debug Mode

Important: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature.

To troubleshoot issues with the ceph collector, run the go.d.plugin with the debug option enabled. The output should give you clues as to why the collector isn’t working.

Navigate to the plugins.d directory, usually at /usr/libexec/netdata/plugins.d/. If that’s not the case on your system, open netdata.conf and look for the plugins setting under [directories].
```
cd /usr/libexec/netdata/plugins.d/
```
Switch to the netdata user.
```
sudo -u netdata -s
```
Run the go.d.plugin to debug the collector:
```
./go.d.plugin -d -m ceph
```
To debug a specific job:
```
./go.d.plugin -d -m ceph -j jobName
```

Getting Logs

If you’re encountering problems with the ceph collector, follow these steps to retrieve logs and identify potential issues:

Run the command specific to your system (systemd, non-systemd, or Docker container).
Examine the output for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.

System with systemd

Use the following command to view logs generated since the last Netdata service restart:

journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep ceph

System without systemd

Locate the collector log file, typically at /var/log/netdata/collector.log, and use grep to filter for the collector’s name:

grep ceph /var/log/netdata/collector.log

Note: This method shows logs from all restarts. Focus on the latest entries for troubleshooting current issues.
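
Because the log file spans all restarts, it can help to keep only the most recent matching lines. The Python sketch below does the same filtering as the grep command and then keeps the last few matches; the function name `latest_matching` and the default of 20 lines are assumptions chosen for this example.

```python
import os
from collections import deque


def latest_matching(lines, needle="ceph", limit=20):
    """Keep only the last `limit` lines that contain `needle`."""
    matches = (line for line in lines if needle in line)
    return list(deque(matches, maxlen=limit))  # deque drops older entries


if __name__ == "__main__":
    path = "/var/log/netdata/collector.log"  # typical location, as noted above
    if os.path.exists(path):
        with open(path, errors="replace") as log:
            for line in latest_matching(log):
                print(line, end="")
```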

Docker Container

If your Netdata runs in a Docker container named “netdata” (replace if different), use this command:

docker logs netdata 2>&1 | grep ceph
