ClickHouse No space left on device: emergency recovery when the data disk fills

When ClickHouse returns No space left on device to client inserts or the server log fills with write errors, the situation is past a simple capacity alert. ClickHouse does not degrade gradually on a full disk. Background merges halt immediately because they require temporary free space to write combined parts before removing source files. Once merges stop, small insert parts accumulate, metadata overhead grows, and disk usage accelerates. TTL-based expiration also stops because TTL cleanup is executed by merges. ClickHouse system tables such as system.query_log and system.part_log are MergeTree tables that can grow unbounded and consume the remaining space.

Recovery requires manual intervention. The goal is to reclaim enough space for merges to resume, identify what consumed the disk, and fix the root cause before the next cycle. Do not restart the server as a first step. A restart with a full disk can prevent ClickHouse from starting if it needs to write system tables or temporary files during initialization.

flowchart TD
    A[Disk > 85% full] --> B[Merges cannot allocate temp space]
    B --> C[Merges halt]
    C --> D[Small parts accumulate]
    D --> E[Metadata and indexes grow]
    E --> F[TTL cleanup stops]
    F --> G[Disk usage accelerates]
    G --> H[Inserts fail with No space left]

What this means

ClickHouse stores data in immutable parts. The merge engine continuously combines smaller parts into larger ones to maintain query performance and control file descriptor usage. A merge reads all source part files and writes a new merged part to the same disk before deleting the sources. A single merge temporarily requires space for both source and output parts.
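
The space requirement can be made concrete with a little arithmetic. The sketch below is an illustration only, not ClickHouse's actual reservation logic: it assumes the merged part is about as large as the sum of its sources, and the function names are hypothetical.

```python
def merge_extra_bytes(source_part_bytes):
    """Extra disk space a merge needs while it runs.

    Assumption: the merged part is roughly as large as the sum of its
    sources, and the sources are deleted only after the merge finishes.
    """
    return sum(source_part_bytes)


def merge_fits(source_part_bytes, free_bytes):
    """True if the merged part can be written before the sources are removed."""
    return merge_extra_bytes(source_part_bytes) <= free_bytes


GIB = 1024 ** 3
parts = [40 * GIB, 35 * GIB, 25 * GIB]  # three source parts, 100 GiB in total

print(merge_fits(parts, free_bytes=120 * GIB))  # True
print(merge_fits(parts, free_bytes=60 * GIB))   # False
```

With 60 GiB free, the 100 GiB of source parts cannot be merged even though the disk is not yet full, which is why merges stop well before inserts do.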

When the disk crosses the threshold where the largest potential merge cannot be completed, ClickHouse stops scheduling new merges. Existing merges may stall or fail. Without merges, every insert creates a new part that persists indefinitely. The part count rises, increasing files, index entries, and memory structures. This metadata growth itself consumes additional disk space.

TTL relies on merges to remove expired rows, so old data stays on disk. If the volume also hosts ZooKeeper transaction logs, Keeper snapshots, or application logs, those can tip the system from “nearly full” to “hard stop.” Inserts fail, read queries may fail if they require temporary space, and the server can crash if it cannot write logs or system tables.

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Ingestion outpacing TTL or retention | Disk usage grows steadily day over day; old monthly partitions are still present and large. | system.parts for partition ages and sizes. |
| Merge death spiral from prior pressure | Disk is above 85%, active parts per partition are climbing, and system.merges is empty. | Merge activity and part count trends together. |
| Mutation backlog creating temporary copies | Disk usage is high despite flat ingestion; system.mutations shows long-running or stuck jobs. | Pending mutations with is_done = 0. |
| System tables growing unbounded | system.query_log or system.part_log consumes tens or hundreds of gigabytes. | bytes_on_disk grouped by database and table, including system tables. |
| Detached parts not cleaned up | Used disk space exceeds the sum of active parts; old detached directories linger. | system.detached_parts or the detached/ directories on disk. |
| Non-ClickHouse files on the data volume | df shows far more used space than the parts in system.parts account for. | OS-level disk usage outside of ClickHouse data paths. |

Quick checks

Run these read-only queries and commands in order.

# OS-level disk usage on the data mount
df -h /var/lib/clickhouse

-- ClickHouse's view of disk space and reservation
SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    round(100 * (1 - free_space / total_space), 1) AS used_pct,
    formatReadableSize(unreserved_space) AS unreserved
FROM system.disks;

-- Largest partitions by on-disk size
SELECT
    database,
    table,
    partition_id,
    count() AS active_parts,
    sum(rows) AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY sum(bytes_on_disk) DESC  -- sort by raw bytes, not by the formatted string
LIMIT 20;

-- Are merges running, or is the pipeline frozen?
SELECT
    count(*) AS active_merges,
    countIf(is_mutation = 1) AS mutations,
    countIf(is_mutation = 0) AS regular_merges
FROM system.merges;

-- Pending mutations that could be blocking merges and consuming space
SELECT
    database,
    table,
    mutation_id,
    command,
    create_time,
    is_done,
    parts_to_do
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time;

-- Space consumed by system tables themselves
SELECT
    database,
    table,
    formatReadableSize(sum(bytes_on_disk)) AS disk_size,
    count() AS parts
FROM system.parts
WHERE active = 1
    AND database = 'system'
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC  -- sort by raw bytes, not by the formatted string
LIMIT 10;

-- Detached parts that still occupy disk but are invisible to queries
SELECT
    database,
    table,
    name,
    reason,
    formatReadableSize(bytes_on_disk) AS size,
    modification_time
FROM system.detached_parts
ORDER BY bytes_on_disk DESC
LIMIT 20;

-- Confirm whether inserts are being delayed or rejected
SELECT event, value
FROM system.events
WHERE event IN ('DelayedInserts', 'RejectedInserts');

How to diagnose it

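  1. Quantify the gap. Compare the used space that df reports against the sum of bytes_on_disk for active parts in system.parts. (system.disks reads the same filesystem counters as df, so those two should roughly agree.) If the OS shows significantly more used space than the active parts account for, the difference is consumed by non-ClickHouse files, detached parts, or filesystem overhead. Check for core dumps, package caches, or oversized logs.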

  2. Find the largest consumers. Rank partitions by bytes_on_disk using the query above. Focus on the top three. In most emergencies, one or two large tables or old partitions dominate.

  3. Determine if merges are blocked. If system.merges is empty (the count query above returns 0) while active part counts are high, the merge pipeline is stalled. Check system.mutations next. Pending mutations can hold merge threads and temporarily duplicate parts.

  4. Check system table bloat. If system.query_log, system.part_log, or system.text_log rank high in the size query, the monitoring instrumentation itself is contributing to the emergency. This is common on high-QPS clusters where TTL was never configured for log tables.

  5. Assess detached parts. system.detached_parts shows parts that were removed from active service but not deleted. They continue to consume space. Large counts here usually follow failed ALTER operations, replication fetch failures, or manual DETACH commands that were never cleaned up.

  6. Verify TTL health. If TTL is configured but expired data remains, confirm merges are running. TTL only drops data during merges. A full disk blocks merges, which blocks TTL, which prevents space reclamation.
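
Step 1 above comes down to a subtraction. The following sketch assumes you have already collected the byte totals from system.parts and system.detached_parts; the function names are hypothetical, and `shutil.disk_usage` stands in for reading `df`.

```python
import shutil


def unaccounted_bytes(os_used, active_parts_bytes, detached_bytes=0):
    """Used space that ClickHouse's part accounting does not explain.

    os_used            -- bytes used on the mount, as reported by the OS
    active_parts_bytes -- sum(bytes_on_disk) over active parts in system.parts
    detached_bytes     -- sum(bytes_on_disk) over system.detached_parts
    """
    return max(0, os_used - active_parts_bytes - detached_bytes)


def os_used_bytes(path):
    """Bytes used on the filesystem that holds `path`."""
    return shutil.disk_usage(path).used


GIB = 1024 ** 3
gap = unaccounted_bytes(os_used=900 * GIB,
                        active_parts_bytes=700 * GIB,
                        detached_bytes=50 * GIB)
print(gap // GIB)  # 150
```

A large remainder points at files outside the active data set: logs, core dumps, temporary merge output, or inactive parts that have not been removed yet.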

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Disk free space | Merges need temporary headroom roughly equal to the combined size of the parts being merged. | Free space below 20% or unreserved_space approaching zero. |
| Active part count per partition | Parts grow when merges halt, accelerating metadata overhead. | Count rising while merge activity is zero. |
| Merge activity | Confirms the background pipeline is actually making progress. | No entries in system.merges for more than 10 minutes during active inserts. |
| Mutation queue depth | Mutations block merges and temporarily double part sizes. | Any is_done = 0 mutation with parts_to_do not decreasing over time. |
| System table sizes | query_log and part_log are MergeTree tables that can fill the disk. | Any system table larger than 10% of total data size. |
| Insert delays and rejections | Early indicators that the part count or disk pressure threshold is crossed. | DelayedInserts or RejectedInserts counters increasing. |
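
The warning signs in the table can be turned into a simple check. This is a minimal sketch under stated assumptions: the `snapshot` dictionary and all of its keys are hypothetical, and how the values are collected (queries, an agent, a metrics API) is left out.

```python
def warning_signs(snapshot):
    """Return the warning signs from the table that apply to one snapshot."""
    warnings = []

    free_pct = 100 * snapshot["free_bytes"] / snapshot["total_bytes"]
    if free_pct < 20:
        warnings.append("disk free space below 20%")

    if snapshot["inserts_active"] and snapshot["minutes_without_merges"] > 10:
        warnings.append("no merges for more than 10 minutes during active inserts")

    history = snapshot["mutation_parts_to_do"]  # oldest sample first
    if len(history) >= 2 and history[-1] > 0 and history[-1] >= history[0]:
        warnings.append("mutation parts_to_do is not decreasing")

    if snapshot["largest_system_table_bytes"] > 0.10 * snapshot["total_data_bytes"]:
        warnings.append("a system table exceeds 10% of total data size")

    if snapshot["delayed_inserts_delta"] > 0 or snapshot["rejected_inserts_delta"] > 0:
        warnings.append("inserts are being delayed or rejected")

    return warnings


example = {
    "free_bytes": 150, "total_bytes": 1000,
    "inserts_active": True, "minutes_without_merges": 25,
    "mutation_parts_to_do": [40, 40, 40],
    "largest_system_table_bytes": 50, "total_data_bytes": 800,
    "delayed_inserts_delta": 3, "rejected_inserts_delta": 0,
}
for sign in warning_signs(example):
    print(sign)
```

The thresholds are the ones from the table; tune them to your own disk sizes and ingest rates.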

Fixes

Immediate stabilization: stop the bleeding

Stop or throttle ingestors before deleting anything. Each insert creates a new part that adds metadata overhead. If replication is active, pause non-essential distributed sends and DDL operations. Do not restart ClickHouse yet; a restart with zero free space can fail during initialization.

Reclaim space from old partitions

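Detaching an old partition is reversible, but it does not free space by itself: the parts are only moved into the table's detached/ directory on the same disk, where they are no longer visible to queries. To recover space without destroying data, detach the partition and then move its directories out of detached/ to another volume or host. They can be copied back and reattached later once space is available.

-- Detach a specific old partition (reversible; space is freed only after the detached directories are moved off this disk)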
ALTER TABLE db.table DETACH PARTITION ID '202401';

To permanently delete a partition and immediately reclaim space, use DROP PARTITION. This is destructive and irreversible.

-- Permanently drop a partition (destructive)
ALTER TABLE db.table DROP PARTITION ID '202401';

After dropping a partition, or after moving detached directories off the disk, run the system.disks check again to confirm space was freed.
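
When several old partitions are candidates, it helps to work out how many must go before merges can resume. The sketch below is a planning aid only. It assumes partition IDs sort chronologically (true for IDs such as '202401', not for every partition key), and the function name and sample numbers are hypothetical.

```python
def partitions_to_drop(partitions, bytes_needed):
    """Pick the oldest partitions whose combined size covers `bytes_needed`.

    `partitions` is a list of (partition_id, bytes_on_disk) tuples.
    Returns (chosen_partition_ids, bytes_freed). If every partition
    together is still too small, bytes_freed is less than bytes_needed.
    """
    chosen, freed = [], 0
    for partition_id, size in sorted(partitions):
        if freed >= bytes_needed:
            break
        chosen.append(partition_id)
        freed += size
    return chosen, freed


GIB = 1024 ** 3
candidates = [("202403", 80 * GIB), ("202401", 120 * GIB), ("202402", 100 * GIB)]

chosen, freed = partitions_to_drop(candidates, bytes_needed=200 * GIB)
print(chosen, freed // GIB)  # ['202401', '202402'] 220
```

Review the resulting list against your retention requirements before running any DROP PARTITION statement.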

Unblock system tables

If system.query_log, system.part_log, or similar tables are consuming significant space, truncate them for immediate relief. This destroys log history, so extract any needed forensic data first.

-- Truncate a system log table (destructive to history)
TRUNCATE TABLE system.query_log;

For long-term prevention, configure TTL on these tables in the server configuration so they do not grow unbounded. The change requires a server restart to apply.

Clean up detached parts

If detached parts from past operations are consuming space and are no longer needed, remove them from the detached/ directory. This is destructive; the data cannot be recovered without restoring from backup. Paths vary by database engine and configuration; verify the exact location before running destructive commands.

# Remove detached parts for a specific table (destructive)
rm -rf /var/lib/clickhouse/data/db/table/detached/*

Prefer querying system.detached_parts first to document what will be removed.
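
If the server cannot answer queries, the same record can be built from the filesystem. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the path in the usage section is the example path from above, which may differ on your system.

```python
import os


def dir_size(path):
    """Total apparent size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def detached_manifest(detached_dir):
    """List (part_directory_name, bytes) for each directory in detached/."""
    entries = []
    for name in sorted(os.listdir(detached_dir)):
        path = os.path.join(detached_dir, name)
        if os.path.isdir(path):
            entries.append((name, dir_size(path)))
    return entries


if __name__ == "__main__":
    target = "/var/lib/clickhouse/data/db/table/detached"  # verify before use
    if os.path.isdir(target):
        for name, size in detached_manifest(target):
            print(f"{size:>15}  {name}")
```

Save the output somewhere off the affected disk before removing anything. Sizes are apparent file sizes, so hard-linked files are counted once per link.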

Cancel mutations that are doubling disk usage

If a long-running mutation is holding merge threads and creating temporary part copies, kill it. The mutation must be reissued later, but cancellation frees threads and stops temporary space consumption.

-- Kill a specific mutation
KILL MUTATION WHERE database = 'db' AND table = 'table' AND mutation_id = '0000000000';

Restore merge activity

Once free space is above the merge threshold, verify that merges resume:

-- Check for new merge activity
SELECT * FROM system.merges LIMIT 5;

Monitor system.parts over the next 10 to 30 minutes. Active part counts should begin trending downward. If parts do not decrease, investigate whether the background pool is saturated or whether I/O latency is preventing merge progress.
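
Whether the count is "trending downward" is easier to judge from a fitted slope than from eyeballing individual samples. The sketch below fits an ordinary least-squares line to hypothetical samples of the active part count; the function name and the numbers are illustrative.

```python
def part_count_slope(samples):
    """Least-squares slope of the active part count, in parts per minute.

    `samples` is a list of (minutes_since_start, active_parts) pairs.
    A negative slope means merges are removing parts faster than
    inserts create them.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_p = sum(p for _, p in samples) / n
    num = sum((t - mean_t) * (p - mean_p) for t, p in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


samples = [(0, 1200), (10, 1100), (20, 1000), (30, 900)]
print(part_count_slope(samples))  # -10.0
```

A slope near zero or above it after 30 minutes suggests merges have not actually resumed.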

Structural fixes

If cleanup cannot free enough space, expand the underlying volume or move older data to tiered storage. Do not rely on TTL alone to save you in the immediate emergency; TTL requires merges, and merges require free space.

Prevention

  • Maintain merge headroom. Keep disk usage below 80% as an operational target. The safety margin must be at least 2x the size of the largest active partition to allow a full merge to complete.
  • Configure TTL on system tables. Set retention policies for system.query_log, system.part_log, system.text_log, and system.trace_log. On high-traffic clusters, these can outgrow user data.
  • Monitor the merge-to-insert ratio. Track part creation rate against merge completion rate. If parts are created faster than they are merged, part count and disk usage will grow until they trigger a crisis.
  • Audit partitioning strategy. Overly granular partitioning (for example, by hour) multiplies part counts and metadata overhead. Daily or monthly partitioning is usually more space-efficient.
  • Clean detached parts proactively. Include system.detached_parts in routine checks. Detached parts left after incidents or failed alters silently consume space indefinitely.
  • Set disk utilization alerts below the danger zone. Page when disk usage exceeds 80%, not at 95%. The extra runway is needed for merge temp space.
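
The two headroom rules above (usage below 80%, and free space of at least 2x the largest active partition) can be checked together. This is a minimal sketch with a hypothetical function name and made-up numbers; both thresholds are the ones stated in the list.

```python
def headroom_report(total_bytes, used_bytes, largest_partition_bytes):
    """Evaluate both headroom rules for one disk."""
    used_pct = 100 * used_bytes / total_bytes
    free_bytes = total_bytes - used_bytes
    return {
        "used_pct": round(used_pct, 1),
        "below_80_pct": used_pct < 80,
        "free_covers_2x_largest_partition":
            free_bytes >= 2 * largest_partition_bytes,
    }


GIB = 1024 ** 3
report = headroom_report(total_bytes=2000 * GIB,
                         used_bytes=1500 * GIB,
                         largest_partition_bytes=300 * GIB)
print(report)
```

In this example the disk is at 75% and passes the percentage rule, but the 500 GiB of free space is less than twice the 300 GiB partition, so the second rule fails. A percentage alert alone can miss that case on tables with very large partitions.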

How Netdata helps

  • Real-time correlation of disk utilization, merge activity, and active part count exposes the merge death spiral before inserts fail.
  • Per-partition part counts and growth rates predict disk pressure hours before the filesystem fills.
  • DelayedInserts and RejectedInserts event counters alert on write-pipeline backpressure.
  • System table size monitoring alongside user data catches cases where query_log or part_log become the primary disk consumer.
  • Background pool utilization and mutation queue depth distinguish disk-full causes from merge thread starvation.