ClickHouse TTL not deleting data: why expired rows survive and the disk fills

A MergeTree table with a configured TTL can still accumulate disk usage when expired rows remain visible to SELECT. This is not a syntax or timezone error. ClickHouse enforces TTL only as a side effect of background merges. When merges stop or never start, TTL stops, and storage grows without warning.

What this means

ClickHouse evaluates TTL expressions only during merge operations. When the merge scheduler combines parts, it checks whether rows exceeded their TTL interval. Depending on table settings, it either drops the entire part if every row expired, or rewrites the part to exclude expired rows. Both paths require a merge. If the background merge pool is saturated, disk space is too low for temporary merge output, or a partition has no new inserts to trigger merge selection, expired rows remain indefinitely. The system does not log a warning for TTL skips; the only visible symptom is growing storage.

flowchart TD
    A[TTL rule expires rows] --> B{Merge scheduled?}
    B -->|No new inserts| C[Partition cold merges idle]
    B -->|Mutations or disk full| D[Merge starved]
    C --> E[Expired rows persist]
    D --> E
    E --> F[Disk fills silently]

Common causes

CauseWhat it looks likeFirst thing to check
Merge starvation from part explosionsystem.merges is empty or sluggish, DelayedInserts rises, and part counts climbSELECT count() FROM system.merges and per-partition part counts
Mutations monopolizing the merge poolsystem.mutations shows is_done = 0 and parts_to_do is flatSELECT database, table, parts_to_do FROM system.mutations WHERE is_done = 0
Disk too full for temporary merge outputsystem.disks.unreserved_space near zero and no merges runSELECT name, unreserved_space FROM system.disks
Cold partition with no insert trafficOld partition has many parts, no recent inserts, and TTL is never evaluatedSELECT partition_id, count() FROM system.parts WHERE active = 1 GROUP BY partition_id
ttl_only_drop_parts enabled on mixed dataParts contain both expired and live rows, so whole-part drop is blockedSHOW CREATE TABLE to verify the setting
TTL altered after data ingestionHistorical parts keep the old TTL rule; the new rule applies only to new mergesCheck whether the TTL expression was added after initial data load

Quick checks

-- Active merges and mutations
SELECT count() AS active_merges,
       countIf(is_mutation = 1) AS active_mutations
FROM system.merges;
-- Mutation backlog blocking merges
SELECT database, table, mutation_id, parts_to_do
FROM system.mutations
WHERE is_done = 0
ORDER BY parts_to_do DESC;
-- Disk headroom for merge output
SELECT name,
       formatReadableSize(free_space) AS free,
       formatReadableSize(unreserved_space) AS unreserved
FROM system.disks;
-- Part counts by partition
SELECT database, table, partition_id,
       count() AS active_parts,
       formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY active_parts DESC
LIMIT 20;
-- Throttled or rejected inserts
SELECT event, value
FROM system.events
WHERE event IN ('DelayedInserts', 'RejectedInserts');
-- Worst-case part concentration
SELECT value AS max_parts_per_partition
FROM system.asynchronous_metrics
WHERE metric = 'MaxPartCountForPartition';
# Recent merge and TTL entries in server logs
grep -Ei 'ttl|merge' /var/log/clickhouse-server/*.log | tail -50

How to diagnose it

  1. Verify the TTL expression. Run SHOW CREATE TABLE db.table. Confirm the TTL clause references the correct column and interval. A typo means rows never expire.
  2. Confirm merges are executing. Query system.merges. An empty result while part counts are high indicates merge starvation. If only is_mutation = 1 entries appear, mutations have crowded out regular merges.
  3. Check for stalled mutations. Query system.mutations WHERE is_done = 0. If parts_to_do is not decreasing, the mutation is stuck and monopolizing a merge thread.
  4. Inspect disk headroom. Query system.disks. If unreserved_space is near zero, ClickHouse halts merges to avoid running out of room for temporary output. Without merges, TTL cannot run.
  5. Identify cold partitions. Query system.parts grouped by partition_id. If old partitions have many active parts but receive no new inserts, the merge selector has no reason to schedule work there. Expired data persists until something triggers a merge.
  6. Review ttl_only_drop_parts. If enabled, ClickHouse drops an entire part only when every row inside it expired. Parts with mixed expired and live rows are left untouched. Verify whether your retention pattern matches whole-part expiry.
  7. Check for post-load TTL changes. If the TTL rule was added or modified after ingestion, existing parts may keep the old expiration logic. They adopt the new rule only after a merge rewrites them.
  8. Measure background pool saturation. Query system.metrics for BackgroundMergesAndMutationsPoolTask. If active tasks are near the pool size, merges are queued but not executing, which delays TTL.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
Active parts per partitionHigh part counts signal merge backlog, which blocks TTLAny partition with parts steadily growing toward the insert-delay threshold
Merge activity (system.merges)Direct proof that TTL has an opportunity to runZero regular merges for more than 10 minutes while parts remain above baseline
Mutation queue (system.mutations)Mutations consume the same pool as mergesis_done = 0 with parts_to_do flat for over 30 minutes
Disk unreserved_spaceMerges need temporary spaceunreserved_space below twice the size of the largest active part
Background pool utilizationSaturated pools delay merge schedulingBackgroundMergesAndMutationsPoolTask near the pool size for over 15 minutes
DelayedInserts / RejectedInsertsPart pressure that precedes merge collapseRejectedInserts increasing, or DelayedInserts trending up
MaxPartCountForPartitionServer-wide worst-case concentrationValue climbing steadily over a 30-minute window

Fixes

Clear mutations blocking the merge pool

If system.mutations shows stuck mutations, kill non-critical ones to free merge threads.

KILL MUTATION WHERE database = 'db' AND table = 'table' AND mutation_id = '0000000000';

Tradeoff: Killed mutations leave partially rewritten parts in place. If you still need the change, reissue the mutation after recovery.

Reclaim disk space to resume merges

When unreserved_space nears zero, merges halt. Detach old or unneeded partitions to reclaim space immediately, or add capacity.

ALTER TABLE db.table DETACH PARTITION '2023-01-01';

Tradeoff: Detached partitions are invisible to queries until reattached. Use only for data you can temporarily exclude.

Trigger activity on cold partitions

If a partition has no new inserts, the merge scheduler may ignore it. Insert a small, valid batch into the partition to give the merge selector a reason to schedule work and evaluate TTL.

Tradeoff: This generates a new part. Ensure the system has enough merge capacity to handle it without worsening part count pressure.

Adjust ttl_only_drop_parts to match retention needs

If enabled and parts contain interleaved expired and live rows, ClickHouse cannot drop those parts. Disable it only if you need row-level TTL precision, but expect higher I/O from rewrite merges.

Tradeoff: Row-level TTL rewrites parts on disk, consuming CPU and I/O. Whole-part drops are cheap but require partitioning and TTL alignment.

Re-materialize after TTL rule changes

If the TTL was modified after ingestion, plan a window for background merges to rewrite historical parts, or accept that existing data expires under the old rule. Do not repeatedly ALTER TTL; each change queues new background work.

Prevention

  • Monitor part count per partition, not just per table. Merge and TTL crises happen at the partition level.
  • Keep disk usage below 80% and ensure unreserved_space stays above twice the size of the largest active part so merges have room to write output.
  • Avoid heavy ALTER UPDATE/DELETE during peak hours. Mutations starve merges and freeze TTL.
  • Batch inserts to keep part creation rate low. High part counts raise merge cost and delay TTL.
  • Align partition granularity with TTL intervals. If you TTL by month but partition by day, ttl_only_drop_parts may rarely fire because each part spans a shorter interval than the TTL.

How Netdata helps

  • Correlates MaxPartCountForPartition with OS-level disk I/O latency to expose merge starvation before TTL stalls.
  • Tracks MemoryTracking and system memory to identify memory pressure that pushes background merges into swap or cancellation.
  • Surfaces disk space from system.disks alongside OS volume metrics, highlighting when unreserved_space drops below safe merge headroom.
  • Charts insert latency and RejectedInserts spikes so you catch the merge debt spiral before parts accumulate enough to block TTL.
  • Distinguishes coordination problems (ZooKeeper/Keeper latency) from local merge problems by cross-referencing replication queue depth with local merge activity.