ClickHouse TTL not deleting data: why expired rows survive and the disk fills
A MergeTree table with a configured TTL can still accumulate disk usage when expired rows remain visible to SELECT. This is not a syntax or timezone error. ClickHouse enforces TTL only as a side effect of background merges. When merges stop or never start, TTL stops, and storage grows without warning.
What this means
ClickHouse evaluates TTL expressions only during merge operations. When the merge scheduler combines parts, it checks whether rows exceeded their TTL interval. Depending on table settings, it either drops the entire part if every row expired, or rewrites the part to exclude expired rows. Both paths require a merge. If the background merge pool is saturated, disk space is too low for temporary merge output, or a partition has no new inserts to trigger merge selection, expired rows remain indefinitely. The system does not log a warning for TTL skips; the only visible symptom is growing storage.
flowchart TD
A[TTL rule expires rows] --> B{Merge scheduled?}
B -->|No new inserts| C[Partition cold merges idle]
B -->|Mutations or disk full| D[Merge starved]
C --> E[Expired rows persist]
D --> E
E --> F[Disk fills silently]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Merge starvation from part explosion | system.merges is empty or sluggish, DelayedInserts rises, and part counts climb | SELECT count() FROM system.merges and per-partition part counts |
| Mutations monopolizing the merge pool | system.mutations shows is_done = 0 and parts_to_do is flat | SELECT database, table, parts_to_do FROM system.mutations WHERE is_done = 0 |
| Disk too full for temporary merge output | system.disks.unreserved_space near zero and no merges run | SELECT name, unreserved_space FROM system.disks |
| Cold partition with no insert traffic | Old partition has many parts, no recent inserts, and TTL is never evaluated | SELECT partition_id, count() FROM system.parts WHERE active = 1 GROUP BY partition_id |
ttl_only_drop_parts enabled on mixed data | Parts contain both expired and live rows, so whole-part drop is blocked | SHOW CREATE TABLE to verify the setting |
| TTL altered after data ingestion | Historical parts keep the old TTL rule; the new rule applies only to new merges | Check whether the TTL expression was added after initial data load |
Quick checks
-- Active merges and mutations
SELECT count() AS active_merges,
countIf(is_mutation = 1) AS active_mutations
FROM system.merges;
-- Mutation backlog blocking merges
SELECT database, table, mutation_id, parts_to_do
FROM system.mutations
WHERE is_done = 0
ORDER BY parts_to_do DESC;
-- Disk headroom for merge output
SELECT name,
formatReadableSize(free_space) AS free,
formatReadableSize(unreserved_space) AS unreserved
FROM system.disks;
-- Part counts by partition
SELECT database, table, partition_id,
count() AS active_parts,
formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY active_parts DESC
LIMIT 20;
-- Throttled or rejected inserts
SELECT event, value
FROM system.events
WHERE event IN ('DelayedInserts', 'RejectedInserts');
-- Worst-case part concentration
SELECT value AS max_parts_per_partition
FROM system.asynchronous_metrics
WHERE metric = 'MaxPartCountForPartition';
# Recent merge and TTL entries in server logs
grep -Ei 'ttl|merge' /var/log/clickhouse-server/*.log | tail -50
How to diagnose it
- Verify the TTL expression. Run
SHOW CREATE TABLE db.table. Confirm the TTL clause references the correct column and interval. A typo means rows never expire. - Confirm merges are executing. Query
system.merges. An empty result while part counts are high indicates merge starvation. If onlyis_mutation = 1entries appear, mutations have crowded out regular merges. - Check for stalled mutations. Query
system.mutations WHERE is_done = 0. Ifparts_to_dois not decreasing, the mutation is stuck and monopolizing a merge thread. - Inspect disk headroom. Query
system.disks. Ifunreserved_spaceis near zero, ClickHouse halts merges to avoid running out of room for temporary output. Without merges, TTL cannot run. - Identify cold partitions. Query
system.partsgrouped bypartition_id. If old partitions have many active parts but receive no new inserts, the merge selector has no reason to schedule work there. Expired data persists until something triggers a merge. - Review
ttl_only_drop_parts. If enabled, ClickHouse drops an entire part only when every row inside it expired. Parts with mixed expired and live rows are left untouched. Verify whether your retention pattern matches whole-part expiry. - Check for post-load TTL changes. If the TTL rule was added or modified after ingestion, existing parts may keep the old expiration logic. They adopt the new rule only after a merge rewrites them.
- Measure background pool saturation. Query
system.metricsforBackgroundMergesAndMutationsPoolTask. If active tasks are near the pool size, merges are queued but not executing, which delays TTL.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Active parts per partition | High part counts signal merge backlog, which blocks TTL | Any partition with parts steadily growing toward the insert-delay threshold |
Merge activity (system.merges) | Direct proof that TTL has an opportunity to run | Zero regular merges for more than 10 minutes while parts remain above baseline |
Mutation queue (system.mutations) | Mutations consume the same pool as merges | is_done = 0 with parts_to_do flat for over 30 minutes |
Disk unreserved_space | Merges need temporary space | unreserved_space below twice the size of the largest active part |
| Background pool utilization | Saturated pools delay merge scheduling | BackgroundMergesAndMutationsPoolTask near the pool size for over 15 minutes |
DelayedInserts / RejectedInserts | Part pressure that precedes merge collapse | RejectedInserts increasing, or DelayedInserts trending up |
MaxPartCountForPartition | Server-wide worst-case concentration | Value climbing steadily over a 30-minute window |
Fixes
Clear mutations blocking the merge pool
If system.mutations shows stuck mutations, kill non-critical ones to free merge threads.
KILL MUTATION WHERE database = 'db' AND table = 'table' AND mutation_id = '0000000000';
Tradeoff: Killed mutations leave partially rewritten parts in place. If you still need the change, reissue the mutation after recovery.
Reclaim disk space to resume merges
When unreserved_space nears zero, merges halt. Detach old or unneeded partitions to reclaim space immediately, or add capacity.
ALTER TABLE db.table DETACH PARTITION '2023-01-01';
Tradeoff: Detached partitions are invisible to queries until reattached. Use only for data you can temporarily exclude.
Trigger activity on cold partitions
If a partition has no new inserts, the merge scheduler may ignore it. Insert a small, valid batch into the partition to give the merge selector a reason to schedule work and evaluate TTL.
Tradeoff: This generates a new part. Ensure the system has enough merge capacity to handle it without worsening part count pressure.
Adjust ttl_only_drop_parts to match retention needs
If enabled and parts contain interleaved expired and live rows, ClickHouse cannot drop those parts. Disable it only if you need row-level TTL precision, but expect higher I/O from rewrite merges.
Tradeoff: Row-level TTL rewrites parts on disk, consuming CPU and I/O. Whole-part drops are cheap but require partitioning and TTL alignment.
Re-materialize after TTL rule changes
If the TTL was modified after ingestion, plan a window for background merges to rewrite historical parts, or accept that existing data expires under the old rule. Do not repeatedly ALTER TTL; each change queues new background work.
Prevention
- Monitor part count per partition, not just per table. Merge and TTL crises happen at the partition level.
- Keep disk usage below 80% and ensure
unreserved_spacestays above twice the size of the largest active part so merges have room to write output. - Avoid heavy
ALTER UPDATE/DELETEduring peak hours. Mutations starve merges and freeze TTL. - Batch inserts to keep part creation rate low. High part counts raise merge cost and delay TTL.
- Align partition granularity with TTL intervals. If you TTL by month but partition by day,
ttl_only_drop_partsmay rarely fire because each part spans a shorter interval than the TTL.
How Netdata helps
- Correlates
MaxPartCountForPartitionwith OS-level disk I/O latency to expose merge starvation before TTL stalls. - Tracks
MemoryTrackingand system memory to identify memory pressure that pushes background merges into swap or cancellation. - Surfaces disk space from
system.disksalongside OS volume metrics, highlighting whenunreserved_spacedrops below safe merge headroom. - Charts insert latency and
RejectedInsertsspikes so you catch the merge debt spiral before parts accumulate enough to block TTL. - Distinguishes coordination problems (ZooKeeper/Keeper latency) from local merge problems by cross-referencing replication queue depth with local merge activity.
Related guides
- ClickHouse active part count growing: reading MaxPartCountForPartition before it pages
- ClickHouse ALTER UPDATE/DELETE overuse: why mutations are not row updates
- ClickHouse async inserts: when async_insert fixes too-many-parts and when it hides it
- ClickHouse DelayedInserts climbing: the warning before too-many-parts
- ClickHouse disk space collapse: why merges need free space and how the spiral starts
- ClickHouse disk space monitoring: free_space, unreserved_space, and the 80% target
- ClickHouse distributed DDL stuck: ON CLUSTER queries that never finish
- ClickHouse insert latency rising: the leading indicator of write-pipeline trouble
- ClickHouse cannot connect to ZooKeeper/Keeper: diagnosing the coordination layer
- ClickHouse Keeper latency high: the early warning before sessions expire
- ClickHouse Keeper saturation spiral: too many tables, DDL storms, and cluster freeze
- ClickHouse Memory limit (for query) exceeded: per-query limits and GROUP BY/JOIN blowups







