ClickHouse DelayedInserts climbing: the warning before too-many-parts
Insert latency is climbing and system.events.DelayedInserts is no longer flat. ClickHouse is sleeping during INSERT because at least one partition has crossed parts_to_delay_insert. The database still accepts writes, but injects a sleep before each insert commits. This is the warning window before hard failure. If the merge backlog is not resolved, DelayedInserts climbs until RejectedInserts starts ticking and clients receive DB::Exception: Too many parts.
DelayedInserts and RejectedInserts are cumulative counters in system.events. Alert on their rate of change, not absolute value. A sustained positive rate means the merge pipeline is losing to the insert pipeline.
What this means
Every INSERT into a MergeTree table creates one or more immutable data parts. ClickHouse runs background merges to consolidate smaller parts into larger ones. This keeps query performance stable and limits file descriptor usage. If parts accumulate faster than merges consolidate them, the active part count per partition rises.
When the active part count in a single partition exceeds parts_to_delay_insert (default 1000), ClickHouse throttles inserts into that partition by injecting a sleep. The delay grows as the part count rises further above the threshold. When the count exceeds parts_to_throw_insert (default 3000), inserts are rejected entirely. Both thresholds are configurable per table via SETTINGS in the CREATE TABLE statement or server defaults. There is also a global cap, max_parts_in_total (default 100000), which throws regardless of per-partition state.
DelayedInserts is the earliest operational signal that a partition is heading toward the hard limit. Insert latency is the client-visible symptom. Because these counters are cumulative since server start, monitor their rate. A flat value is healthy; a sustained positive rate means the throttle is active and the situation is worsening.
flowchart TD
A[INSERT arrives] --> B{Active parts in partition < parts_to_delay_insert?}
B -->|Yes| C[Insert commits normally]
B -->|No| D{Active parts < parts_to_throw_insert?}
D -->|Yes| E[Apply delay and increment DelayedInserts]
E --> F[Insert commits after sleep]
D -->|No| G[Reject insert and increment RejectedInserts]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Merge throughput below insert rate | system.merges shows few or very slow merges; part count rising steadily | SELECT database, table, partition_id, count() AS parts FROM system.parts WHERE active = 1 GROUP BY database, table, partition_id ORDER BY parts DESC LIMIT 10 |
| Many small inserts | High InsertQuery count but low InsertedRows per query; each insert creates a new part | SELECT event, value FROM system.events WHERE event IN ('InsertQuery', 'InsertedRows') and compare ratios |
| Mutations blocking the merge pool | system.mutations shows is_done = 0 with parts_to_do stalled | SELECT database, table, mutation_id, parts_to_do FROM system.mutations WHERE is_done = 0 ORDER BY create_time |
| Over-partitioning | A single table has dozens or hundreds of partition_ids, each accumulating parts | Per-partition part count query above; look for many partition_ids under one table |
| Projections or materialized views multiplying parts | Base table looks healthy but hidden parts or downstream tables are accumulating | Check part counts in materialized view target tables; for projections, compare total active parts against expected base table parts |
Quick checks
Run these read-only checks to confirm the scope and locate the bottleneck.
# Check current DelayedInserts value (sample twice to compute delta)
clickhouse-client -q "SELECT event, value FROM system.events WHERE event = 'DelayedInserts'"
-- Find the worst partition and how close it is to limits
SELECT
database,
table,
partition_id,
count() AS active_parts,
1000 AS delay_threshold,
3000 AS throw_threshold
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY active_parts DESC
LIMIT 10;
-- Check currently running merges and their progress
SELECT
database,
table,
elapsed,
progress,
num_parts,
is_mutation,
formatReadableSize(total_size_bytes_compressed) AS merge_size
FROM system.merges
ORDER BY elapsed DESC;
-- Check for active mutations that may be starving merges
SELECT
database,
table,
mutation_id,
parts_to_do,
latest_fail_reason
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time;
-- Evaluate insert batching efficiency
SELECT
event,
value
FROM system.events
WHERE event IN ('InsertQuery', 'InsertedRows');
-- Check disk space available for merges to complete
SELECT
name,
path,
formatReadableSize(free_space) AS free,
formatReadableSize(total_space) AS total,
round(100 * free_space / total_space, 1) AS free_pct
FROM system.disks;
How to diagnose it
- Confirm the symptom is current.
DelayedInsertsis cumulative since server start. Take two samples 60 seconds apart and compute the delta. A sustained rate above zero is the signal. - Identify the affected partition. The limit is per-partition, not per-table. Use the per-partition part count query to find the exact table and
partition_idthat crossed the threshold. - Determine if merges are running. Query
system.merges. If merges are active andprogressis advancing, the system may be catching up from a burst. If no merges are running despite high part counts, the merge pool may be blocked or the disk may be too full to allow merge output. - Check for merge blockers. Look at
system.mutationsfor long-running or stuck mutations. Look atsystem.disksfor free space below 20%. Look at background pool utilization insystem.metrics. - Correlate with insert patterns. Compare
InsertQuerytoInsertedRows. If the ratio shows many small inserts, the root cause is client-side batching. If insert volume jumped recently, the root cause is insufficient merge capacity for the new insert rate. - Check for hidden parts. If the base table part count is moderate but the throttle is firing, verify whether projections or materialized views are creating hidden parts that also count toward limits.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
DelayedInserts rate | Direct measure of insert throttling | Sustained delta > 0 over 5 minutes |
RejectedInserts rate | Hard insert failures have begun | Any non-zero sustained rate |
| Active parts per partition | Distance to the delay and throw thresholds | > 500 parts (50% of default delay threshold) |
Merge activity (system.merges) | Whether consolidation is actually happening | Zero active merges while parts > 500 |
| Insert latency P99 | Client-visible impact of throttling | > 2x baseline sustained for 15 minutes |
| Mutation queue depth | Hidden consumer of merge threads | Any is_done = 0 with parts_to_do flat for > 30 minutes |
| Disk free space | Merges require temporary headroom | < 20% free or unreserved_space trending toward zero |
| Background pool utilization | Thread availability for merge work | Sustained > 90% with part count growing |
Fixes
Merge pipeline is behind after an insert burst. If the burst was temporary, merges may self-recover. Reduce insert rate from the largest writers temporarily. Do not restart the server; restarting forces the server to reload all parts from disk and delays recovery.
Mutations are monopolizing the merge pool. Identify the mutation with the highest parts_to_do and longest runtime. If it is non-critical, kill it:
KILL MUTATION WHERE database = 'db' AND table = 'table' AND mutation_id = '0000000000';
Merges will resume after the mutation thread is released.
Client is sending many small inserts. Fix batching upstream. ClickHouse prefers batches of at least 1,000 rows. Asynchronous inserts buffer small inserts server-side, but they return success before data is written to a part. Monitor server-side part creation, not just client-side acknowledgments.
Disk space is low. Merges stop when ClickHouse estimates there is insufficient free space to write the merged result. Free space must be at least large enough to accommodate the biggest pending merge. Detach old partitions or add storage:
-- Identify the largest partition
SELECT database, table, partition_id, formatReadableSize(sum(bytes_on_disk))
FROM system.parts WHERE active = 1 GROUP BY database, table, partition_id ORDER BY sum(bytes_on_disk) DESC LIMIT 5;
Then use ALTER TABLE ... DETACH PARTITION ... to reclaim space if data retention policy allows it. Detached partitions are no longer queryable unless reattached.
Over-partitioning is creating too many partition_ids. If a high-cardinality partition key (for example, partitioning by hour or by a high-cardinality column) is causing part multiplication, the fix requires changing the partition key. This means creating a new table with the correct key, backfilling, and cutting over. It is not an in-place fix.
Prevention
- Monitor active part count at the partition level, not the table level. The limit applies per-partition, so a table-level average hides hotspots.
- Alert on the rate of
DelayedInserts, not the cumulative value. A cumulative counter that is flat is healthy; a rising rate is not. - Keep the ratio of part creation rate to merge completion rate below 1.0 sustained. If you are creating parts faster than you merge them, you are borrowing time.
- Maintain disk usage below 80-85% of capacity. Merges need temporary space roughly equal to the size of the parts being merged.
- Batch inserts client-side to at least 1,000 rows per INSERT. Fewer rows per insert directly increases part count pressure.
- Review partition granularity during schema design. Daily or monthly partitioning is usually sufficient; hourly partitioning requires extremely high merge capacity.
- Track mutation queue depth as part of routine monitoring. Mutations are a common hidden cause of merge starvation.
How Netdata helps
- Correlates
DelayedInsertswith insert latency, active part count, and merge activity on the same charts to show whether the throttle is growing or recovering. - Surfaces per-partition part count trends without manual
system.partsqueries during an incident. - Tracks the rate of change for cumulative counters like
DelayedInsertsandRejectedInsertsautomatically. - Provides background pool utilization metrics alongside disk I/O and memory pressure to distinguish merge backlog from I/O starvation or mutation blocking.
Related guides
- ClickHouse active part count growing: reading MaxPartCountForPartition before it pages
- ClickHouse merge death spiral: when parts accumulate faster than merges consolidate
- ClickHouse merge duration climbing: the leading indicator of part explosion
- ClickHouse merges not keeping up: diagnosing a stalled or starved merge pool
- ClickHouse monitoring checklist: the signals every production cluster needs
- ClickHouse monitoring maturity model: from survival to expert
- ClickHouse projections and hidden parts: the part count you can’t see
- ClickHouse too many partitions: why over-partitioning multiplies your part count
- ClickHouse DB::Exception: Too many parts - causes and fixes
- How ClickHouse actually works in production: a mental model for operators







