ClickHouse small inserts anti-pattern: why single-row inserts melt the merge pool
DB::Exception: Too many parts spikes, query latency climbs, and the background merge pool runs flat out while making no progress. Disk, CPU, and memory look healthy. The cause is usually single-row INSERTs from the application.
Every INSERT into a MergeTree table creates at least one immutable data part on disk. ClickHouse is optimized for batch inserts. When clients send single-row or micro-batch inserts, parts are created faster than background merges can consolidate them. This is the number one driver of part accumulation in production clusters.
flowchart LR
A[Single-row INSERT] --> B[New part created]
B --> C[Part count rises]
C --> D[Merge pool falls behind]
D --> E[DelayedInserts]
E --> F[RejectedInserts]
F --> G[Write halt]What this means
Each part is a self-contained directory with compressed column files, indexes, checksums, and metadata. ClickHouse continuously runs background merges that combine smaller parts into larger ones. This is the core storage maintenance mechanism. Fewer parts means fewer file descriptors during queries, fewer index lookups, and lower metadata overhead.
With proper batching, new parts are large and merges keep pace. With tiny inserts, part creation exceeds merge throughput. Each micro-part adds fixed overhead: file descriptors, metadata entries, and merge candidate evaluation. On ReplicatedMergeTree tables, excessive parts also increase replication queue entries and ZooKeeper metadata pressure, which can slow replica synchronization. Merges are I/O-intensive and CPU-intensive. They read all source parts, decompress, sort, recompress, and write the result before deleting sources. The background merge pool has finite threads controlled by background_merges_mutations_concurrency_ratio. As the part count climbs, merges take longer because there are more candidate combinations to evaluate and more files to process.
When the per-partition part count crosses parts_to_delay_insert, ClickHouse artificially delays INSERT operations to slow creation. If the pattern continues, the count crosses parts_to_throw_insert and inserts are rejected entirely. Recovery requires stopping the bad insert pattern and waiting for merges to drain the backlog, which can take minutes to hours depending on how deep the debt is.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| ORM or application emitting per-row INSERTs | InsertQuery rate is high but InsertedRows is nearly identical to query count | system.query_log for average rows per insert |
| ETL pipeline flushing micro-batches too frequently | Rows per insert in the 10-100 range on a steady clock | Query log grouped by time window |
| Async insert fire-and-forget | Client sees low latency but server-side parts still explode | Whether async inserts are configured to wait for flush |
| Materialized views on a high-frequency insert target | One logical insert creates parts across multiple tables | written_rows in system.query_log versus the target table |
Quick checks
Run these read-only checks to confirm the anti-pattern and assess merge pool health. Run them during low traffic if possible; querying system.query_log on an already stressed system adds marginal load.
-- Check average insert batch size over the last hour
SELECT
count() AS insert_count,
sum(written_rows) AS total_rows,
round(avg(written_rows)) AS avg_rows_per_insert
FROM system.query_log
WHERE type = 'QueryFinish'
AND query_kind = 'Insert'
AND event_time > now() - INTERVAL 1 HOUR;
Healthy batching averages 1,000 or more rows per insert. Sustained averages below 100 indicate the small-insert anti-pattern.
-- Find partition hotspots
SELECT
database,
table,
partition_id,
count(*) AS parts_in_partition
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY parts_in_partition DESC
LIMIT 20;
-- Check if inserts are already being delayed or rejected
SELECT event, value
FROM system.events
WHERE event IN ('DelayedInserts', 'RejectedInserts');
-- See if merges are actually running
SELECT
database,
table,
elapsed,
progress,
num_parts
FROM system.merges
ORDER BY elapsed DESC;
-- Check background pool saturation
SELECT metric, value
FROM system.metrics
WHERE metric LIKE 'Background%Pool%';
How to diagnose it
Quantify insert granularity. Query
system.query_logto compute rows per insert over a representative window. If the average is below 1,000 and the insert rate is high, you have a batching failure. Consistently falling below that threshold elevates part accumulation risk. Group byclient_hostnameoruserif multiple services write to the same cluster.Map part hotspots per partition. Query
system.partsgrouped bypartition_id. High counts concentrated in specific partitions confirm exactly where merges are losing ground. Remember thatparts_to_delay_insertandparts_to_throw_insertapply per partition, not per table, so a single hot partition can throttle an entire table.Correlate merge throughput with part creation. If
system.mergesshows active merges butsystem.partscounts are still climbing, part creation is exceeding consolidation. Check whether the background pool is fully utilized viasystem.metrics. If utilization is pinned and part counts rise, the pool is starved.Check for mutations compounding the problem. Run
SELECT * FROM system.mutations WHERE is_done = 0. Mutations rewrite entire parts and compete with merges for the same pool threads. A mutation backlog starves merges even when insert volume is reasonable.Verify async insert behavior if enabled. If the application uses async inserts, confirm that
wait_for_async_insertis set to1so the client blocks until the server flushes. Immediate acknowledgement returns success as soon as data is buffered, but parts are still created on flush. If the buffer empties frequently with tiny payloads, the anti-pattern persists server-side despite low client latency.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
InsertQuery rate vs InsertedRows | Reveals batch size collapse | Avg rows per insert < 1,000 sustained |
| Active parts per partition | Distance to the “too many parts” cliff | Count > 50% of parts_to_delay_insert |
DelayedInserts | Server is throttling writes to slow accumulation | Counter increasing over time |
RejectedInserts | Hard insert failures are occurring | Any non-zero value |
| Background merge pool utilization | Merge thread starvation | > 90% sustained with growing parts |
| Merge throughput | Whether consolidation keeps up with creation | Merge completion rate below part creation rate |
Fixes
Batch client-side
The only root-cause fix is to stop creating so many parts. Buffer rows in the application and issue one INSERT per 1-2 seconds containing at least 1,000 rows. A simple in-memory accumulator or message consumer batch window directly reduces part creation pressure. For distributed tables, batch before the Distributed table proxy rather than inserting into underlying shards directly; this keeps the batch intact as one insert per shard. Tradeoff: slightly higher end-to-end write latency because data waits in the application buffer before becoming visible.
Enable async inserts with safe settings
If immediate client-side batching is not feasible, set async_insert=1 and wait_for_async_insert=1 so the client receives acknowledgement only after the server flushes. This buffers small inserts server-side and flushes them as larger blocks. Tradeoff: data is not durable when the client receives the ACK if the server crashes before flush. Schema validation is deferred to flush time, so one malformed row can reject an entire buffered batch.
Audit materialized views
If the insert target has materialized views attached, a single INSERT creates parts in every downstream table. The aggregate part creation rate across the table and its views can overwhelm the shared merge pool. Check system.query_log for high written_rows spread across multiple tables from a single query ID. Tradeoff: removing or simplifying views reduces write amplification but may shift computation to query time.
Increase merge pool capacity (temporary relief)
If CPU and disk I/O have headroom, raise background_merges_mutations_concurrency_ratio. This gives merges more threads to drain the backlog faster. Tradeoff: merges consume CPU and I/O that would otherwise serve queries. This is a temporary bridge, not a cure. If the insert pattern does not change, the pool will saturate again at the new limit.
Kill blocking mutations
If system.mutations shows long-running mutations monopolizing the merge pool, kill them with KILL MUTATION. This frees threads to resume consolidation.
Warning: KILL MUTATION discards the mutation. You must reissue it after the merge backlog clears.
Prevention
- Enforce application-level batching discipline. Target 1,000+ rows per INSERT and avoid row-at-a-time patterns from ORMs.
- Monitor rows per insert as a first-class pipeline metric, not just insert latency. Alert when the ratio of
InsertQuerytoInsertedRowsindicates micro-batching. - If using async inserts, set
wait_for_async_insert=1in production and understand server-side flush behavior. - Review partition keys. Even well-batched inserts multiply parts if a single batch scatters across many partition values. A time-based partition key with too fine a granularity, such as hourly instead of daily, can amplify part counts under high-frequency batching.
How Netdata helps
Netdata charts the following without manual query_log analysis:
InsertQueryrate againstInsertedRowsto reveal batch size collapse in real time.- Active part count per table and alerts when it crosses safe ratios of
parts_to_delay_insert. DelayedInsertsandRejectedInsertsso you catch throttling before hard failures begin.- Background pool utilization and merge activity to distinguish small-insert pressure from genuine I/O starvation.
Related guides
- ClickHouse active part count growing: reading MaxPartCountForPartition before it pages
- ClickHouse merge death spiral: when parts accumulate faster than merges consolidate
- ClickHouse merge duration climbing: the leading indicator of part explosion
- ClickHouse merges not keeping up: diagnosing a stalled or starved merge pool
- ClickHouse monitoring checklist: the signals every production cluster needs
- ClickHouse monitoring maturity model: from survival to expert
- ClickHouse too many partitions: why over-partitioning multiplies your part count
- ClickHouse DB::Exception: Too many parts - causes and fixes
- How ClickHouse actually works in production: a mental model for operators







