ClickHouse insert latency rising: the leading indicator of write-pipeline trouble

Your ClickHouse inserts are taking longer. A query that committed in 200 ms last week is now taking 5 seconds, then 15, then 30. In most databases this signals slow disks or lock contention. In ClickHouse, sustained insert latency is the earliest operational signal that the write pipeline is congesting. It precedes DelayedInserts, part-count alerts, and the hard stop of RejectedInserts by minutes to hours. Wait for the error and the merge debt is already severe.

ClickHouse measures insert latency in system.query_log as query_duration_ms for query_kind = 'Insert' and type = 'QueryFinish'. For synchronous inserts this captures the full path from statement arrival to part commit. For asynchronous inserts the client-visible number reflects buffering, not the final disk flush, so the trend matters more than the absolute value. In either mode, a sustained rise means backpressure is building somewhere between the TCP handler and the frozen part on disk.

Congestion usually starts in the merge subsystem. Each INSERT creates immutable parts. Background merges combine smaller parts into larger ones to keep query performance stable and file descriptor usage bounded. When parts accumulate faster than merges consolidate them, ClickHouse throttles inserts with exponential backoff. Rising latency is the first observable symptom. After latency climbs, DelayedInserts appears. Then RejectedInserts. Then writes halt entirely.

flowchart TD
    A[High insert rate] -->|creates| B[Growing active parts]
    B -->|exceeds| C[Merge throughput]
    C -->|produces| D[Rising insert latency]
    D -->|precedes| E[DelayedInserts]
    E -->|escalates to| F[RejectedInserts]
    F -->|blocks| G[Write halt]
    B -->|consumes| H[Disk space]
    H -->|stalls| C

What this means

Rising insert latency means the pipeline cannot absorb the current write load at its usual speed. The bottleneck is rarely the insert itself. More often, the background merge pool, disk I/O, or coordination service cannot keep up with the part creation rate, and the system starts queuing or throttling writes.

Because ClickHouse delays inserts using an exponential formula once the active part count in a partition exceeds parts_to_delay_insert , latency growth is non-linear. As the part count approaches parts_to_throw_insert , delays grow super-exponentially. By the time you see multi-second insert times, the merge subsystem is already under water. Sustained latency above 30 seconds means the write pipeline is effectively blocking.

This signal is workload-facing. Unlike internal merge counters, insert latency is what your application feels. It is the canary for the entire part lifecycle.

Common causes

CauseWhat it looks likeFirst thing to check
Merge backlogActive part count growing steadily; merges running but not fast enough to reduce partssystem.parts active count per partition versus system.merges progress
Disk I/O saturationDisk latency elevated; merge throughput low despite active threads; query latency also climbingDisk await and system.merges bytes/sec
Mutation backlog blocking mergessystem.mutations shows is_done = 0 and merge pool threads are consumed by mutationssystem.mutations and background pool utilization in system.metrics
Replication or ZooKeeper overheadLatency spikes on replicated tables; ZooKeeper session metrics show stressReplica queue_size and ZooKeeper session state
Small insert anti-patternMany insert queries with few rows each; part count rises faster than row countRows per insert from system.query_log
Async insert bufferingClient latency reflects buffer time, not flush; actual land time diverges from ack timesystem.query_log for insert latency trend versus part creation timing

Quick checks

Run these safe, read-only checks to confirm the trend and locate the bottleneck.

# Check insert latency trend
SELECT
    toStartOfFiveMinutes(event_time) AS window,
    quantile(0.99)(query_duration_ms / 1000) AS p99_sec,
    avg(query_duration_ms / 1000) AS avg_sec,
    count() AS inserts
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind = 'Insert'
  AND event_time > now() - INTERVAL 1 HOUR
GROUP BY window
ORDER BY window DESC;
# Check active parts per partition
SELECT
    database,
    table,
    partition_id,
    count() AS parts
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY parts DESC
LIMIT 20;
# Check running merges and mutations
SELECT
    database,
    table,
    elapsed,
    progress,
    num_parts,
    is_mutation,
    formatReadableSize(total_size_bytes_compressed) AS size
FROM system.merges
ORDER BY elapsed DESC;
# Check insert throttling events
SELECT event, value
FROM system.events
WHERE event IN ('DelayedInserts', 'RejectedInserts');
# Check mutation backlog
SELECT
    database,
    table,
    mutation_id,
    parts_to_do,
    latest_fail_reason
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time;
# Check background pool saturation
SELECT metric, value
FROM system.metrics
WHERE metric LIKE 'Background%Pool%';
# Check disk space
SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    round(100 * (1 - free_space / total_space), 1) AS used_pct
FROM system.disks;
# Check replication queue depth if using replicated tables
SELECT
    database,
    table,
    queue_size,
    log_max_index - log_pointer AS entries_behind
FROM system.replicas
WHERE queue_size > 0
ORDER BY queue_size DESC;

How to diagnose it

  1. Confirm the deviation. Use system.query_log to verify that P99 insert latency is sustained above 5x your baseline or above 30 seconds. One slow query is noise; a 20-minute upward trend is signal.

  2. Find part hotspots. Query system.parts grouped by partition_id. If any partition has more than 500 active parts, or is approaching parts_to_delay_insert, merges are falling behind.

  3. Inspect merge progress. Check system.merges. If progress is stalled across multiple merges, or elapsed is growing without forward movement, merges are stuck. Note whether entries are is_mutation = 1, which means mutations are consuming merge slots.

  4. Look for active throttling. Query system.events for DelayedInserts and RejectedInserts. Sustained growth in DelayedInserts confirms ClickHouse is throttling writes. Non-zero RejectedInserts means the hard limit is reached.

  5. Rule out disk space. Check system.disks. If free space is below 20%, merges may halt because they cannot write temporary output before deleting sources. This is a common cause of merge starvation that looks like a merge performance problem.

  6. Check for blocking mutations. If system.mutations shows pending work and the background pool is fully occupied by mutations, merges are starved. Mutations rewrite entire parts and can monopolize threads for hours.

  7. Validate insert batching. Compute rows per insert from system.query_log. If your application is sending many inserts with fewer than 1,000 rows each, it is creating excessive parts. This is the most common root cause of merge backlog.

  8. Correlate with resource metrics. Check disk write throughput and background pool utilization. If the pool is fully utilized and parts are still growing, the merge subsystem lacks CPU or I/O headroom to keep up.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
Insert latency P99Earliest workload-facing indicator of write pipeline congestionSustained > 5x baseline or > 30 seconds
Active parts per partitionDistance to insert throttling and rejection> 500 or steady growth over 30 minutes
DelayedInsertsClickHouse is actively throttling writesCounter increasing over time
RejectedInsertsHard insert failures due to part limitsAny non-zero sustained value
Merge activityWhether background merges are keeping upZero merges running while parts are high and inserts are active
Mutation queue depthMutations block merges and consume pool threadsis_done = 0 with flat or non-decreasing parts_to_do
Disk free spaceMerges need temporary space to write output< 20% or unreserved_space near zero
Background pool utilizationCapacity to run merges, fetches, and mutationsFully utilized with growing part count
Insert batch sizeSmall batches create parts faster than merges can consolidate< 1,000 rows per insert sustained

Fixes

Merge backlog and too many parts

Throttle or pause upstream inserts to reduce the part creation rate. Increase client batch sizes so each INSERT creates fewer parts. If mutations are monopolizing the pool, kill non-critical mutations with KILL MUTATION. You can increase merge concurrency only if CPU and disk I/O have clear headroom; the default background_merges_mutations_concurrency_ratio is 2, and raising it on an already saturated disk will make latency worse.

Disk I/O saturation

Temporarily reduce heavy analytical query load to free I/O bandwidth for merges. Check whether large queries are spilling to disk and competing with merge writes. Ensure the ClickHouse data directory is not sharing physical disks with ZooKeeper transaction logs or system logs.

Replication and ZooKeeper overhead

Check ZooKeeper session metrics for expired sessions or elevated latency. If ZooKeeper is slow, do not restart ClickHouse nodes; reconnection storms amplify coordination load. Pause non-critical DDL operations to reduce metadata pressure on the coordination service.

Small inserts

Fix client logic to batch rows. Aim for at least 1,000 rows per INSERT. If you cannot control the client, server-side async insert buffering changes the latency profile, but you must then monitor actual part creation and flush timing rather than client acknowledgment time.

Mutations blocking merges

Identify long-running mutations in system.mutations. Kill mutations that are not business-critical. Plan heavy ALTER UPDATE or ALTER DELETE operations during low-traffic windows, and avoid treating ClickHouse like an OLTP database for frequent small mutations.

Low disk space

Drop old partitions to reclaim space. Add disk capacity; merges require temporary headroom to complete. Review TTL policies, but TTL enforcement depends on merges running. If merges are stalled, expired data persists.

Prevention

Monitor active part count at the partition level, not just the table level. The parts_to_delay_insert and parts_to_throw_insert limits apply per partition, and a single hot partition can throttle an entire table. Track the ratio of part creation rate to merge completion rate; if it stays above 1.0 for more than a brief burst, you are accumulating debt. Keep disk usage below 80% as an operational target, because merges need free space to write temporary output. Monitor system.mutations proactively; a forgotten mutation is a common silent cause of merge starvation. Enforce minimum insert batch sizes in your ingestion pipeline. Alert on insert latency percentiles from system.query_log as a primary signal. It fires before part-count alerts, giving you minutes to hours of runway.

How Netdata helps

  • Netdata collects insert latency percentiles from system.query_log and correlates them with merge activity, active part count, and disk I/O in real time.
  • Built-in alerts fire when DelayedInserts or RejectedInserts increase, and when active part count crosses safe thresholds per partition.
  • Disk I/O latency and background pool utilization charts help distinguish merge backlog from storage saturation without ad hoc queries.
  • For replicated clusters, ZooKeeper connection health and replication queue depth are automatically correlated with insert latency trends.
  • Per-partition part count visualization surfaces hotspots that table-level aggregates hide.