ClickHouse too many partitions: why over-partitioning multiplies your part count

You are watching MaxPartCountForPartition climb, or you have already seen DB::Exception: Too many parts. You check the table and see a modest total of active parts. The table looks healthy, but ClickHouse rejects inserts anyway.

The problem is not the table total. It is the partition boundary.

ClickHouse enforces part limits per partition, not per table. A high-cardinality PARTITION BY key like toYYYYMMDD(timestamp) or a business identifier creates thousands of isolated part pools. Parts never merge across those pools. One hundred partitions with thirty parts each produces 3,000 parts, multiplied merge-scheduling overhead, and higher file-descriptor pressure. One partition crossing the hard limit kills inserts for that partition while the rest of the table appears idle.

What it is and why it matters

In ClickHouse, a partition is a logical slice defined by PARTITION BY. A part is a physical directory on disk created by every INSERT. Background merges consolidate smaller parts into larger ones only within the same partition.

Because limits are enforced per partition, aggregate metrics mislead. A table with 100 partitions and 10 active parts each totals 1,000 parts, yet every partition looks safe. The system still pays the cost: more file descriptors, heavier merge scheduling, slower queries, and more metadata. If one partition hits the hard limit, inserts into that partition fail while the rest of the table is idle.

Partitioning is a data-management tool, not a query-performance optimizer. Its primary purposes are TTL expiration and partition-level operations like ALTER TABLE ... DROP PARTITION. It does not speed up queries the way a primary-key index does. Cross-partition queries open more files and can be slower than equivalent queries against a coarsely partitioned table.

How it works

flowchart TD
    A[High-cardinality PARTITION BY] --> B[1000s of partitions]
    B --> C[Parts never merge across partitions]
    C --> D[100 partitions x 30 parts each]
    D --> E[3000 total parts, FD pressure, merge overhead]
    E --> F[One partition hits limit and rejects inserts]

Every INSERT evaluates PARTITION BY and writes rows into a new part inside the matching partition. A background merge thread later picks parts within that partition and rewrites them into a larger part. The original parts are marked obsolete and dropped.

Merges cannot cross partition boundaries, so the partition key acts as a hard firewall for consolidation. If you partition by day and insert small batches every few minutes, each day accumulates dozens or hundreds of tiny parts. A table with 1,000 partitions and 10 parts each creates 10,000 discrete units of merge overhead, multiplying file-descriptor usage and scheduling pressure.

The per-partition enforcement works in two stages. When a partition’s active part count crosses parts_to_delay_insert, ClickHouse slows new inserts to that partition. If growth continues and the count crosses parts_to_throw_insert, the server throws DB::Exception: Too many parts and rejects INSERT.

Verify your effective thresholds in system.merge_tree_settings:

SELECT name, value FROM system.merge_tree_settings
WHERE name IN ('parts_to_delay_insert', 'parts_to_throw_insert');

There is also max_partitions_per_insert_block, an insert-time gate that rejects an INSERT block if it touches too many distinct partition values. It protects against batch inserts that spray rows across hundreds of partitions. Verify it in system.settings:

SELECT name, value FROM system.settings
WHERE name = 'max_partitions_per_insert_block';

Where it shows up in production

Daily partitioning on long-retention tables. PARTITION BY toYYYYMMDD(event_time) creates one partition per day. Three years of data yields over 1,000 partitions, each managing its own part lifecycle. Frequent small inserts and late-arriving data let older partitions accumulate parts and hit limits long after you assumed they were stable.

High-cardinality business keys. Partitioning by tenant_id, user_id, or country_code creates unbounded partition growth. Every new tenant adds an isolated part pool. Parts never merge across tenants, so the part count per tenant grows indefinitely unless merges keep pace with every tenant’s insert rate.

Hourly or finer time slicing. Hourly partitioning multiplies partition count by 24x compared to daily, and by 720x or more compared to monthly. Each hour becomes an independent part universe. Data from 09:00 never merges with 10:00, even if both partitions are tiny.

Kafka Engine ingestion. Message batches often span many time buckets. A single INSERT from a Kafka engine can touch dozens or hundreds of partitions, quickly exhausting per-batch limits and scattering small parts across the table. This commonly triggers max_partitions_per_insert_block rejections.

Tradeoffs and when to use it

Partitioning is for data management. It enables TTL enforcement and bulk deletion via DROP PARTITION. It is not a substitute for sorting keys or primary-key indexing.

Rules for choosing a partition key:

  • Prefer monthly granularity. toYYYYMM(timestamp) is the default recommendation for time-series tables. It keeps partition counts low while still allowing monthly drops and TTL cleanup.
  • Keep cardinality bounded. Aim for fewer than a few hundred distinct partition values. If your natural key produces thousands, it is too fine.
  • Never partition by high-cardinality identifiers. Keys like user_id or precise timestamps generate one partition per value. Because parts never merge across partitions, this guarantees unbounded part accumulation.
  • Match batch boundaries to partition boundaries. If your ingest pipeline produces hourly files, hourly partitioning only works if batch sizes and merge throughput keep each hour’s part count low indefinitely. Most production workloads are safer with daily or monthly granularity.

If you already have too many partitions, changing PARTITION BY requires rewriting the table. There is no in-place ALTER TABLE ... MODIFY PARTITION BY.

Signals to watch in production

SignalWhy it mattersWarning sign
MaxPartCountForPartitionHighest active part count across all partitionsSustained growth above 50% of parts_to_throw_insert
Active parts per partitionThe limit is enforced here, not at the table levelAny partition consistently above 100 active parts
max_partitions_per_insert_block rejectionsInsert-time guard against cross-partition sprayingDB::Exception: Too many partitions for single INSERT block
Total partition countDirect multiplier for metadata and merge overheadTable approaching thousands of partitions
DelayedInserts / RejectedInsertsEarly and late warnings that a partition is saturatedSustained increase in DelayedInserts

Read the global peak directly:

SELECT value FROM system.metrics WHERE metric = 'MaxPartCountForPartition';

Inspect per-partition part counts:

SELECT
    database,
    table,
    partition_id,
    count() AS active_parts,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active
GROUP BY database, table, partition_id
ORDER BY active_parts DESC
LIMIT 20;

Check whether merges are running per partition:

SELECT
    database,
    table,
    partition_id,
    count() AS merging_parts
FROM system.merges
GROUP BY database, table, partition_id
ORDER BY merging_parts DESC
LIMIT 20;

Watch the delay/throw counters:

SELECT event, value FROM system.events
WHERE event IN ('DelayedInserts', 'RejectedInserts');

How Netdata helps

Netdata correlates the signals that predict partition-level part explosions before inserts fail.

  • Tracks MaxPartCountForPartition alongside insert throughput and merge activity so you can see when merges are losing the race.
  • Surfaces per-partition part counts to identify specific partition-key hotspots.
  • Monitors DelayedInserts and RejectedInserts event counters from system.events, warning when a partition approaches its throttle or hard limit.
  • Correlates part-count growth with background pool utilization and disk I/O to distinguish over-partitioning from merge starvation.