$ guides / clickhouse / clickhouse-distributed-ddl-stuck ▌

Operations Guides

ClickHouse distributed DDL stuck: ON CLUSTER queries that never finish

ON CLUSTER DDL is not atomic. The initiator writes a task into a shared queue in ZooKeeper or ClickHouse Keeper under /clickhouse/task_queue/ddl/. Each node pulls entries in order, executes the DDL locally, and reports status back. One slow, down, or misconfigured node can leave the entry unfinished forever, blocking all later DDL on that node and creating schema drift that is invisible until a replica rejects an insert or a subsequent ALTER fails.

The client timeout (distributed_ddl_task_timeout, default 180 seconds) is independent of background execution. Long operations such as MATERIALIZE COLUMN on large tables often time out in the client while merges or mutations continue. Distinguishing a genuine hang from a slow but valid operation is the first step to avoiding harmful intervention.

What this means

system.distributed_ddl_queue exposes the per-host state of the shared queue. A status of Finished means that node completed the task. Any other status, or a missing status from an unreachable node, leaves the entry incomplete. Nodes process the queue sequentially, so one stuck entry blocks everything after it on that node.

flowchart TD
    A[DDL hangs or times out] --> B[Check distributed_ddl_queue]
    B --> C{Status != Finished?}
    C -->|No| D[Check client timeout]
    C -->|Yes| E[Find unfinished host]
    E --> F[Check exception_text]
    F --> G{Schema or ZK issue?}
    G -->|Schema| H[Reconcile across replicas]
    G -->|ZK/Node| I[Fix session or node]
    H --> J[Retry DDL]
    I --> J

Common causes

Cause	What it looks like	First thing to check
Down or unreachable node	One host never reports status; entry age grows while others show `Finished`	`system.distributed_ddl_queue` filtered by `entry` and `host_name`
ZooKeeper or Keeper session loss	`is_session_expired = 1` on the replica; multiple DDL entries stall	`system.replicas` and `system.zookeeper_connection`
Incompatible schema	`exception_text` shows mismatched columns, missing table, or type conflict	`SHOW CREATE TABLE` output diffed across replicas
Long-running DDL exceeding timeout	Status is `Finished` on most hosts but the entry age exceeds 180 seconds	`system.merges` or `system.mutations` on the slow host
Orphaned entry from vanished originator	Very old entry with no active host processing it	`query_create_time` against the current cluster member list

Quick checks

-- List unfinished DDL entries
SELECT entry, query, status, exception_text, query_create_time
FROM system.distributed_ddl_queue
WHERE status != 'Finished'
ORDER BY query_create_time DESC;

-- Map an unfinished entry to specific hosts
SELECT entry, host_name, status, exception_code, exception_text
FROM system.distributed_ddl_queue
WHERE entry = '<stuck_entry_id>';

-- Check coordination service connectivity
SELECT name, host, port, is_expired, session_timeout_ms
FROM system.zookeeper_connection;

# Test external ZooKeeper responsiveness
echo "ruok" | nc <zookeeper-host> 2181

# Test ClickHouse Keeper responsiveness
echo ruok | nc localhost 9181

-- Check replica session and readonly state
SELECT database, table, is_readonly, is_session_expired
FROM system.replicas
WHERE engine LIKE '%Replicated%';

-- Verify the node can still query ZK
SELECT * FROM system.zookeeper WHERE path = '/' LIMIT 1;

# Check node liveness from outside ClickHouse
nc -zv <stuck-host> 9000

How to diagnose it

Query system.distributed_ddl_queue for entries where status != 'Finished'. Sort by query_create_time and focus on the oldest stuck entry.
Filter the queue by that entry ID to see per-host status. Identify which host or hosts are not Finished.
Read exception_text on any host reporting an error. This usually tells you whether the failure is schema-related, a resource issue, or a timeout.
If the exception mentions missing tables or column mismatches, collect SHOW CREATE TABLE db.table from at least two replicas and diff the output.
If there is no exception and the host simply lacks a status, check whether the node is reachable on TCP port 9000 and HTTP port 8123.
On the stuck host, query system.zookeeper_connection. If the session is expired, test ZK or Keeper responsiveness from that host with nc or the system.zookeeper probe.
Check system.replicas on the stuck host for is_readonly = 1 or is_session_expired = 1. A read-only replica cannot execute DDL that requires write access to replicated tables.
If the entry age is high but exception_text is empty and the host appears healthy, verify whether the DDL operation is long-running. Check system.merges or system.mutations on the affected host to see if a background operation is still progressing.
For entries older than hours or days where the originating node is gone, consider whether the entry is orphaned and blocking the queue.

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`system.distributed_ddl_queue` status	Direct view of per-node DDL completion	Any row with `status != 'Finished'` sustained > 5 minutes
ZooKeeper or Keeper operation latency	DDL queue state lives in the coordination service; latency spikes stall propagation	`ZooKeeperWaitMicroseconds` trending up, or `is_expired = 1`
Replica session expiry	Expired sessions break replication and DDL coordination	`is_session_expired = 1` on any replica
Replica readonly state	Read-only replicas skip or fail DDL that needs writes	`is_readonly = 1` sustained
Background pool utilization	DDL such as `MATERIALIZE COLUMN` competes for merge threads	Merge pool > 90% while DDL entry age grows
`distributed_ddl_task_timeout`	Short timeouts hide legitimate long operations	Frequent client timeouts with no host error

Fixes

Down or unreachable node

Restore network connectivity or restart the ClickHouse process on the affected host. If the node is permanently decommissioned, remove it from the cluster definition and reconcile any queue entries that expect it.

Warning: Restarting the process interrupts queries on that host and triggers cache warmup.

ZooKeeper or Keeper coordination failure

Stop issuing new DDL immediately to avoid compounding the queue backlog. Check ZK ensemble health with echo mntr | nc <host> 2181 or the Keeper metrics endpoint. If the coordination service is healthy but ClickHouse holds expired sessions, restart ClickHouse on the affected node to force reconnection.

Warning: If ZK itself is saturated, address the storage or network bottleneck first. Restarting ClickHouse nodes during a ZK overload will make the situation worse by generating a reconnection storm.

Incompatible schema on a replica

Compare SHOW CREATE TABLE db.table across replicas. If one host has an extra column, missing index, or different type, align the schema manually on the outlier node using targeted ALTER TABLE statements. Do not reissue the original ON CLUSTER command until all nodes are in a consistent state, or you risk creating another stuck entry.

Warning: Manual schema changes bypass the distributed queue and require careful ordering.

Long-running DDL outliving the timeout

Increase distributed_ddl_task_timeout to match the expected duration of large operations. Verify progress on the slow host by checking system.merges or system.mutations rather than relying on the client timeout. If the operation is legitimate but slow, let it finish.

Warning: Longer timeouts mean the client holds the connection and the queue entry remains visible for an extended period.

Orphaned entries

If an entry from a crashed or removed originator blocks the queue and the operation is clearly abandoned, manual cleanup of the corresponding znode under /clickhouse/task_queue/ddl/ in ZooKeeper may be required.

Warning: This is dangerous. Confirm the entry is not still active on any host, and back up coordination service state before deleting anything. Editing metadata directly can corrupt cluster state if done incorrectly.

Prevention

Poll system.distributed_ddl_queue after every ON CLUSTER deployment. Do not treat DDL as fire-and-forget in automation.
Keep ZooKeeper or Keeper latency below 10 ms and avoid running DDL during maintenance windows that could trigger leader elections.
After restarting a node or recovering from a network partition, verify system.replicas shows no expired sessions before issuing schema changes.
Set distributed_ddl_task_timeout based on the largest expected DDL operation, not the default 180 seconds.
Maintain identical table definitions across replicas. Drift often starts when a previous DDL failed silently on one node and no one checked.

How Netdata helps

Correlate stuck DDL entries with ZooKeeper latency spikes across the same time window to isolate coordination bottlenecks.
Alert on replica session expiry and readonly state, which are leading indicators that DDL will fail.
Track background merge pool saturation that can prolong DDL execution and trigger false timeouts.
Correlate DDL timing with cluster-wide insert latency or replication queue growth to detect compound pressure events.

The Netdata solution

ClickHouse monitoring with Netdata

Netdata monitors ClickHouse with per-second metrics and ML anomaly detection. Track merge debt, memory usage, replication lag, Keeper/ZooKeeper saturation, and disk headroom against the host signals that drive them.

See ClickHouse monitoring → Start monitoring free

ClickHouse distributed DDL stuck: ON CLUSTER queries that never finish

ClickHouse distributed DDL stuck: ON CLUSTER queries that never finish

What this means

Common causes

Quick checks

How to diagnose it

Metrics and signals to monitor

Fixes

Down or unreachable node

ZooKeeper or Keeper coordination failure

Incompatible schema on a replica

Long-running DDL outliving the timeout

Orphaned entries

Prevention

How Netdata helps

Related guides

ClickHouse monitoring with Netdata