$ guides / clickhouse / clickhouse-unauthorized-ddl-drop-table ▌

Operations Guides

ClickHouse unauthorized DROP TABLE: auditing DDL and privilege anomalies

A production table disappears. An application returns “table does not exist.” In ClickHouse, DROP TABLE removes the table definition and MergeTree data parts from disk immediately. On replicated tables, a single ON CLUSTER command propagates through the coordination service and can erase the table everywhere before you intervene. There is no native undo.

This guide is for finding out what happened, determining blast radius, and closing the gaps.

What this means

DROP TABLE is not a slow failure. It removes the table definition and data parts from disk immediately. ReplicatedMergeTree propagates the drop through the coordination service, so a single command can replicate across the cluster before you can intervene. There is no DDL rollback log.

The most common cause is an over-privileged default user. In many legacy deployments, the default account has no password and broad grants. Applications, analysts, or automation that connect as default can execute destructive DDL. Cluster-wide ON CLUSTER commands amplify the risk because they execute on every reachable node.

Common causes

Cause	What it looks like	First thing to check
Over-privileged `default` user	DROP executed by user `default` from an unexpected host	`system.query_log` filtered by `user = 'default'` and `query_kind = 'Drop'`
Compromised service account	DROP issued from a client host outside the known fleet	`client_hostname` and query timestamp in `system.query_log`
Human error via client tool	Analyst connected to production instead of staging	`user` and `client_hostname` at the exact event time
Application or migration bug	Unintended DROP during a deployment	DDL queries from deployment hosts just before the incident, or failed migration queries in `system.query_log`
Malicious insider or lateral movement	Large SELECT exports followed by DROP from the same session	Correlate high `read_bytes` with DDL in `system.query_log`

Quick checks

Run these immediately. They are read-only.

-- Destructive DDL in the last hour by user and source host
SELECT
    event_time,
    user,
    client_hostname,
    query_kind,
    substring(query, 1, 200) AS query_prefix
FROM system.query_log
WHERE query_kind IN ('Alter', 'Create', 'Drop', 'Truncate')
  AND event_time > now() - INTERVAL 1 HOUR
ORDER BY event_time DESC;

-- Cluster DDL that did not finish on all replicas
SELECT
    entry,
    query,
    status,
    exception_text,
    query_create_time
FROM system.distributed_ddl_queue
WHERE status != 'Finished'
ORDER BY query_create_time DESC;

-- Privilege exceptions that may indicate probing
SELECT
    event_time,
    user,
    query,
    exception
FROM system.query_log
WHERE exception LIKE '%ACCESS_DENIED%'
  AND event_time > now() - INTERVAL 1 HOUR;

-- Failed login attempts
SELECT
    event_time,
    user,
    client_address,
    auth_type,
    failure_reason,
    type
FROM system.session_log
WHERE type = 'LoginFailure'
  AND event_time > now() - INTERVAL 1 HOUR;

# Verify network exposure
ss -tlnp | grep clickhouse

How to diagnose it

Confirm the event. Query system.query_log for query_kind = 'Drop' around the incident window. Verify whether the query finished (type = 'QueryFinish'). A successful DROP with no exception means the data is gone.
Determine if it was cluster-wide. If the query included ON CLUSTER, inspect system.distributed_ddl_queue. Entries with status != 'Finished' mean some replicas did not execute the drop, leaving schema drift. If all hosts show Finished, the drop propagated everywhere.
Identify the actor. Capture user, client_hostname, and the exact query text from system.query_log. If the user is default, the over-privileged default account is the likely root cause.
Check for reconnaissance. In the minutes before the DROP, look for large SELECT queries from the same user or host with abnormally high read_bytes or result_bytes. Extraction before destruction is a common attack pattern.
Audit privilege grants. Review which accounts hold DDL privileges. Flag service accounts with broader grants than their workload requires.
Inspect replication health. If the table was replicated and the DROP was not uniform, compare system.replicas across nodes. A replica that missed the DDL may still hold the table data, while others do not.

flowchart TD
    A[Table missing or schema changed] --> B{Query system.query_log}
    B -->|Drop found| C[Record user, host, time]
    B -->|No local entry| D{Check system.distributed_ddl_queue}
    D -->|ON CLUSTER drop| E[Check per-replica status]
    D -->|No entry| F[Check system.session_log]
    C --> G[Audit grants for that user]
    E --> H[Schema drift or full propagation]
    G --> I[Revoke excess privileges]
    F --> J[Investigate auth anomalies]
    H --> K[Restore or resync replica]

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`system.query_log` DDL rate	Early signal of unauthorized schema changes	Any `Drop` or `Truncate` from non-admin users
`system.distributed_ddl_queue` status	Reveals silent schema drift after cluster DDL	`status != 'Finished'` sustained for more than 5 minutes
`system.session_log` login failures	Indicates brute force or credential misuse	More than 10 failures per minute from a single source
`system.query_log` read_bytes per user	Flags data exfiltration that may precede destruction	Unexpected user scanning large volumes followed by DDL

Fixes

Immediate containment

Revoke DDL privileges from the affected account immediately. If the session is active, terminate any running queries and rotate the password.
Rotate credentials for any service account that issued the DROP.
If the event came from an unexpected host, verify ClickHouse is bound to internal interfaces only. ss -tlnp | grep clickhouse should not show public-facing listeners unless explicitly required.

Recovery

Restore from backup. ClickHouse has no native point-in-time recovery for DDL. If you have filesystem backups or object-storage snapshots, recreate the table definition and reload the data.
If the table was ReplicatedMergeTree and some replicas missed the DROP due to a failed distributed DDL, use SYSTEM RESTART REPLICA to force a re-check against ZooKeeper state. If a replica diverged and lost parts, SYSTEM RESTORE REPLICA re-initializes it from other replicas. Be aware that this is destructive to local divergent state.
If no backup exists and one replica still holds the table data, that replica is your recovery source. Do not restart or resync it until you have extracted the data.

Privilege lockdown

Do not run applications or ad-hoc tools as the default user. Create named accounts with restricted grants.
Restrict the default user to localhost-only connections, or disable it entirely if your deployment allows.
Apply role-based access control. Limit ALTER, DROP, CREATE, and TRUNCATE to dedicated administrative accounts.
Bind ClickHouse to internal network interfaces via listen_host to reduce exposure.

Guardrails

Set max_table_size_to_drop to require an explicit configuration override before large production tables can be dropped. This prevents accidental drops from standard clients.
Review distributed_ddl_task_timeout. The default is 180 seconds. If your cluster is large or nodes are under load, short timeouts can leave DDL half-applied, creating silent schema drift.

Prevention

Audit before events. Periodically query system.query_log for DDL from unexpected users or hosts. A five-minute audit run once per shift catches drift early.
Least privilege. Service accounts should have read or append access only. DDL should require a dedicated admin role.
Cluster DDL monitoring. Alert on system.distributed_ddl_queue entries that do not reach Finished. A stuck DDL entry blocks everything queued behind it.
Session forensics. Ensure session_log is enabled in the server configuration. Without it, you cannot reconstruct which client addresses connected or failed to authenticate.

How Netdata helps

Correlate query latency spikes or error jumps with DDL events using system.query_log trends.
Flag anomalous sequences such as a large SELECT followed by DROP from the same session.
Alert on replication health shifts (is_readonly, is_session_expired) after cluster DDL to catch skipped nodes.
Watch system.query_log growth for unusual volume from attack patterns or runaway automation.

The Netdata solution

ClickHouse monitoring with Netdata

Netdata monitors ClickHouse with per-second metrics and ML anomaly detection. Track merge debt, memory usage, replication lag, Keeper/ZooKeeper saturation, and disk headroom against the host signals that drive them.

See ClickHouse monitoring → Start monitoring free

ClickHouse unauthorized DROP TABLE: auditing DDL and privilege anomalies

ClickHouse unauthorized DROP TABLE: auditing DDL and privilege anomalies

What this means

Common causes

Quick checks

How to diagnose it

Metrics and signals to monitor

Fixes

Immediate containment

Recovery

Privilege lockdown

Guardrails

Prevention

How Netdata helps

Related guides

ClickHouse monitoring with Netdata