ClickHouse checksum mismatch and broken parts: detecting data corruption

ClickHouse logs showing Checksum doesn't match, Broken part, or similar errors indicate data corruption. Affected parts are moved to the table's detached/ directory and listed in system.detached_parts. Queries may throw exceptions or return partial results. On replicated clusters, a replica with corrupt parts may lag because it cannot validate fetched parts. Corruption does not self-resolve. You must quarantine the bad part, identify the root cause, and rebuild the data from a healthy source or backup.

ClickHouse stores every data part with checksum files covering column data, index files, and metadata. When the server reads a part during a query, merge, or replication fetch, it recomputes checksums and compares them against the stored values. A mismatch means the bytes on disk no longer match what was written. The server detaches the part, moving it out of the active dataset. If the table is replicated, the replica may attempt to re-fetch the part from a peer. If no healthy copy exists, the data is effectively lost. On standalone nodes, the only recovery path is external restore or re-ingestion.
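
As a rough illustration of the verify-on-read idea described above, the following shell sketch records a checksum when a file is written and recomputes it on read. It uses sha256sum purely as a stand-in: ClickHouse's actual checksum format and hash functions are different, and the helper names record_checksum and verify_checksum are hypothetical.

```shell
# Conceptual sketch only: sha256sum stands in for ClickHouse's internal checksums.
# record_checksum and verify_checksum are hypothetical helper names.

# Store the checksum of a file next to it, as a writer would at write time.
record_checksum() {
    sha256sum "$1" | cut -d' ' -f1 > "$1.sum"
}

# Recompute the checksum at read time and compare it with the stored value.
# Prints "ok" on a match; prints "mismatch" and returns non-zero otherwise.
verify_checksum() {
    stored=$(cat "$1.sum")
    actual=$(sha256sum "$1" | cut -d' ' -f1)
    if [ "$stored" = "$actual" ]; then
        echo "ok"
    else
        echo "mismatch"
        return 1
    fi
}
```

A mismatch here plays the same role as the server's Checksum doesn't match error: the bytes read back are no longer the bytes that were written.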

flowchart TD
    A[Checksum mismatch detected] --> B{Replicated table?}
    B -->|Yes| C[Check peer replicas for healthy copy]
    B -->|No| D[Check hardware and filesystem]
    C --> E{Healthy replica has part?}
    E -->|Yes| F[Remove corrupt part, trigger re-fetch]
    E -->|No| G[Data loss confirmed, restore from backup]
    D --> H[Run CHECK TABLE]
    H --> I{Hardware or filesystem errors?}
    I -->|Yes| J[Replace failing component]
    I -->|No| K[Restore or re-ingest the dataset]

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Failing disk hardware | Checksum errors clustered on one volume; kernel logs show storage errors | dmesg for disk or controller errors |
| Filesystem corruption | Errors after unclean shutdown or on a specific mount point; multiple unrelated parts affected | OS logs for filesystem metadata errors |
| Bad RAM | Random corruption across unrelated tables and partitions; no disk errors | dmesg for memory ECC errors |
| Incomplete fetch during replication | system.replication_queue shows last_exception referencing fetch or checksum failure on the receiving replica | system.replication_queue on the target replica |
| Software bug in merge or mutation | Corruption appears immediately after a merge or mutation completes; affected part name matches the operation | system.part_log for recent MergeParts or MutatePart events on the affected part |

Quick checks

Run these read-only checks before any destructive action.

# Check server logs for corruption indicators
grep -Ei 'checksum|corrupt|Broken part|Cannot read all data|Mismatch' /var/log/clickhouse-server/*.log | tail -100

-- Inspect detached parts and the reasons they were removed
SELECT database, table, name, reason, bytes_on_disk, modification_time
FROM system.detached_parts
ORDER BY modification_time DESC;

-- Check replication queue for stuck fetch or checksum failures
SELECT database, table, type, num_tries, last_exception
FROM system.replication_queue
WHERE num_tries > 0
ORDER BY num_tries DESC;

-- Check for integrity-related event counters
SELECT event, value
FROM system.events
WHERE event IN ('ReplicatedPartChecksFailed', 'ReplicatedDataLoss');

-- Assess blast radius: active parts in the affected table
SELECT partition_id, count() AS active_parts, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE database = 'your_db' AND table = 'your_table' AND active = 1
GROUP BY partition_id
ORDER BY active_parts DESC;

-- Proactively verify a table that has not yet shown errors
CHECK TABLE your_db.your_table;
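
Depending on server version and settings, CHECK TABLE may return only a single pass/fail value for the whole table. The sketch below asks for per-part results instead. It assumes clickhouse-client is on the PATH and can reach the server; CH_CLIENT and check_table_per_part are hypothetical names introduced for illustration.

```shell
# Assumption: clickhouse-client is installed; override CH_CLIENT to add host or credentials.
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Run CHECK TABLE and request one result row per data part.
#   $1: database name, $2: table name
check_table_per_part() {
    $CH_CLIENT --query "CHECK TABLE $1.$2 SETTINGS check_query_single_value_result = 0"
}

# Usage: check_table_per_part your_db your_table
```

With per-part output, each failing part should appear as its own row, which makes it easier to match against the part names seen in the server log.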

How to diagnose it

  1. Confirm the error pattern. Search the server error log for Checksum, corrupt, Broken part, Cannot read all data, or Mismatch. Note the timestamp, table, and part name. If the error repeats on the same part, suspect a localized disk or fetch issue. Random parts across tables suggest memory or filesystem corruption.

  2. Map detached parts to tables. Query system.detached_parts and examine the reason column. Reasons such as broken or broken-on-start indicate automatic detachment due to an integrity failure. Note modification_time to see if the detachment correlates with a recent restart, merge, or replication event.

  3. Determine if the table is replicated. For ReplicatedMergeTree tables, check system.replicas for the state of the local replica and the number of active replicas. If at least one other replica is active and healthy, it may still hold a valid copy. If the table is not replicated, treat the incident as potential data loss.

  4. Inspect the replication queue. Query system.replication_queue for entries with high num_tries and non-empty last_exception. If the exception references a checksum mismatch or failed fetch, corruption is blocking convergence. Check the exception message for the source replica and verify that peer’s health before forcing a re-fetch.

  5. Check hardware and OS logs. Review dmesg for disk controller errors, I/O failures, or memory ECC errors around the time the corruption first appeared. If hardware errors are present, stop using the node for production data until the component is replaced.

  6. Correlate with background operations. Query system.part_log for MergeParts or MutatePart events involving the affected part. Note event_time and duration_ms. An extremely long merge that aborted, or a mutation with a large duration and no subsequent successful operation on the same partition, is suspicious. While rare, software bugs in specific ClickHouse versions can produce invalid output during large merges or mutations.
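
Steps 3 and 6 above can be sketched as two small wrappers around clickhouse-client. This is a minimal example under stated assumptions: clickhouse-client is on the PATH, system.part_log is enabled on the server, and the names CH_CLIENT, replica_status, and part_history are hypothetical. The queries use clickhouse-client query parameters (--param_name with {name:Type} placeholders).

```shell
# Assumption: clickhouse-client is installed; override CH_CLIENT to add host or credentials.
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Step 3: replica health for one table.
#   $1: database name, $2: table name
replica_status() {
    $CH_CLIENT --param_db="$1" --param_tbl="$2" --query "
        SELECT replica_name, is_readonly, is_session_expired, parts_to_check,
               queue_size, absolute_delay, active_replicas, total_replicas
        FROM system.replicas
        WHERE database = {db:String} AND table = {tbl:String}"
}

# Step 6: history of one part, including merges that consumed it.
#   $1: database name, $2: table name, $3: part name
part_history() {
    $CH_CLIENT --param_db="$1" --param_tbl="$2" --param_part="$3" --query "
        SELECT event_time, event_type, duration_ms, merged_from, error, exception
        FROM system.part_log
        WHERE database = {db:String} AND table = {tbl:String}
          AND (part_name = {part:String} OR has(merged_from, {part:String}))
        ORDER BY event_time"
}

# Usage:
#   replica_status your_db your_table
#   part_history your_db your_table 202401_1_5_1
```

Reading the part history in time order makes it easier to see whether the part came from an insert, a merge, a mutation, or a download from another replica before it failed verification.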

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| system.detached_parts count | Direct indicator of parts removed due to integrity failures | Unexpected growth in count or bytes |
| ReplicatedPartChecksFailed | Integrity failures detected during replication verification | Sustained non-zero rate |
| ReplicatedDataLoss | Confirmed unrecoverable data loss on a replicated table | Any non-zero value |
| system.replication_queue.last_exception | Shows whether replication is stuck due to corrupt source parts | Entries with num_tries increasing and checksum-related exceptions |
| Checksum errors in server logs | Earliest detectable sign of hardware or filesystem degradation | New log lines matching Checksum or Mismatch |
| CHECK TABLE result | Proactive integrity scan that surfaces mismatches before they hit queries | Result shows non-empty list of corrupted parts |

Fixes

Replicated tables: force a re-fetch

If another replica has a healthy copy, remove the corrupted part from detached/ and let the replication queue fetch a replacement. Before doing so, fix the root cause (failing disk, bad RAM, filesystem damage) to prevent re-corruption. Then check system.replicas for active peers and confirm on a healthy peer, for example in its system.parts, that it holds the part or a merged part that covers it.

If the part is already in system.detached_parts, it is inactive. Remove the directory from the table’s detached/ path. The replica should schedule a fetch automatically. If the queue stalls, SYSTEM RESTART REPLICA db.table may unblock replication.
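
A minimal sketch of the two actions above, assuming clickhouse-client is on the PATH and the part name is taken from system.detached_parts. CH_CLIENT, drop_detached_part, and restart_replica are hypothetical names; the first function is destructive and permanently deletes the detached directory.

```shell
# Assumption: clickhouse-client is installed; override CH_CLIENT to add host or credentials.
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# DESTRUCTIVE: permanently delete one part from the table's detached/ directory.
# Dropping detached parts requires the allow_drop_detached setting.
#   $1: database name, $2: table name, $3: detached part name
drop_detached_part() {
    $CH_CLIENT --allow_drop_detached=1 \
        --query "ALTER TABLE $1.$2 DROP DETACHED PART '$3'"
}

# Reinitialize the replica's replication state if the queue stays stuck.
#   $1: database name, $2: table name
restart_replica() {
    $CH_CLIENT --query "SYSTEM RESTART REPLICA $1.$2"
}

# Usage:
#   drop_detached_part your_db your_table broken_202401_1_5_1
#   restart_replica your_db your_table
```

The arguments are interpolated directly into the SQL text, so this sketch is only suitable for names you have typed or verified yourself.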

For a replica with widespread corruption or metadata inconsistency, rebuilding the replica is often faster than incremental repair. Drop and re-create the table on that replica to force a full re-sync from its peers. If the replica's metadata is still registered in ZooKeeper after the replica itself is gone, remove it with SYSTEM DROP REPLICA, run from another replica; this statement does not drop the local replica. Rebuilding destroys all local data for that table and generates significant network load.

Standalone tables: restore or re-ingest

For non-replicated tables, there is no peer to heal from. Restore the affected part's data from backup or re-ingest it from upstream. If the data is outside your retention window, you may drop the detached part and accept the gap.

Warning: Dropping a detached part is destructive and irreversible. Before dropping anything, confirm the scope by comparing row counts or partition-level aggregates against your upstream source of truth.
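
One way to confirm scope before dropping anything is to export per-partition row counts for the active dataset and compare them with counts from the upstream source. The sketch below assumes clickhouse-client is on the PATH; CH_CLIENT and partition_row_counts are hypothetical names.

```shell
# Assumption: clickhouse-client is installed; override CH_CLIENT to add host or credentials.
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Print row counts per partition for the active parts of one table.
#   $1: database name, $2: table name
partition_row_counts() {
    $CH_CLIENT --param_db="$1" --param_tbl="$2" --query "
        SELECT partition, sum(rows) AS row_count, count() AS active_parts
        FROM system.parts
        WHERE database = {db:String} AND table = {tbl:String} AND active
        GROUP BY partition
        ORDER BY partition"
}

# Usage: partition_row_counts your_db your_table > clickhouse_counts.tsv
```

Partitions whose counts fall short of the upstream figures are the ones where detached data still needs to be restored or re-ingested.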

Hardware and filesystem faults: fix the substrate first

Corruption caused by failing SSDs, bad RAM, or filesystem damage will recur until the underlying layer is repaired. Replace failing hardware. If corruption appeared after an unclean shutdown, run a filesystem consistency check on the ClickHouse data volume before returning the node to production. Do not restore data onto a known-bad disk, and do not treat a passing CHECK TABLE on such a disk as proof that the data is safe.
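
A small filter can help when scanning kernel logs for storage and memory faults. The patterns below are assumptions chosen for illustration, not an exhaustive list; actual messages vary by kernel version, driver, and filesystem. The function name filter_hw_errors is hypothetical.

```shell
# Keep only kernel log lines that look like disk, filesystem, or memory faults.
# The pattern list is an assumption; extend it for your drivers and filesystem.
filter_hw_errors() {
    grep -Ei 'I/O error|EXT4-fs error|XFS.*corrupt|EDAC|machine check|Hardware Error'
}

# Usage: dmesg -T | filter_hw_errors
```

Compare the timestamps of any matching lines with the modification_time of the detached parts to see whether the hardware event preceded the corruption.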

Clean up detached parts

Detached parts remain under the table’s data path in detached/ and consume disk. Once the active dataset is verified healthy and you no longer need the files for forensics, remove them to reclaim space.

Prevention

  • Schedule CHECK TABLE on large or critical tables during low-traffic windows. It verifies checksums without blocking reads or writes.
  • Alert on growth in system.detached_parts count or size. Detachment is almost always a reaction to an integrity failure.
  • Correlate ClickHouse corruption signals with OS-level disk SMART alerts, dmesg errors, and memory ECC logs.
  • Replicate critical data. Standalone nodes have no self-healing path for corrupted parts.
  • Inspect replication queues. A replica with stuck fetch entries and rising num_tries may be repeatedly downloading a corrupted source part. Catch this before it cascades.
  • Patch known merge and mutation bugs promptly. Corruption defects in ClickHouse are typically fixed quickly once identified.
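
The first prevention item can be sketched as a script run from a scheduler such as cron or a systemd timer. It is a minimal example, assuming clickhouse-client is on the PATH; CH_CLIENT and check_tables are hypothetical names, and the table list is a placeholder.

```shell
# Assumption: clickhouse-client is installed; override CH_CLIENT to add host or credentials.
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Run CHECK TABLE on each table given as db.table and print the ones that
# do not return 1. A failed query also counts as "not passed".
check_tables() {
    for t in "$@"; do
        result=$($CH_CLIENT --query "CHECK TABLE $t SETTINGS check_query_single_value_result = 1")
        [ "$result" = "1" ] || echo "$t"
    done
}

# Usage: check_tables your_db.your_table your_db.other_table
```

Any table name printed by the function is a candidate for the quick checks and diagnosis steps earlier in this guide. An empty output means every listed table passed.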

How Netdata helps

  • Alert on server log lines matching Checksum, Broken part, Mismatch, and Cannot read all data.
  • Correlate ReplicatedPartChecksFailed and ReplicatedDataLoss counter spikes with disk I/O error metrics and memory ECC events to distinguish hardware faults from replication bugs.
  • Chart the count and bytes of system.detached_parts over time to spot gradual integrity degradation that logs alone might miss.
  • Monitor replication queue depth and exception rates alongside checksum signals to identify whether corruption is blocking convergence.
  • Track part-count stability and CHECK TABLE execution so deviations from normal disk-structure health are visible before queries fail.