ClickHouse authentication failures: system.session_log, brute force, and credential drift
You notice a spike in failed connection attempts to ClickHouse: a security scanner flags repeated TCP 9000 probes, or an application logs connection timeouts after a secrets rotation. ClickHouse exposes authentication events through system.session_log, but only if the feature is enabled. Without it, fall back to server error logs and system.query_log exceptions.
The failures split into two patterns. Malicious: brute-force or credential-scanning campaigns against exposed TCP 9000 or HTTP 8123. Operational drift: a rotated password not updated in a client config, or a deployment shipping an old connection string. Distinguish them fast. Block an external attacker at the network layer; fix credential drift on the client.
This guide shows how to read system.session_log, correlate failures with connection counts and network exposure, and fix the root cause.
What this means
system.session_log records session lifecycle events, including LoginFailure entries with the username, source IP (client_address), authentication type, and failure reason. More than ten LoginFailure events per minute from one IP is a strong brute-force signal. Any failure from a new source IP needs tracing to a known application or user.
If session_log is disabled, the same events may appear in ClickHouse server logs or as AUTHENTICATION_FAILED exceptions in system.query_log. These are harder to aggregate and may lack source-address detail. Regardless of source, failures correlate with network exposure. If listen_host is bound to 0.0.0.0 or a public interface, any host that can reach ports 9000 and 8123 is in the attack surface.
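The ten-per-minute rule of thumb can be sketched in a few lines of Python. This is a minimal illustration, not part of ClickHouse: it assumes you have already exported LoginFailure rows as (event_time, user, client_address) tuples, and the function name `brute_force_sources` is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Threshold taken from the text: more than ten failures per minute from one IP.
BRUTE_FORCE_PER_MINUTE = 10


def brute_force_sources(rows, threshold=BRUTE_FORCE_PER_MINUTE):
    """Return {client_address: peak failures in one minute} for noisy IPs.

    `rows` is an iterable of (event_time, user, client_address) tuples,
    assumed to come from a LoginFailure query against system.session_log.
    """
    per_minute = Counter()
    for event_time, _user, client_address in rows:
        # Truncate to the minute so each (IP, minute) pair is one bucket.
        minute = event_time.replace(second=0, microsecond=0)
        per_minute[(client_address, minute)] += 1

    peaks = {}
    for (client_address, _minute), count in per_minute.items():
        if count > threshold:
            peaks[client_address] = max(count, peaks.get(client_address, 0))
    return peaks
```

Bucketing by IP only, rather than by IP and user, keeps a scan that cycles through many usernames in one bucket instead of spreading it thinly across several.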
flowchart TD
A[Auth failures detected] --> B{session_log enabled?}
B -->|No| C[Enable session_log or use query_log fallback]
B -->|Yes| D[Aggregate by client_address and user]
D --> E{Single IP > 10/min?}
E -->|Yes| F[Brute force or scanning]
E -->|No| G{Service account failing?}
G -->|Yes| H[Credential drift]
G -->|No| I[Check network exposure and client configs]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Brute force or credential scanning | > 10 LoginFailure events per minute from one IP; many distinct usernames | system.session_log aggregated by client_address |
| Credential rotation drift | Steady failures from a known app server or service account | user and client_address in system.session_log; secrets manager sync status |
| Application misconfiguration | Failures begin after a deployment; usually one user | Deployment timeline and user in system.query_log |
| Overly permissive network binding | External IPs can reach TCP 9000 or HTTP 8123 at all | Bound addresses in ss -tlnp output; listen_host in the server configuration |
| Misconfigured monitoring probes | Regular, low-rate failures from internal infra hosts | Source IP of monitoring checkers against known probe config |
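The decision flow and the cause table can be condensed into one triage function. This is a sketch under stated assumptions: the function name `triage` and its three inputs are hypothetical, and you would compute the inputs yourself from the checks described in this guide.

```python
def triage(session_log_enabled, peak_failures_per_minute_single_ip,
           service_account_failing):
    """Mirror the decision flow: return the next action as a short label.

    All three arguments are values the operator supplies; nothing here
    queries ClickHouse.
    """
    if not session_log_enabled:
        return "enable session_log or use query_log fallback"
    if peak_failures_per_minute_single_ip > 10:
        return "brute force or scanning"
    if service_account_failing:
        return "credential drift"
    return "check network exposure and client configs"
```

For example, `triage(True, 3, True)` yields "credential drift", which points at the client configuration rather than the firewall.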
Quick checks
Run these read-only checks to characterize the failure pattern without changing any state.
-- Recent authentication failures from session_log
SELECT event_time, user, client_address, auth_type, failure_reason
FROM system.session_log
WHERE type = 'LoginFailure'
AND event_time > now() - INTERVAL 1 HOUR
ORDER BY event_time DESC;
-- Aggregate failures per source IP per minute to detect brute force
SELECT
toStartOfMinute(event_time) AS minute,
client_address,
count() AS failures,
uniqExact(user) AS distinct_users,
max(event_time) AS last_failure
FROM system.session_log
WHERE type = 'LoginFailure'
AND event_time > now() - INTERVAL 10 MINUTE
GROUP BY minute, client_address
HAVING failures > 10
ORDER BY failures DESC;
-- Fallback: authentication errors from query_log
SELECT event_time, user, exception, query_id
FROM system.query_log
WHERE exception LIKE '%AUTHENTICATION_FAILED%'
AND event_time > now() - INTERVAL 1 HOUR
ORDER BY event_time DESC;
# Check network exposure: what interfaces is ClickHouse bound to
ss -tlnp | grep clickhouse
-- Check connection volume for context
SELECT metric, value
FROM system.metrics
WHERE metric IN ('TCPConnection', 'HTTPConnection');
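If system.session_log does not exist, the feature is most likely not enabled; enable it in the server configuration to capture these events. If the table exists but the query returns no rows, there may simply have been no failed logins in the selected window, so widen the interval before concluding anything.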
How to diagnose it
1. Confirm the event source. Query system.session_log for type = 'LoginFailure'. If the table is empty, use the system.query_log fallback with exception LIKE '%AUTHENTICATION_FAILED%'. Note that query_log lacks the precise source-address detail of session_log.
2. Identify the failure pattern. Aggregate by client_address and user. A single IP with more than ten failures per minute suggests brute force or automated scanning. A single service account failing from a known application host suggests credential drift.
3. Correlate with changes. Check whether the onset of failures aligns with a recent deployment, secrets rotation, or infrastructure change. Credential drift almost always starts within minutes of a password or key rotation.
4. Audit network exposure. Run ss -tlnp | grep clickhouse and inspect the bound addresses. If ClickHouse is listening on 0.0.0.0 or a public interface and you see brute-force attempts from external IPs, the immediate priority is reducing that exposure.
5. Review server error logs. Check the ClickHouse server log file for connection failure details. On standard Linux installations this is /var/log/clickhouse-server/clickhouse-server.log. Look for unknown user, wrong password, or protocol mismatch messages.
6. Map internal failures to consumers. For operational drift, filter session_log by the failing user and map client_address to known applications or hosts. Verify the connection strings and credentials in the corresponding secrets manager or configuration store.
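The network exposure audit can be automated by scanning captured ss output for wildcard binds on the ClickHouse ports. The sketch below assumes the usual `ss -tlnp` column order (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port, Process). The function name `wildcard_listeners` is hypothetical.

```python
# Addresses that mean "every interface" in ss output (IPv4 and IPv6 forms).
WILDCARD_ADDRESSES = {"0.0.0.0", "*", "[::]", "::"}


def wildcard_listeners(ss_output, ports=(9000, 8123)):
    """Return sorted (address, port) pairs bound to a wildcard address.

    `ss_output` is the captured text of `ss -tlnp`. The column layout is
    an assumption about your ss version; adjust the field index if needed.
    """
    found = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a listening socket.
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue
        # Split on the last colon so "[::]:9000" and "0.0.0.0:9000" both work.
        address, _, port = fields[3].rpartition(":")
        if port.isdigit() and int(port) in ports and address in WILDCARD_ADDRESSES:
            found.add((address, int(port)))
    return sorted(found)
```

An empty result means none of the checked ports is bound to a wildcard address on that host. It says nothing about firewalls or security groups in front of it.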
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| system.session_log LoginFailure rate | Captures every failed authentication event with source IP and reason | > 10 failures per minute from one IP |
| system.query_log AUTHENTICATION_FAILED | Fallback when session_log is disabled; tracks exceptions across all queries | Sustained failures from service accounts |
| Client connection count | Distinguishes brute force from connection leaks or retry storms | Spike in TCPConnection matching auth failure times |
| Network interface binding | Unnecessary exposure invites scanning and widens the blast radius | Listening on 0.0.0.0 or public interfaces |
Fixes
Brute force or credential scanning
Block the source IP at your firewall, cloud security group, or reverse proxy. Do not rely on ClickHouse for rate limiting. If ClickHouse is directly exposed because of a permissive listen_host, restrict it to internal interfaces or specific addresses. If the source is an internal misconfigured health check, fix the checker instead of blocking the IP.
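Before blocking anything, it helps to decide whether the offending address is internal or external. The sketch below uses Python's standard `ipaddress` module and treats private and loopback ranges as "internal", which is an assumption that may not match your network layout. It also unwraps IPv4-mapped IPv6 addresses (`::ffff:a.b.c.d`), a form in which client addresses may be reported. The function names are hypothetical.

```python
import ipaddress


def normalize(client_address):
    """Parse an address string, unwrapping IPv4-mapped IPv6 if present."""
    ip = ipaddress.ip_address(client_address)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        return ip.ipv4_mapped
    return ip


def is_internal(client_address):
    """Assumption: private or loopback ranges count as internal hosts."""
    ip = normalize(client_address)
    return ip.is_private or ip.is_loopback


def suggested_action(client_address):
    """Map the source to the remediation described in the text."""
    if is_internal(client_address):
        return "fix the internal client or checker"
    return "block at the network layer"
```

If your internal hosts use public address space, replace the private-range test with an explicit list of your own networks.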
Credential drift after rotation
Identify the failing user from system.session_log.user. Update the password or key in the application’s connection string, environment variable, or secrets manager. Restart or reload the client to clear cached credentials. Verify the fix by confirming that LoginFailure entries for that user stop. If old and new credentials overlap during rotation, revoke the old credential once clients have switched, so that nothing silently falls back to it.
Application misconfiguration
Correlate the start of failures with a deployment timestamp. If the failures are ongoing, roll back the deployment or patch the configuration. Use client_address from system.session_log to identify the emitting host; if session detail is unavailable, fall back to user and event_time in system.query_log.
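The correlation with a deployment or rotation can be sketched as a time-window check. The 15-minute default window and the function name `onset_after_change` are assumptions chosen for illustration. The failure timestamps are assumed to come from your own LoginFailure query for one user.

```python
from datetime import datetime, timedelta


def onset_after_change(failure_times, change_time, window=timedelta(minutes=15)):
    """Return True if the first failure falls within `window` after the change.

    `failure_times` is a list of datetimes for one user or host;
    `change_time` is the deployment or rotation timestamp.
    """
    if not failure_times:
        return False
    first = min(failure_times)
    # Failures that began before the change cannot have been caused by it.
    return change_time <= first <= change_time + window
```

A False result does not rule out the change as a cause. For example, a client may only reconnect hours later, so treat this as a first filter rather than proof.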
Missing session_log coverage
If system.session_log is disabled, failed authentication events are invisible to native SQL audit. Enable it in the ClickHouse server configuration. Until then, use system.query_log and server error logs as fallbacks.
Prevention
- Enable and retain system.session_log.
- Bind ClickHouse to specific internal interfaces via listen_host; audit with ss -tlnp after any configuration change.
- Store credentials in a secrets manager and automate rotation with application restarts or hot-reload.
- Monitor for LoginFailure spikes as an infrastructure security signal, not just a database issue.
- Run periodic audits of active users and their expected source IP ranges.
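The periodic audit of users against expected source ranges can be sketched with the standard `ipaddress` module. The mapping of users to CIDR ranges is something you maintain yourself. The function name `unexpected_sources` and the sample user names are hypothetical.

```python
import ipaddress


def unexpected_sources(rows, expected):
    """Return (user, address) pairs seen outside the user's expected ranges.

    `rows` is an iterable of (user, client_address) pairs exported from
    session data; `expected` maps user -> list of CIDR strings.
    Users missing from `expected` are always reported.
    """
    findings = []
    for user, client_address in rows:
        ip = ipaddress.ip_address(client_address)
        # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it matches IPv4 ranges.
        if ip.version == 6 and ip.ipv4_mapped is not None:
            ip = ip.ipv4_mapped
        networks = [ipaddress.ip_network(cidr) for cidr in expected.get(user, [])]
        if not any(ip.version == net.version and ip in net for net in networks):
            findings.append((user, str(ip)))
    return findings
```

Reporting unknown users by default errs on the side of noise, which is usually the safer choice for an audit.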
How Netdata helps
Netdata collects ClickHouse TCPConnection and HTTPConnection metrics and query error rates. Correlate connection spikes with error-rate jumps to distinguish brute-force scans from client misconfiguration. Set alerts on unusual connection counts or error rates to catch authentication issues without manual polling.
Related guides
- ClickHouse active part count growing: reading MaxPartCountForPartition before it pages
- ClickHouse ALTER UPDATE/DELETE overuse: why mutations are not row updates
- ClickHouse async inserts: when async_insert fixes too-many-parts and when it hides it
- ClickHouse background pool saturation: when merges and mutations starve
- ClickHouse mark cache and uncompressed cache: reading low hit rates
- ClickHouse client connections climbing: TCP 9000, HTTP 8123, and connection leaks
- ClickHouse checksum mismatch and broken parts: detecting data corruption
- ClickHouse delayed inserts: the warning before too-many-parts
- ClickHouse detached parts piling up: reading system.detached_parts and reclaiming space
- ClickHouse disk space collapse: why merges need free space and how the spiral starts
- ClickHouse disk space monitoring: free_space, unreserved_space, and the 80% target
- ClickHouse distributed DDL stuck: ON CLUSTER queries that never finish







