NetFlow storage sizing: how much disk your flow collector really needs

Flow records arrive at thousands to tens of thousands per second, and every record hits disk. The bottleneck is almost always disk throughput or capacity, not CPU.

This article covers the math: raw record sizes, effective storage after columnar compression, the capacity formula with worked examples, IOPS considerations, and the operational pitfalls that make disks fill faster than the formula predicts. The guidance applies to NetFlow v5/v9, IPFIX, and sFlow collectors using ClickHouse or similar columnar backends.

Record sizes at the wire

Every flow record starts as a fixed or variable-length binary structure exported by the device. The size depends on the protocol and the template fields selected.

Protocol	Approximate record size	Notes
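NetFlow v5	48 bytes (plus a 24-byte header per datagram of up to 30 records)	Fixed format: src/dst IP, ports, protocol, packet/byte counters, timestamps, next-hop, input/output interface. The worked examples below use 64 bytes as a rounded-up figure.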
NetFlow v9 / IPFIX	60-150 bytes	Variable, template-dependent. Cisco Flexible NetFlow with extended fields (latency timestamps, QoS bits, IPv6) pushes toward the high end.
sFlow	Variable	Sample datagrams contain multiple flow samples and counter samples. Per-sample size depends on fields captured (header length, interface info, VLAN tags).

These are raw on-wire sizes. The collector decodes, optionally enriches (GeoIP, AS lookup, VLAN name resolution), and writes to its storage engine. Enrichment adds bytes per record.
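
For the fixed-format case, the record size can be checked directly. The sketch below lays out the NetFlow v5 flow record fields with Python's struct module; the field comments follow the commonly published v5 export layout and are for illustration, not a full decoder. The record body packs to 48 bytes, and the 24-byte datagram header is shared by up to 30 records, so a 64-byte planning figure leaves margin above the wire size.

```python
import struct

# NetFlow v5 flow record in network byte order (field names are illustrative).
V5_RECORD = struct.Struct(
    "!"
    "III"   # srcaddr, dstaddr, nexthop
    "HH"    # input and output interface index
    "IIII"  # dPkts, dOctets, first, last
    "HH"    # srcport, dstport
    "BBBB"  # pad1, tcp_flags, prot, tos
    "HH"    # src_as, dst_as
    "BBH"   # src_mask, dst_mask, pad2
)
V5_HEADER_BYTES = 24   # datagram header, shared by every record in the datagram
V5_MAX_RECORDS = 30    # maximum records per datagram


def v5_bytes_per_record(records_in_datagram: int = V5_MAX_RECORDS) -> float:
    """Wire bytes per record once the header is amortised over the datagram."""
    if records_in_datagram < 1:
        raise ValueError("records_in_datagram must be at least 1")
    return V5_RECORD.size + V5_HEADER_BYTES / records_in_datagram


print(V5_RECORD.size)          # record body only
print(v5_bytes_per_record())   # body plus amortised header, full datagram
```

The same approach does not transfer to v9/IPFIX, where the size follows from whichever template the exporter sends.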

Effective storage after compression

Raw record size is not what lands on disk. Columnar storage engines like ClickHouse compress flow data through column-oriented layout, type-specific encodings (Delta, Gorilla, dictionary encoding via LowCardinality), and generic compression (LZ4, ZSTD).

Observed ClickHouse compression ratios for flow-style data:

Data characteristic	Compression ratio	Why
Log-style data (flow records, event logs)	8-12x	Repetitive fields across records (same AS, same interface, same protocol) compress well in columnar layout
Structured metrics (SNMP counters, time series)	5-8x	Numeric regularity, delta encoding
Wide tables with many null columns	15-20x	Sparse columns compress to near-zero

ntopng documents approximately 60 bytes per flow record when using its native ClickHouse integration. This figure reflects ntopng’s specific schema and field selection, which stores enriched records (application detection, GeoIP) that are larger than a bare NetFlow v5 export. Whether the same effective per-record size applies to raw IPFIX ingested through a different collector with a different schema is not established. Measure empirically against your own data.

To measure your actual ClickHouse compression ratio:

-- Read-only query against system.parts. Replace your_flow_db with your database name.
SELECT
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.parts
WHERE active AND database = 'your_flow_db'

For planning, assume a conservative 5-10x reduction from raw record size on a columnar backend, then verify against your measured ratio.
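
A measured ratio converts directly into an effective on-disk record size. The following is a minimal sketch, assuming you know the raw (uncompressed) bytes per record; the function name and the 150-byte example input are illustrative, not taken from any collector.

```python
def effective_record_bytes(raw_record_bytes: float, compression_ratio: float) -> float:
    """On-disk bytes per flow record after compression."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_record_bytes / compression_ratio


# Planning range of 5-10x applied to a 150-byte IPFIX record.
for ratio in (5, 10):
    print(ratio, effective_record_bytes(150, ratio))
```

Feeding in the ratio from the system.parts query above replaces the planning range with a figure specific to your schema.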

The capacity formula

flowchart LR
    A["Flows/sec\nat peak"] --> B["x record size\n60-150 B"]
    B --> C["x 86400\nraw bytes/day"]
    C --> D["/ compression\n5-20x"]
    D --> E["x retention\ndays"]
    E --> F["x replication\nfactor"]
    F --> G["x 1.3\nmerge headroom"]
    G --> H["Disk to\nprovision"]

disk_bytes = (flows_per_sec * record_size * 86400 / compression_ratio) * retention_days * replication_factor * 1.3

flows_per_sec: Peak sustained flow rate. Use your busiest hour, not your daily average. For NetFlow full accounting on a 1 Gbps mixed-traffic link, expect 2,000-5,000 flows/sec.
record_size: Raw bytes per record (see table above). Use the high end for v9/IPFIX with extended fields.
compression_ratio: 5-10 for ClickHouse on flow data. Measure your actual ratio rather than assuming.
retention_days: Full-resolution retention before TTL expiry or downsampling.
replication_factor: 1 for single-node, 2+ for replicated ClickHouse clusters.
1.3: Merge headroom. ClickHouse temporarily holds both old and new data parts on disk while a merge runs. The 1.3 factor is a volume-wide allowance, not a worst-case bound: during a large merge, ClickHouse needs free space equal to at least 2x the largest partition size. If your daily partition is 50 GB, keep 100 GB free at merge time, not the 65 GB that a 1.3x factor on that partition would imply.
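
The formula and its parameters can be expressed as a small function. This is a sketch of the arithmetic only; the function and argument names mirror the formula terms, and the default values are the ones stated above.

```python
def disk_bytes(flows_per_sec: float, record_size: float, compression_ratio: float,
               retention_days: float, replication_factor: int = 1,
               merge_headroom: float = 1.3) -> float:
    """Disk to provision, in bytes, following the capacity formula."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    compressed_per_day = flows_per_sec * record_size * 86400 / compression_ratio
    return compressed_per_day * retention_days * replication_factor * merge_headroom


GB = 1e9  # decimal gigabytes

# 2,000 flows/sec, 64-byte records, 10x compression, 30 days, single node.
print(round(disk_bytes(2000, 64, 10, 30) / GB, 1))
```

Keeping replication_factor and merge_headroom as explicit arguments makes it easy to rerun the estimate once you have a measured compression ratio or move to a replicated cluster.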

Worked examples at common link speeds

The table uses a NetFlow v5 record size of 64 bytes, a compression ratio of 10x, no replication, and the 1.3x merge headroom factor.

Scenario	Flow rate	Raw GB/day	Compressed GB/day	30-day provisioned	90-day provisioned
100 Mbps, NetFlow	300 flows/sec	1.66 GB	0.17 GB	6.5 GB	19 GB
1 Gbps, NetFlow (low)	2,000 flows/sec	11.1 GB	1.11 GB	43 GB	130 GB
1 Gbps, NetFlow (high)	5,000 flows/sec	27.6 GB	2.76 GB	108 GB	323 GB

For sFlow at 1:1000 on 1 Gbps, the sample rate depends on packet count, not bandwidth alone. A link averaging 500-byte packets produces roughly 250,000 packets/sec, yielding about 250 samples/sec at 1:1000. Small-packet-heavy links produce more samples per second at the same sampling ratio.
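
The sFlow arithmetic can be sketched as follows. It assumes a fully utilized link and a uniform average packet size, which real traffic rarely matches; the function name is illustrative.

```python
def sflow_samples_per_sec(link_bps: float, avg_packet_bytes: float,
                          sampling_ratio: int) -> float:
    """Expected flow samples per second at 1:sampling_ratio packet sampling."""
    if avg_packet_bytes <= 0 or sampling_ratio <= 0:
        raise ValueError("avg_packet_bytes and sampling_ratio must be positive")
    packets_per_sec = link_bps / 8 / avg_packet_bytes
    return packets_per_sec / sampling_ratio


print(sflow_samples_per_sec(1e9, 500, 1000))  # 500-byte packets: 250.0
print(sflow_samples_per_sec(1e9, 100, 1000))  # 100-byte packets: 1250.0
```

The second call shows the small-packet effect: the same link and sampling ratio produce five times the sample rate.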

If you use ntopng with ClickHouse, take its documented ~60 bytes per flow record (including enrichment) as the stored size, with no further compression divisor: 3,000 flows/sec yields approximately 15.6 GB/day, or roughly 607 GB for 30-day retention with merge headroom. This is far higher than the formula predicts for bare NetFlow v5 records at the same rate (about 1.66 GB/day at 64 bytes and 10x compression) because ntopng stores enriched fields beyond the raw export.

ntopng publishes minimum disk guidance for their platform: small networks under 100 Mbps need less than 20 GB; medium networks (100 Mbps to 1 Gbps) need 100 GB SSD; high-traffic networks above 1 Gbps require proportionally more. These figures are minimum system requirements, not retention-sizing guidance.

IOPS and write patterns

ClickHouse is not IOPS-bound for sequential inserts. It batches writes into parts and merges them in the background. The IOPS concern is not ingestion but the background workload:

Part merges: ClickHouse merges smaller parts into larger ones asynchronously. The merge workload scales with insert frequency and part count. Monitor system.merges for active merge count and rows being processed.
TTL drops: When records expire, ClickHouse drops entire partitions or parts. This is a metadata operation for partition-level TTL, but column-level TTL triggers mutations that rewrite parts, spiking both IOPS and temporary disk usage.
Replication: Replicated tables write to all replicas. Each replica performs its own merges independently, doubling write amplification for a 2-replica cluster.

Use SSDs for any deployment exceeding a few hundred GB of compressed flow data. HDDs can work for small, low-ingest collectors, but merge latency under load will cause part accumulation and eventual Too many parts errors.

To check for merge backlog:

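-- Point-in-time snapshot of active merges. Replace your_flow_db with your database name.
-- system.merges exposes rows_read (rows processed so far) rather than a plain rows column.
SELECT count() AS active_merges, sum(rows_read) AS rows_read_in_merges
FROM system.merges
WHERE database = 'your_flow_db'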

A persistently high active merge count, or an active part count in system.parts that keeps growing (SELECT count() FROM system.parts WHERE active AND database = 'your_flow_db'), indicates that merges cannot keep up with ingestion.
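
One way to turn "keeps growing" into a check is to sample the active part count at regular intervals and test for a sustained rise. The sketch below is deliberately simple and hypothetical: it treats a strictly increasing window as backlog, and the window length is an assumption you would tune to your polling interval.

```python
from typing import Sequence


def parts_growing(samples: Sequence[int], min_points: int = 4) -> bool:
    """True if the active part count rose at every step across the window.

    samples: active part counts taken at regular intervals, oldest first.
    """
    if len(samples) < min_points:
        return False  # not enough history to call it a trend
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


print(parts_growing([120, 135, 160, 190]))  # steady rise
print(parts_growing([120, 135, 130, 140]))  # merges caught up once
```

A production check would likely tolerate small dips or fit a slope instead, but the strict version is enough to flag a collector whose part count never comes back down.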

Pitfalls that fill disk faster than predicted

Enrichment bloat. GeoIP, ASN lookups, and application detection add 20-100 bytes per record depending on the fields. If your collector enriches inline, the on-disk record is larger than the wire record. Measure the stored record size, not the export size.

Template changes. Adding fields to a NetFlow v9 or IPFIX template changes the schema. Existing partitions retain the old schema; new partitions use the new one. Both coexist on disk until old data expires, and the wider new schema increases storage for all subsequent data.

Sampling rate drift. If a device’s sampling rate changes (for example, from 1:1000 to 1:500 for better visibility during an incident), the sample volume doubles. Engineers sometimes increase sampling temporarily and forget to revert it.
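
The effect of a sampling change on volume and on how long a fixed disk lasts can be sketched as below. The function names are illustrative, and the retention estimate assumes ingest is otherwise unchanged.

```python
def volume_multiplier(old_ratio: int, new_ratio: int) -> float:
    """Change in sample volume when moving from 1:old_ratio to 1:new_ratio."""
    if old_ratio <= 0 or new_ratio <= 0:
        raise ValueError("sampling ratios must be positive")
    return old_ratio / new_ratio


def retention_after_change(planned_days: float, multiplier: float) -> float:
    """Days of data the same disk holds once volume scales by multiplier."""
    return planned_days / multiplier


m = volume_multiplier(1000, 500)
print(m)                              # 2.0
print(retention_after_change(30, m))  # 15.0
```

A disk sized for 30 days at 1:1000 holds only about 15 days at 1:500 if the change is left in place.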

Short retention TTL with no headroom. A TTL set to 30 days on a volume that holds exactly 30 days of data leaves no room for merge spikes. Part merges during TTL expiry can temporarily require 2x the partition size. Always provision the 1.3x factor or set TTL 2-3 days shorter than the physical capacity allows.
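
Working backwards from a fixed volume gives the longest TTL that still leaves the 1.3x headroom. This is a sketch under the same assumptions as the capacity formula; the function name and the 100 GB example volume are illustrative.

```python
import math


def safe_ttl_days(volume_bytes: float, compressed_bytes_per_day: float,
                  merge_headroom: float = 1.3) -> int:
    """Largest whole-day TTL whose data, times headroom, still fits the volume."""
    if compressed_bytes_per_day <= 0 or merge_headroom <= 0:
        raise ValueError("inputs must be positive")
    return math.floor(volume_bytes / (compressed_bytes_per_day * merge_headroom))


GB = 1e9
# 100 GB volume, 2.76 GB/day compressed.
print(safe_ttl_days(100 * GB, 2.76 * GB))                      # with headroom
print(safe_ttl_days(100 * GB, 2.76 * GB, merge_headroom=1.0))  # raw physical fit
```

The gap between the two results is the space the merge process needs; setting the TTL to the second number is the failure mode described above.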

Counter samples in sFlow. sFlow exports both flow samples and counter samples (interface counters, CPU, memory). Some collectors ingest counter samples into the same storage volume, adding records that are easy to overlook in capacity planning. Check whether your collector separates them.

Index granularity and primary key choice. ClickHouse index granularity (default 8192 rows) affects memory usage during queries and the size of the primary index. A poorly chosen ORDER BY key that leads with high-cardinality columns (source IP, destination IP) inflates the index and degrades compression. Put low-cardinality columns first in the sort key.

Correlating disk fill with ingest using Netdata

If you run Netdata on the collector host, correlate disk fill events with flow ingest rate changes. Watch disk space and inodes on the ClickHouse data volume, disk utilization during expected merge windows, and ClickHouse metrics (active parts, merge queue depth, compressed/uncompressed data sizes) if the Netdata ClickHouse integration is enabled.

An unexpected disk usage spike that correlates with a flow rate increase points to sampling rate changes or template changes. A spike that does not correlate with ingest points to background merges or TTL mutations.
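
That triage rule can be written down as a small classifier. Everything here is hypothetical: the function name, the percentage inputs (change over the same time window), and the 20% threshold are assumptions for illustration, not Netdata features or recommended values.

```python
def classify_disk_spike(disk_growth_pct: float, ingest_change_pct: float,
                        threshold_pct: float = 20.0) -> str:
    """Rough triage of a disk usage spike against the ingest rate change."""
    if disk_growth_pct < threshold_pct:
        return "no significant spike"
    if ingest_change_pct >= threshold_pct:
        # Disk and ingest rose together.
        return "ingest-driven: check sampling rate and template changes"
    # Disk rose while ingest stayed flat.
    return "background-driven: check merges and TTL mutations"


print(classify_disk_spike(45.0, 50.0))
print(classify_disk_spike(45.0, 2.0))
```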

Summary checklist

Measure your actual flows/sec at peak, not average.
Measure your actual on-disk record size after enrichment, not the wire export size.
Measure your actual ClickHouse compression ratio with the query above.
Apply the capacity formula with the 1.3x merge headroom.
Reserve at least 2x the largest partition size as free space for merges.
Use SSD for anything beyond small deployments.
Monitor merge backlog and active part count as leading indicators of disk pressure.

