Storage Performance Testing Methodology: Benchmark Design from fio to Business Scenarios

2026/6/15 17:17:12


1. Common Benchmark Pitfalls: A High Score Is Not Performance

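The most common mistake in storage performance testing is to run fio, record the highest IOPS and the lowest latency, and then declare that "storage performance meets the target". But a single-pattern fio test, whether purely sequential or purely random, differs greatly from a database's IO pattern: database writes go to the WAL first (sequential) and then to the data files (random), while reads combine index lookups (random) with range scans (sequential). A single fio pattern therefore cannot represent the real business load.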

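The deeper problem is that storage performance is not a single number. IOPS, throughput, and latency constrain one another: raising concurrency increases IOPS, but latency rises with it. Workloads also tolerate each metric differently: OLTP cares about P99 latency, OLAP about throughput, and logging systems about write IOPS.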

2. The Storage Metric System: The Triangle of IOPS, Throughput, and Latency

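The three core storage metrics are IOPS (IO operations per second), throughput (MB/s), and latency (ms). They constrain one another: on fixed hardware, pushing IOPS higher usually raises latency, and pushing throughput higher usually requires a larger IO size, because throughput equals IOPS multiplied by IO size.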

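```mermaid
flowchart TB
    A["Storage performance triangle"] --> B["IOPS<br/>IO operations per second"]
    A --> C["Throughput<br/>MB/s"]
    A --> D["Latency<br/>ms"]
    B --> E["Constraints"]
    C --> E
    D --> E
    E --> F["High IOPS → high concurrency → latency rises"]
    E --> G["High throughput → large IO size → IOPS drops"]
    E --> H["Low latency → low concurrency → IOPS is capped"]
    subgraph scenarios["Business scenario mapping"]
        I["OLTP: focus on P99 latency<br/>random read/write, 4KB-16KB"]
        J["OLAP: focus on throughput<br/>sequential read/write, 64KB-1MB"]
        K["Logging: focus on write IOPS<br/>sequential write, 4KB-16KB"]
    end
    I --> D
    J --> C
    K --> B
```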

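Key takeaway: a storage performance test must reproduce the IO pattern of the business workload (random vs. sequential, read/write ratio, IO size) rather than chase the best value of any single metric.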
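The constraints between the three metrics can be made concrete with two standard identities: throughput = IOPS × IO size, and Little's law (outstanding IOs = IOPS × mean latency). The sketch below is illustrative only; the function names and the example numbers are assumptions, not measurements from any device.

```python
def throughput_mib_s(iops: float, block_size_bytes: int) -> float:
    """Throughput implied by an IOPS figure at a fixed IO size (MiB/s)."""
    return iops * block_size_bytes / (1024 * 1024)


def iops_from_littles_law(outstanding_ios: int, mean_latency_ms: float) -> float:
    """Little's law: outstanding IOs = IOPS x mean latency (in seconds)."""
    return outstanding_ios / (mean_latency_ms / 1000.0)


def mean_latency_ms(outstanding_ios: int, iops: float) -> float:
    """Little's law rearranged: mean latency once IOPS stops growing."""
    return outstanding_ios / iops * 1000.0


# Hypothetical numbers, chosen only to show the shape of the trade-off.
small_io = throughput_mib_s(iops=100_000, block_size_bytes=4 * 1024)
large_io = throughput_mib_s(iops=2_000, block_size_bytes=1024 * 1024)
print(f"4 KiB @ 100k IOPS -> {small_io:.1f} MiB/s")
print(f"1 MiB @ 2k IOPS   -> {large_io:.1f} MiB/s")

# If a device is already saturated at a given IOPS, a deeper queue
# only adds waiting time: latency grows linearly with queue depth.
saturated_iops = iops_from_littles_law(outstanding_ios=32, mean_latency_ms=0.2)
print(f"depth 32 at 0.2 ms  -> {saturated_iops:.0f} IOPS")
print(f"depth 128, same IOPS -> {mean_latency_ms(128, saturated_iops):.2f} ms")
```

The second pair of prints shows why a higher iodepth can look good on an IOPS chart while hurting an OLTP workload: once the device is saturated, extra queue depth turns directly into latency.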

3. Production-Grade Implementation: fio Tests and Business-Scenario Benchmarks

3.1 Baseline Performance Tests with fio

```bash
# Random read test (simulates OLTP read load)
# Why 4KB random reads: database index lookups are 4KB-16KB random IOs,
# which is the typical IO pattern of OLTP workloads
fio --name=rand_read \
    --ioengine=libaio \
    --iodepth=32 \
    --rw=randread \
    --bs=4k \
    --direct=1 \
    --size=10G \
    --numjobs=4 \
    --runtime=60 \
    --time_based \
    --group_reporting \
    --output-format=json \
    --output=rand_read.json

# Random write test (simulates data-file page writes; WAL writes are
# sequential and are covered by the sequential write test at the end)
# Why libaio instead of sync: libaio is the native Linux asynchronous IO
# engine and supports an IO depth > 1, so it can exploit the parallelism
# of the storage device; the sync engine is synchronous, so the iodepth
# parameter has no effect with it
fio --name=rand_write \
    --ioengine=libaio \
    --iodepth=32 \
    --rw=randwrite \
    --bs=4k \
    --direct=1 \
    --size=10G \
    --numjobs=4 \
    --runtime=60 \
    --time_based \
    --group_reporting

# Mixed read/write test (simulates OLTP load)
# Why a 70/30 read/write ratio: OLTP workloads usually sit
# between 70:30 and 80:20
fio --name=mixed_rw \
    --ioengine=libaio \
    --iodepth=32 \
    --rw=randrw \
    --rwmixread=70 \
    --bs=4k \
    --direct=1 \
    --size=10G \
    --numjobs=4 \
    --runtime=60 \
    --time_based \
    --group_reporting

# Sequential write test (simulates WAL and log writes)
fio --name=seq_write \
    --ioengine=libaio \
    --iodepth=16 \
    --rw=write \
    --bs=16k \
    --direct=1 \
    --size=10G \
    --numjobs=2 \
    --runtime=60 \
    --time_based \
    --group_reporting
```
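To compare runs, the numbers that matter (IOPS, bandwidth, P99 completion latency) can be pulled out of fio's JSON report. The sketch below assumes the run was started with `--output-format=json` and that the report uses the fio 3.x field names `jobs[].read|write.iops`, `bw` (KiB/s), and `clat_ns.percentile`; check these against the output of your fio version. `SAMPLE_REPORT` is a made-up fragment, not real measurement data, and `summarize_fio` is a hypothetical helper name.

```python
import json

# Made-up fragment shaped like a fio JSON report (only the fields used below).
SAMPLE_REPORT = json.loads("""
{
  "jobs": [
    {
      "jobname": "rand_read",
      "read": {
        "iops": 85000.0,
        "bw": 340000,
        "clat_ns": {"percentile": {"50.000000": 350000, "99.000000": 1500000}}
      },
      "write": {"iops": 0.0, "bw": 0, "clat_ns": {}}
    }
  ]
}
""")


def summarize_fio(report: dict) -> list:
    """Return one row per job and IO direction that actually performed IO."""
    rows = []
    for job in report.get("jobs", []):
        for direction in ("read", "write"):
            stats = job.get(direction, {})
            if not stats.get("iops"):
                continue  # no IO in this direction for this job
            percentiles = stats.get("clat_ns", {}).get("percentile", {})
            p99_ns = percentiles.get("99.000000")
            rows.append({
                "job": job.get("jobname"),
                "direction": direction,
                "iops": stats["iops"],
                # Assumption: "bw" is reported in KiB/s
                "throughput_mib_s": stats.get("bw", 0) / 1024,
                "p99_ms": p99_ns / 1e6 if p99_ns is not None else None,
            })
    return rows


for row in summarize_fio(SAMPLE_REPORT):
    print(row)
```

For a real run, replace `SAMPLE_REPORT` with `json.load(open("rand_read.json"))`.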

3.2 Database-Scenario Benchmark

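```python
import random
import time
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import count
from typing import List


@dataclass
class BenchmarkResult:
    """Benchmark result"""
    scenario: str
    total_operations: int
    duration_seconds: float
    iops: float             # database operations per second, not device IOPS
    throughput_mbps: float  # MB/s, a rough estimate
    latency_p50_ms: float
    latency_p99_ms: float
    latency_max_ms: float


class DatabaseBenchmark:
    """Database-scenario benchmark"""

    def __init__(self, db_connection, max_id=1_000_000):
        self.db = db_connection
        # Assumption: max_id is the upper bound of the existing orders.id values
        self.max_id = max_id
        self._next_id = count(max_id + 1)

    # The four helpers below were not defined in the original listing; they
    # are placeholder generators and should be adapted to the real schema.
    def _random_id(self):
        return random.randint(1, self.max_id)

    def _gen_id(self):
        return next(self._next_id)

    def _random_amount(self):
        return round(random.uniform(1, 1000), 2)

    def _random_date(self):
        return datetime.now() - timedelta(days=random.uniform(0, 365))

    def oltp_point_select(self, iterations=100000) -> BenchmarkResult:
        """OLTP point-select benchmark"""
        latencies = []
        for _ in range(iterations):
            params = (self._random_id(),)
            start = time.perf_counter()
            cursor = self.db.cursor()
            cursor.execute("SELECT * FROM orders WHERE id = %s", params)
            cursor.fetchone()
            elapsed = (time.perf_counter() - start) * 1000
            cursor.close()
            latencies.append(elapsed)
        return self._compute_result("oltp_point_select", iterations, latencies)

    def oltp_range_scan(self, iterations=10000) -> BenchmarkResult:
        """OLTP range-scan benchmark"""
        latencies = []
        for _ in range(iterations):
            # BETWEEN needs lower <= upper, otherwise the scan returns nothing
            params = tuple(sorted((self._random_date(), self._random_date())))
            start = time.perf_counter()
            cursor = self.db.cursor()
            cursor.execute(
                "SELECT * FROM orders "
                "WHERE created_at BETWEEN %s AND %s "
                "LIMIT 100",
                params)
            cursor.fetchall()
            elapsed = (time.perf_counter() - start) * 1000
            cursor.close()
            latencies.append(elapsed)
        return self._compute_result("oltp_range_scan", iterations, latencies)

    def oltp_write(self, iterations=50000) -> BenchmarkResult:
        """OLTP write benchmark"""
        latencies = []
        for _ in range(iterations):
            params = (self._gen_id(), self._random_id(), self._random_amount())
            start = time.perf_counter()
            cursor = self.db.cursor()
            cursor.execute(
                "INSERT INTO orders "
                "(id, user_id, amount, created_at) "
                "VALUES (%s, %s, %s, NOW())",
                params)
            self.db.commit()  # the commit is part of the measured latency
            elapsed = (time.perf_counter() - start) * 1000
            cursor.close()
            latencies.append(elapsed)
        return self._compute_result("oltp_write", iterations, latencies)

    def _compute_result(self, scenario: str, total: int,
                        latencies: List[float]) -> BenchmarkResult:
        """Compute the benchmark result"""
        if not latencies:
            raise ValueError("no latency samples collected")
        latencies.sort()
        # Requests run serially, so the summed latency approximates wall time
        duration = sum(latencies) / 1000  # ms → s
        iops = total / duration if duration > 0 else 0.0
        # Estimated throughput (assumes an average row size of 200 bytes and
        # one row per operation, so range scans are under-reported)
        avg_row_size = 200
        throughput = ((total * avg_row_size / 1024 / 1024) / duration
                      if duration > 0 else 0.0)
        return BenchmarkResult(
            scenario=scenario,
            total_operations=total,
            duration_seconds=round(duration, 2),
            iops=round(iops, 2),
            throughput_mbps=round(throughput, 2),
            latency_p50_ms=round(latencies[len(latencies) // 2], 2),
            latency_p99_ms=round(latencies[int(len(latencies) * 0.99)], 2),
            latency_max_ms=round(latencies[-1], 2),
        )
```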
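The result computation above picks percentiles by indexing into the sorted latency list. Several percentile conventions exist, and on small samples they can differ by one position, so it helps to state which one a report uses. The sketch below shows the nearest-rank definition as one option; `nearest_rank_percentile` is a hypothetical helper, not part of the benchmark class.

```python
import math


def nearest_rank_percentile(sorted_values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of the samples are less than or equal to it."""
    if not sorted_values:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    rank = math.ceil(pct * len(sorted_values) / 100)  # 1-based rank
    return sorted_values[rank - 1]


# Synthetic latencies of 1 ms .. 100 ms, one sample each.
samples = sorted(float(ms) for ms in range(1, 101))
print("P50:", nearest_rank_percentile(samples, 50))
print("P99:", nearest_rank_percentile(samples, 99))
print("max:", nearest_rank_percentile(samples, 100))
```

With tens of thousands of samples the choice of convention barely moves the result; it matters mostly for short runs, where P99 may be decided by one or two requests.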

3.3 Comparing Results and Reporting

```python
from typing import List

# BenchmarkResult is the dataclass defined in section 3.2


class BenchmarkReporter:
    """Benchmark report generator"""

    def compare(self, results: List[BenchmarkResult]) -> str:
        """Generate a comparison report"""
        report = "# Storage Performance Benchmark Report\n\n"
        report += "| Scenario | IOPS | Throughput (MB/s) | P50 (ms) | P99 (ms) | Max (ms) |\n"
        report += "|----------|------|-------------------|----------|----------|----------|\n"
        for r in results:
            report += (f"| {r.scenario} | {r.iops} | "
                       f"{r.throughput_mbps} | "
                       f"{r.latency_p50_ms} | "
                       f"{r.latency_p99_ms} | "
                       f"{r.latency_max_ms} |\n")

        # Performance assessment
        # Why P99 rather than the mean: P99 captures tail latency, which is
        # the bottleneck for user experience; the mean is diluted by the many
        # normal requests and cannot reflect the worst cases
        report += "\n## Performance Assessment\n"
        for r in results:
            if r.latency_p99_ms > 50:
                report += (f"- ⚠️ {r.scenario}: P99 latency {r.latency_p99_ms}ms "
                           f"exceeds the 50ms threshold\n")
            elif r.latency_p99_ms > 20:
                report += (f"- ⚡ {r.scenario}: P99 latency {r.latency_p99_ms}ms, "
                           f"needs attention\n")
            else:
                report += (f"- ✅ {r.scenario}: P99 latency {r.latency_p99_ms}ms, "
                           f"performance is good\n")
        return report
```

4. Architectural Trade-offs in Storage Benchmarks: Fidelity, Cost, and Repeatability

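Test fidelity vs. test cost: fio tests are cheap but low in fidelity, while database benchmarks are high in fidelity but require test data and an environment to be prepared. The recommended order is to confirm the basic performance of the storage hardware with fio first, then validate business-scenario performance with a database benchmark.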

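Representativeness of test data: benchmark data must match the distribution of production data in index selectivity, data skew, and row size. Benchmarking with uniformly random data can produce overly optimistic results.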
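One way to introduce skew into benchmark keys is to draw them from a Zipf-like distribution instead of a uniform one. The sketch below is a minimal illustration using only the standard library; the skew exponent, key count, and the `make_zipf_sampler` name are assumptions, and the real parameters should be fitted to production access logs.

```python
import itertools
import random
from collections import Counter


def make_zipf_sampler(n_keys: int, skew: float = 1.0, seed: int = 42):
    """Return a function that samples keys 1..n_keys with Zipf-like weights.

    Key k is drawn with probability proportional to 1 / k**skew, so low
    keys are "hot" and high keys are rarely touched.
    """
    weights = [1.0 / (rank ** skew) for rank in range(1, n_keys + 1)]
    cum_weights = list(itertools.accumulate(weights))
    rng = random.Random(seed)  # seeded so that runs are repeatable
    keys = range(1, n_keys + 1)

    def sample(k: int) -> list:
        return rng.choices(keys, cum_weights=cum_weights, k=k)

    return sample


sample_keys = make_zipf_sampler(n_keys=1000, skew=1.0)
drawn = sample_keys(10_000)
hits = Counter(drawn)
hot_share = sum(hits[key] for key in range(1, 11)) / len(drawn)
print(f"share of accesses going to the 10 hottest keys: {hot_share:.1%}")
```

Under a uniform distribution the 10 hottest of 1000 keys would receive about 1% of accesses; with this skew they receive a far larger share, which changes cache hit rates and lock contention in the benchmark.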

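Effect of caching: the database Buffer Pool significantly affects read performance. Latency on a cold start (empty cache) and on a warm start (cache already populated) may differ by a factor of 10. Test the two cases separately; the cold-start numbers show how the system behaves right after a restart, before the cache has warmed up again.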
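The cold/warm distinction can be illustrated with a toy measurement harness that times the same read pass twice. In the sketch below an in-process `lru_cache` stands in for the Buffer Pool and a 1 ms sleep stands in for a disk read; both are assumptions for illustration. A real cold-start test would restart the database and drop the OS page cache instead.

```python
import time
from functools import lru_cache


def fetch_page_from_disk(page_id: int) -> bytes:
    """Stand-in for a physical read (the sleep simulates device latency)."""
    time.sleep(0.001)
    return bytes(16)


@lru_cache(maxsize=None)
def read_page(page_id: int) -> bytes:
    """Stand-in for a buffer-pool lookup: a miss falls through to 'disk'."""
    return fetch_page_from_disk(page_id)


def timed_pass(page_ids) -> float:
    """Read every page once and return the elapsed time in milliseconds."""
    start = time.perf_counter()
    for page_id in page_ids:
        read_page(page_id)
    return (time.perf_counter() - start) * 1000


pages = list(range(20))
read_page.cache_clear()        # cold start: the cache is empty
cold_ms = timed_pass(pages)    # every read misses and goes to "disk"
warm_ms = timed_pass(pages)    # every read is served from the cache
print(f"cold pass: {cold_ms:.2f} ms, warm pass: {warm_ms:.2f} ms")
```

Reporting the two numbers side by side, rather than a single average over both passes, keeps the cold-start penalty visible.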

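Repeatability: storage performance is affected by system load, cache state, and CPU frequency, so the same test can produce different results from run to run. Before each run, drop the page cache (run sync, then write 3 to /proc/sys/vm/drop_caches), pin the CPU frequency governor (cpupower frequency-set -g performance), and test on a machine with no other load.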
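Whether a setup is repeatable enough can be checked by running the same test several times and looking at the spread of the results. The sketch below uses the coefficient of variation (standard deviation divided by mean); the 5% threshold and the sample numbers are assumptions for illustration, not an established standard.

```python
import statistics


def coefficient_of_variation(samples) -> float:
    """Sample standard deviation divided by the mean (needs >= 2 samples)."""
    return statistics.stdev(samples) / statistics.mean(samples)


def is_repeatable(samples, max_cv: float = 0.05) -> bool:
    """Treat a series of runs as repeatable if the spread is within max_cv."""
    return coefficient_of_variation(samples) <= max_cv


# Hypothetical P99 latencies (ms) from five runs of the same test.
quiet_host = [100, 101, 99, 100, 102]
noisy_host = [100, 140, 70, 120, 90]

for name, runs in (("quiet host", quiet_host), ("noisy host", noisy_host)):
    cv = coefficient_of_variation(runs)
    print(f"{name}: CV = {cv:.1%}, repeatable = {is_repeatable(runs)}")
```

If the spread is too wide, fix the environment (caches, CPU governor, background load) before comparing configurations, since differences smaller than the run-to-run noise are not meaningful.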

5. Summary

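The core of storage performance testing is reproducing the IO pattern of the business workload, not chasing fio scores. OLTP cares about P99 latency and random IOPS, OLAP about throughput and sequential reads and writes, and logging about write IOPS. In practice, validate the hardware baseline with fio first, then validate business performance with a database benchmark. Results must include P99 latency; neither the mean nor the maximum is enough to reflect user experience.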