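Why might fio random-read IOPS results come out higher than expected? Many people without prior experience are stumped by this, so this article walks through the cause of the problem and how to work around it.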
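Problem description:
When using fio to test the IOPS of a virtual machine disk (a Ceph RBD volume formatted with an ext4 file system), the randread result was much higher than estimated.
The effect appears when randread is run after a randwrite test that used the same parameters.
If the test file is instead created with dd before running randread, the effect does not appear and the IOPS figure is normal.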
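Initial hypothesis: fio's randomness is pseudo-random, so the randwrite run and the following randread run use the same pseudo-random sequence. The file system allocates physical blocks from front to back, so blocks that are random in logical order are actually written sequentially to the physical disk. The subsequent random read is then effectively a sequential read, the I/O requests are merged by the disk scheduler, fewer I/Os actually reach the device, and the measured IOPS is inflated. The rest of this article tests that hypothesis in detail.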
Enable fio's debug mode, run both tests, and capture the log output:
$ fio -direct=1 -iodepth=128 -rw=randwrite -ioengine=libaio -bs=4k -size=1G -numjobs=1 -runtime=10 -group_reporting -filename=iotest -name=Rand_Write_Testing --debug=random > rand_write_offset.log
$ fio -direct=1 -iodepth=128 -rw=randread -ioengine=libaio -bs=4k -size=1G -numjobs=1 -runtime=10 -group_reporting -filename=iotest -name=Rand_Read_Testingg --debug=random > rand_read_offset.log
Inspect the logs:
$ head -n30 rand_write_offset.log
fio: set debug option random
Rand_Write_Testing: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
fio-3.1
Starting 1 process
random   4057532 off rand 259043585
random   4057532 off rand 3179521932
random   4057532 off rand 3621444214
random   4057532 off rand 2018697059
random   4057532 off rand 1726199243
random   4057532 off rand 3608323581
random   4057532 off rand 1634212905
random   4057532 off rand 1518359867
random   4057532 off rand 3921331707
random   4057532 off rand 287004724
random   4057532 off rand 3673173177
random   4057532 off rand 2796675757
random   4057532 off rand 3988051731
random   4057532 off rand 1060357494
random   4057532 off rand 1685717462
random   4057532 off rand 2400737531
random   4057532 off rand 1891936796
random   4057532 off rand 3455447349
random   4057532 off rand 1553547805
random   4057532 off rand 2660809810
random   4057532 off rand 17263379
random   4057532 off rand 1823528783
random   4057532 off rand 1355450167
random   4057532 off rand 2956359995
random   4057532 off rand 3392712188
random   4057532 off rand 4240594610

$ head -n30 rand_read_offset.log
fio: set debug option random
Rand_Read_Testingg: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
fio-3.1
Starting 1 process
random   4057831 off rand 259043585
random   4057831 off rand 3179521932
random   4057831 off rand 3621444214
random   4057831 off rand 2018697059
random   4057831 off rand 1726199243
random   4057831 off rand 3608323581
random   4057831 off rand 1634212905
random   4057831 off rand 1518359867
random   4057831 off rand 3921331707
random   4057831 off rand 287004724
random   4057831 off rand 3673173177
random   4057831 off rand 2796675757
random   4057831 off rand 3988051731
random   4057831 off rand 1060357494
random   4057831 off rand 1685717462
random   4057831 off rand 2400737531
random   4057831 off rand 1891936796
random   4057831 off rand 3455447349
random   4057831 off rand 1553547805
random   4057831 off rand 2660809810
random   4057831 off rand 17263379
random   4057831 off rand 1823528783
random   4057831 off rand 1355450167
random   4057831 off rand 2956359995
random   4057831 off rand 3392712188
random   4057831 off rand 4240594610
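Comparing the two logs shows that the random values in the right-hand column are identical; only the process ID differs.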
The source code analyzed below comes from the following repository and revision:
$ git clone https://github.com/axboe/fio.git
$ cd fio
$ git branch -av
* master                        ee636f3 libaio: switch to newer libaio polled IO API
  remotes/origin/HEAD           -> origin/master
  remotes/origin/latency-probe  fcd4e74 target: fixes
  remotes/origin/master         ee636f3 libaio: switch to newer libaio polled IO API
Find where the debug option is defined and referenced:
$ grep -rHn \"debug\"
init.c:176:        .name = (char *) "debug",
Find where the random debug option is defined and referenced. It is defined in terms of FD_RANDOM, which is either a macro or an enum value:
$ grep -rHn \"random\" -A5 init.c
init.c:2260:    { .name = "random",
init.c-2261-      .help = "Random generation logging",
init.c-2262-      .shift = FD_RANDOM,
init.c-2263-    },
init.c-2264-    { .name = "parse",
init.c-2265-      .help = "Parser logging",
Find where FD_RANDOM is defined and referenced. It is defined in debug.h and referenced in io_u.c, where it gates debug printing. The format string on line 98 matches the debug log seen earlier:
$ grep -rHn FD_RANDOM
debug.h:13:    FD_RANDOM,
init.c:2262:      .shift = FD_RANDOM,
io_u.c:98:        dprint(FD_RANDOM, "off rand %llu\n", (unsigned long long) r);
io_u.c:124:    dprint(FD_RANDOM, "get_next_rand_offset: offset %llu busy\n",
Look at the source around the FD_RANDOM references. Line 96 is where the random number is generated, and line 98 prints it:
$ grep -rHn FD_RANDOM io_u.c -C12
io_u.c-86-
io_u.c-87-static int __get_next_rand_offset(struct thread_data *td, struct fio_file *f,
io_u.c-88-                                  enum fio_ddir ddir, uint64_t *b,
io_u.c-89-                                  uint64_t lastb)
io_u.c-90-{
io_u.c-91-    uint64_t r;
io_u.c-92-
io_u.c-93-    if (td->o.random_generator == FIO_RAND_GEN_TAUSWORTHE ||
io_u.c-94-        td->o.random_generator == FIO_RAND_GEN_TAUSWORTHE64) {
io_u.c-95-
io_u.c-96-        r = __rand(&td->random_state);
io_u.c-97-
io_u.c:98:        dprint(FD_RANDOM, "off rand %llu\n", (unsigned long long) r);
io_u.c-99-
io_u.c-100-        *b = lastb * (r / (rand_max(&td->random_state) + 1.0));
io_u.c-101-    } else {
io_u.c-102-        uint64_t off = 0;
io_u.c-103-
io_u.c-104-        assert(fio_file_lfsr(f));
io_u.c-105-
io_u.c-106-        if (lfsr_next(&f->lfsr, &off))
io_u.c-107-            return 1;
io_u.c-108-
io_u.c-109-        *b = off;
io_u.c-110-    }
--
io_u.c-112-    /*
io_u.c-113-     * if we are not maintaining a random map, we are done.
io_u.c-114-     */
io_u.c-115-    if (!file_randommap(td, f))
io_u.c-116-        goto ret;
io_u.c-117-
io_u.c-118-    /*
io_u.c-119-     * calculate map offset and check if it's free
io_u.c-120-     */
io_u.c-121-    if (random_map_free(f, *b))
io_u.c-122-        goto ret;
io_u.c-123-
io_u.c:124:    dprint(FD_RANDOM, "get_next_rand_offset: offset %llu busy\n",
io_u.c-125-           (unsigned long long) *b);
io_u.c-126-
io_u.c-127-    *b = axmap_next_free(f->io_axmap, *b);
io_u.c-128-    if (*b == (uint64_t) -1ULL)
io_u.c-129-        return 1;
io_u.c-130-ret:
io_u.c-131-    return 0;
io_u.c-132-}
io_u.c-133-
io_u.c-134-static int __get_next_rand_offset_zipf(struct thread_data *td,
io_u.c-135-                                       struct fio_file *f, enum fio_ddir ddir,
io_u.c-136-                                       uint64_t *b)
Find where dprint (a function or macro) is defined and referenced. It is defined in debug.h:
$ grep -rHn " dprint"
debug.h:62:#define dprint(type, str, args...) \
debug.h:71:static inline void dprint(int type, const char *str, ...)
gettime.c:320:    dprint(FD_TIME, "tmp=%llu, sft=%u\n", tmp, sft);
io_u.h:153:static inline void dprint_io_u(struct io_u *io_u, const char *p)
io_u.h:170:#define dprint_io_u(io_u, p)
t/time-test.c:88:#define dprintf(...) if (DEBUG) { printf(__VA_ARGS__); }
Look at the definition of dprint in debug.h:
$ grep -rHn " dprint" -C7 debug.h
debug.h-55-};
debug.h-56-extern const struct debug_level debug_levels[];
debug.h-57-
debug.h-58-extern unsigned long fio_debug;
debug.h-59-
debug.h-60-void __dprint(int type, const char *str, ...) __attribute__((format (printf, 2, 3)));
debug.h-61-
debug.h:62:#define dprint(type, str, args...) \
debug.h-63-    do { \
debug.h-64-        if (((1 << type) & fio_debug) == 0) \
debug.h-65-            break; \
debug.h-66-        __dprint((type), (str), ##args); \
debug.h-67-    } while (0) \
debug.h-68-
debug.h-69-#else
debug.h-70-
debug.h:71:static inline void dprint(int type, const char *str, ...)
debug.h-72-{
debug.h-73-}
debug.h-74-#endif
debug.h-75-
debug.h-76-#endif
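The gating done by the dprint macro can be illustrated in isolation. The following is a minimal sketch rather than fio code: the enum values and the helper names debug_enable and debug_enabled are made up for illustration, and only the (1 << type) & mask test mirrors the macro quoted above.

```c
#include <stdbool.h>

/* Hypothetical debug categories; fio's real list lives in debug.h. */
enum demo_debug_type {
    DEMO_FD_PROCESS = 0,
    DEMO_FD_FILE,
    DEMO_FD_RANDOM,
};

/* Turn on one category by setting its bit in the mask. */
static unsigned long debug_enable(unsigned long mask, int type)
{
    return mask | (1UL << type);
}

/* Same test as the dprint macro: emit output only if the category's bit is set. */
static bool debug_enabled(unsigned long mask, int type)
{
    return ((1UL << type) & mask) != 0;
}
```

Passing --debug=random corresponds to setting the single bit for the random category, so only messages tagged with that category get through.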
Look at __dprint, which hands the final string output to log_prevalist:
$ grep -rHn " __dprint" -C10
debug.c-1-#include <assert.h>
debug.c-2-#include <stdarg.h>
debug.c-3-
debug.c-4-#include "debug.h"
debug.c-5-#include "log.h"
debug.c-6-
debug.c-7-#ifdef FIO_INC_DEBUG
debug.c:8:void __dprint(int type, const char *str, ...)
debug.c-9-{
debug.c-10-    va_list args;
debug.c-11-
debug.c-12-    assert(type < FD_DEBUG_MAX);
debug.c-13-
debug.c-14-    va_start(args, str);
debug.c-15-    log_prevalist(type, str, args);
debug.c-16-    va_end(args);
debug.c-17-}
debug.c-18-#endif
--
debug.h-50-struct debug_level {
debug.h-51-    const char *name;
debug.h-52-    const char *help;
debug.h-53-    unsigned long shift;
debug.h-54-    unsigned int jobno;
debug.h-55-};
debug.h-56-extern const struct debug_level debug_levels[];
debug.h-57-
debug.h-58-extern unsigned long fio_debug;
debug.h-59-
debug.h:60:void __dprint(int type, const char *str, ...) __attribute__((format (printf, 2, 3)));
debug.h-61-
debug.h-62-#define dprint(type, str, args...) \
debug.h-63-    do { \
debug.h-64-        if (((1 << type) & fio_debug) == 0) \
debug.h-65-            break; \
debug.h-66-        __dprint((type), (str), ##args); \
debug.h-67-    } while (0) \
debug.h-68-
debug.h-69-#else
debug.h-70-
--
t/debug.c-1-#include <stdio.h>
t/debug.c-2-
t/debug.c-3-FILE *f_err;
t/debug.c-4-struct timespec *fio_ts = NULL;
t/debug.c-5-unsigned long fio_debug = 0;
t/debug.c-6-
t/debug.c:7:void __dprint(int type, const char *str, ...)
t/debug.c-8-{
t/debug.c-9-}
t/debug.c-10-
t/debug.c-11-void debug_init(void)
t/debug.c-12-{
t/debug.c-13-    f_err = stderr;
t/debug.c-14-}
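Look at the definition of log_prevalist in log.c. Each line is prefixed with the debug type name, then the ID of the current thread (obtained from gettid()), and only then comes the formatted message: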
/* add prefix for the specified type in front of the valist */
void log_prevalist(int type, const char *fmt, va_list args)
{
    char *buf1, *buf2;
    int len;
    pid_t pid;

    pid = gettid();
    if (fio_debug_jobp && *fio_debug_jobp != -1U
        && pid != *fio_debug_jobp)
        return;

    len = vasprintf(&buf1, fmt, args);
    if (len < 0)
        return;
    len = asprintf(&buf2, "%-8s %-5u %s", debug_levels[type].name,
                   (int) pid, buf1);
    free(buf1);
    if (len < 0)
        return;
    len = log_info_buf(buf2, len);
    free(buf2);
}
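To make the column layout of the debug log concrete, here is a small sketch that applies the same "%-8s %-5u %s" format string to one of the logged values. The function name format_debug_line is hypothetical, and snprintf into a caller-supplied buffer is used here in place of fio's asprintf, purely to keep the example self-contained.

```c
#include <stdio.h>

/*
 * Build one debug line with the layout used by log_prevalist:
 * type name (left-aligned, at least 8 wide), thread ID (left-aligned,
 * at least 5 wide), then the already formatted message.
 * Returns what snprintf returns: the length of the full line, or a
 * negative value on error.
 */
static int format_debug_line(char *out, size_t outlen, const char *type_name,
                             unsigned int tid, const char *msg)
{
    return snprintf(out, outlen, "%-8s %-5u %s", type_name, tid, msg);
}
```

Called with "random", 4057532 and "off rand 259043585\n", this reproduces the first data line of the write log, which confirms that the second column is the thread ID and the last number is the raw random value.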
Find where the debug_levels array is defined:
$ grep -rHn debug_levels
debug.h:56:extern const struct debug_level debug_levels[];
gfio.c:1187:        buttons[i] = gtk_check_button_new_with_label(debug_levels[i].name);
gfio.c:1188:        gtk_widget_set_tooltip_text(buttons[i], debug_levels[i].help);
init.c:2144:    const struct debug_level *dl = &debug_levels[0];
init.c:2235:const struct debug_level debug_levels[] = {
init.c:2327:        for (i = 0; debug_levels[i].name; i++) {
init.c:2328:            dl = &debug_levels[i];
init.c:2344:        for (i = 0; debug_levels[i].name; i++) {
init.c:2345:            dl = &debug_levels[i];
log.c:59:    len = asprintf(&buf2, "%-8s %-5u %s", debug_levels[type].name,
The definition itself is in init.c:
#ifdef FIO_INC_DEBUG
const struct debug_level debug_levels[] = {
    { .name = "process",
      .help = "Process creation/exit logging",
      .shift = FD_PROCESS,
    },
    { .name = "file",
      .help = "File related action logging",
      .shift = FD_FILE,
    },
    { .name = "io",
      .help = "IO and IO engine action logging (offsets, queue, completions, etc)",
      .shift = FD_IO,
    },
    { .name = "mem",
      .help = "Memory allocation/freeing logging",
      .shift = FD_MEM,
    },
    { .name = "blktrace",
      .help = "blktrace action logging",
      .shift = FD_BLKTRACE,
    },
    { .name = "verify",
      .help = "IO verification action logging",
      .shift = FD_VERIFY,
    },
    { .name = "random",
      .help = "Random generation logging",
      .shift = FD_RANDOM,
    },
    { .name = "parse",
      .help = "Parser logging",
      .shift = FD_PARSE,
    },
    { .name = "diskutil",
      .help = "Disk utility logging actions",
      .shift = FD_DISKUTIL,
    },
    { .name = "job",
      .help = "Logging related to creating/destroying jobs",
      .shift = FD_JOB,
    },
    { .name = "mutex",
      .help = "Mutex logging",
      .shift = FD_MUTEX
    },
    { .name = "profile",
      .help = "Logging related to profiles",
      .shift = FD_PROFILE,
    },
    { .name = "time",
      .help = "Logging related to time keeping functions",
      .shift = FD_TIME,
    },
    { .name = "net",
      .help = "Network logging",
      .shift = FD_NET,
    },
    { .name = "rate",
      .help = "Rate logging",
      .shift = FD_RATE,
    },
    { .name = "compress",
      .help = "Log compression logging",
      .shift = FD_COMPRESS,
    },
    { .name = "steadystate",
      .help = "Steady state detection logging",
      .shift = FD_STEADYSTATE,
    },
    { .name = "helperthread",
      .help = "Helper thread logging",
      .shift = FD_HELPERTHREAD,
    },
    { .name = "zbd",
      .help = "Zoned Block Device logging",
      .shift = FD_ZBD,
    },
    { .name = NULL, },
};

static int set_debug(const char *string)
{
    const struct debug_level *dl;
    char *p = (char *) string;
    char *opt;
    int i;

    if (!string)
        return 0;

    if (!strcmp(string, "?") || !strcmp(string, "help")) {
        log_info("fio: dumping debug options:");
        for (i = 0; debug_levels[i].name; i++) {
            dl = &debug_levels[i];
            log_info("%s,", dl->name);
        }
        log_info("all\n");
        return 1;
    }

    while ((opt = strsep(&p, ",")) != NULL) {
        int found = 0;

        if (!strncmp(opt, "all", 3)) {
            log_info("fio: set all debug options\n");
            fio_debug = ~0UL;
            continue;
        }

        for (i = 0; debug_levels[i].name; i++) {
            dl = &debug_levels[i];
            found = !strncmp(opt, dl->name, strlen(dl->name));
            if (!found)
                continue;

            if (dl->shift == FD_JOB) {
                opt = strchr(opt, ':');
                if (!opt) {
                    log_err("fio: missing job number\n");
                    break;
                }
                opt++;
                fio_debug_jobno = atoi(opt);
                log_info("fio: set debug jobno %d\n", fio_debug_jobno);
            } else {
                log_info("fio: set debug option %s\n", opt);
                fio_debug |= (1UL << dl->shift);
            }
            break;
        }

        if (!found)
            log_err("fio: debug mask %s not found\n", opt);
    }
    return 0;
}
#else
static int set_debug(const char *string)
{
    log_err("fio: debug tracing not included in build\n");
    return 1;
}
#endif
Find where the randwrite value is defined and referenced. It maps to TD_DDIR_RANDWRITE:
$ grep -rHn \"randwrite\" -C5
io_ddir.h-62-}
io_ddir.h-63-
io_ddir.h-64-static inline const char *ddir_str(enum td_ddir ddir)
io_ddir.h-65-{
io_ddir.h-66-    static const char *__str[] = { NULL, "read", "write", "rw", "rand",
io_ddir.h:67:                "randread", "randwrite", "randrw",
io_ddir.h-68-                "trim", NULL, "trimwrite", NULL, "randtrim" };
io_ddir.h-69-
io_ddir.h-70-    return __str[ddir];
io_ddir.h-71-}
io_ddir.h-72-
--
options.c-1690-              },
options.c-1691-              { .ival = "randread",
options.c-1692-                .oval = TD_DDIR_RANDREAD,
options.c-1693-                .help = "Random read",
options.c-1694-              },
options.c:1695:              { .ival = "randwrite",
options.c-1696-                .oval = TD_DDIR_RANDWRITE,
options.c-1697-                .help = "Random write",
options.c-1698-              },
options.c-1699-              { .ival = "randtrim",
options.c-1700-                .oval = TD_DDIR_RANDTRIM,
--
profiles/act.c-182-
profiles/act.c-183-    if (act_add_opt("name=act-%s-%s", reads ? "read" : "write", dev))
profiles/act.c-184-        return 1;
profiles/act.c-185-    if (act_add_opt("filename=%s", dev))
profiles/act.c-186-        return 1;
profiles/act.c:187:    if (act_add_opt("rw=%s", reads ? "randread" : "randwrite"))
profiles/act.c-188-        return 1;
profiles/act.c-189-    if (reads) {
profiles/act.c-190-        int rload = ao->load * R_LOAD / ao->threads_per_queue;
profiles/act.c-191-
profiles/act.c-192-        if (act_add_opt("numjobs=%u", ao->threads_per_queue))
--
t/sgunmap-test.py-116-
t/sgunmap-test.py-117-
t/sgunmap-test.py-118-def runalltests(args, qd, batch):
t/sgunmap-test.py-119-    block = False
t/sgunmap-test.py-120-    for dev in [args.chardev, args.blockdev]:
t/sgunmap-test.py:121:        for rw in ["randread", "randwrite", "randtrim"]:
t/sgunmap-test.py-122-            parameters = ["--name=test",
t/sgunmap-test.py-123-                           "--time_based",
t/sgunmap-test.py-124-                           "--runtime=30s",
t/sgunmap-test.py-125-                           "--output-format=json",
t/sgunmap-test.py-126-                           "--ioengine=sg",
Find the definition of TD_DDIR_RANDWRITE. It is composed of TD_DDIR_WRITE and TD_DDIR_RAND, so the part to focus on is how TD_DDIR_RAND affects execution:
$ grep -rHn TD_DDIR_RANDWRITE
io_ddir.h:38:    TD_DDIR_RANDWRITE = TD_DDIR_WRITE | TD_DDIR_RAND,
options.c:1696:                .oval = TD_DDIR_RANDWRITE,
Find where TD_DDIR_RAND is defined and referenced. It is mainly used by the td_random macro, which suggests it serves as a flag bit:
$ grep -rHn TD_DDIR_RAND -C3
io_ddir.h-31-enum td_ddir {
io_ddir.h-32-    TD_DDIR_READ = 1 << 0,
io_ddir.h-33-    TD_DDIR_WRITE = 1 << 1,
io_ddir.h:34:    TD_DDIR_RAND = 1 << 2,
io_ddir.h-35-    TD_DDIR_TRIM = 1 << 3,
io_ddir.h-36-    TD_DDIR_RW = TD_DDIR_READ | TD_DDIR_WRITE,
io_ddir.h:37:    TD_DDIR_RANDREAD = TD_DDIR_READ | TD_DDIR_RAND,
io_ddir.h:38:    TD_DDIR_RANDWRITE = TD_DDIR_WRITE | TD_DDIR_RAND,
io_ddir.h:39:    TD_DDIR_RANDRW = TD_DDIR_RW | TD_DDIR_RAND,
io_ddir.h:40:    TD_DDIR_RANDTRIM = TD_DDIR_TRIM | TD_DDIR_RAND,
io_ddir.h-41-    TD_DDIR_TRIMWRITE = TD_DDIR_TRIM | TD_DDIR_WRITE,
io_ddir.h-42-};
io_ddir.h-43-
--
io_ddir.h-45-#define td_write(td) ((td)->o.td_ddir & TD_DDIR_WRITE)
io_ddir.h-46-#define td_trim(td) ((td)->o.td_ddir & TD_DDIR_TRIM)
io_ddir.h-47-#define td_rw(td) (((td)->o.td_ddir & TD_DDIR_RW) == TD_DDIR_RW)
io_ddir.h:48:#define td_random(td) ((td)->o.td_ddir & TD_DDIR_RAND)
io_ddir.h-49-#define file_randommap(td, f) (!(td)->o.norandommap && fio_file_axmap((f)))
io_ddir.h-50-#define td_trimwrite(td) (((td)->o.td_ddir & TD_DDIR_TRIMWRITE) \
io_ddir.h-51-                    == TD_DDIR_TRIMWRITE)
--
options.c-1689-                .help = "Sequential trim",
options.c-1690-              },
options.c-1691-              { .ival = "randread",
options.c:1692:                .oval = TD_DDIR_RANDREAD,
options.c-1693-                .help = "Random read",
options.c-1694-              },
options.c-1695-              { .ival = "randwrite",
options.c:1696:                .oval = TD_DDIR_RANDWRITE,
options.c-1697-                .help = "Random write",
options.c-1698-              },
options.c-1699-              { .ival = "randtrim",
options.c:1700:                .oval = TD_DDIR_RANDTRIM,
options.c-1701-                .help = "Random trim",
options.c-1702-              },
options.c-1703-              { .ival = "rw",
--
options.c-1709-                .help = "Sequential read and write mix",
options.c-1710-              },
options.c-1711-              { .ival = "randrw",
options.c:1712:                .oval = TD_DDIR_RANDRW,
options.c-1713-                .help = "Random read and write mix"
options.c-1714-              },
options.c-1715-              { .ival = "trimwrite",
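The flag composition is easy to verify on its own. The sketch below copies only the bit values visible in io_ddir.h into a hypothetical demo_ddir enum; the helper names demo_is_random and demo_is_write are made up and stand in for the td_random and td_write macros.

```c
#include <stdbool.h>

/* Hypothetical copies of the bit values shown in io_ddir.h. */
enum demo_ddir {
    DEMO_READ      = 1 << 0,
    DEMO_WRITE     = 1 << 1,
    DEMO_RAND      = 1 << 2,
    DEMO_RANDREAD  = DEMO_READ | DEMO_RAND,
    DEMO_RANDWRITE = DEMO_WRITE | DEMO_RAND,
};

/* Mirrors td_random(): true when the RAND bit is set, whatever the direction. */
static bool demo_is_random(int ddir)
{
    return (ddir & DEMO_RAND) != 0;
}

/* Mirrors td_write(): true when the WRITE bit is set. */
static bool demo_is_write(int ddir)
{
    return (ddir & DEMO_WRITE) != 0;
}
```

Both randread and randwrite carry the same RAND bit, which is why the two workloads go through the same random offset path.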
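Next, find where td_random is referenced, focusing on io_u.c, since that is where the I/O sequence is generated. When the job is random, get_next_rand_block is called, which should be where the random numbers are produced: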
$ grep -rHn td_random -C5 io_u.c
io_u.c-416-    assert(ddir_rw(ddir));
io_u.c-417-
io_u.c-418-    b = offset = -1ULL;
io_u.c-419-
io_u.c-420-    if (rw_seq) {
io_u.c:421:        if (td_random(td)) {
io_u.c-422-            if (should_do_random(td, ddir)) {
io_u.c-423-                ret = get_next_rand_block(td, f, ddir, &b);
io_u.c-424-                *is_random = true;
io_u.c-425-            } else {
io_u.c-426-                *is_random = false;
--
io_u.c-934-    }
io_u.c-935-
io_u.c-936-    /*
io_u.c-937-     * mark entry before potentially trimming io_u
io_u.c-938-     */
io_u.c:939:    if (td_random(td) && file_randommap(td, io_u->file))
io_u.c-940-        io_u->buflen = mark_random_map(td, io_u, offset, io_u->buflen);
io_u.c-941-
io_u.c-942-out:
io_u.c-943-    dprint_io_u(io_u, "fill");
io_u.c-944-    td->zone_bytes += io_u->buflen;
Look at get_next_rand_block in io_u.c. It ends up at the dprint call site analyzed earlier, where __rand and rand_max are used to compute the random block:
static int get_next_rand_block(struct thread_data *td, struct fio_file *f,
                               enum fio_ddir ddir, uint64_t *b)
{
    if (!get_next_rand_offset(td, f, ddir, b))
        return 0;

    if (td->o.time_based ||
        (td->o.file_service_type & __FIO_FSERVICE_NONUNIFORM)) {
        fio_file_reset(td, f);
        loop_cache_invalidate(td, f);
        if (!get_next_rand_offset(td, f, ddir, b))
            return 0;
    }

    dprint(FD_IO, "%s: rand offset failed, last=%llu, size=%llu\n",
           f->file_name, (unsigned long long) f->last_pos[ddir],
           (unsigned long long) f->real_file_size);
    return 1;
}

static int get_next_rand_offset(struct thread_data *td, struct fio_file *f,
                                enum fio_ddir ddir, uint64_t *b)
{
    if (td->o.random_distribution == FIO_RAND_DIST_RANDOM) {
        uint64_t lastb;

        lastb = last_block(td, f, ddir);
        if (!lastb)
            return 1;

        return __get_next_rand_offset(td, f, ddir, b, lastb);
    } else if (td->o.random_distribution == FIO_RAND_DIST_ZIPF)
        return __get_next_rand_offset_zipf(td, f, ddir, b);
    else if (td->o.random_distribution == FIO_RAND_DIST_PARETO)
        return __get_next_rand_offset_pareto(td, f, ddir, b);
    else if (td->o.random_distribution == FIO_RAND_DIST_GAUSS)
        return __get_next_rand_offset_gauss(td, f, ddir, b);
    else if (td->o.random_distribution == FIO_RAND_DIST_ZONED)
        return __get_next_rand_offset_zoned(td, f, ddir, b);
    else if (td->o.random_distribution == FIO_RAND_DIST_ZONED_ABS)
        return __get_next_rand_offset_zoned_abs(td, f, ddir, b);

    log_err("fio: unknown random distribution: %d\n", td->o.random_distribution);
    return 1;
}

static int __get_next_rand_offset(struct thread_data *td, struct fio_file *f,
                                  enum fio_ddir ddir, uint64_t *b,
                                  uint64_t lastb)
{
    uint64_t r;

    if (td->o.random_generator == FIO_RAND_GEN_TAUSWORTHE ||
        td->o.random_generator == FIO_RAND_GEN_TAUSWORTHE64) {

        r = __rand(&td->random_state);

        dprint(FD_RANDOM, "off rand %llu\n", (unsigned long long) r);

        *b = lastb * (r / (rand_max(&td->random_state) + 1.0));
    } else {
        uint64_t off = 0;

        assert(fio_file_lfsr(f));

        if (lfsr_next(&f->lfsr, &off))
            return 1;

        *b = off;
    }

    /*
     * if we are not maintaining a random map, we are done.
     */
    if (!file_randommap(td, f))
        goto ret;

    /*
     * calculate map offset and check if it's free
     */
    if (random_map_free(f, *b))
        goto ret;

    dprint(FD_RANDOM, "get_next_rand_offset: offset %llu busy\n",
           (unsigned long long) *b);

    *b = axmap_next_free(f->io_axmap, *b);
    if (*b == (uint64_t) -1ULL)
        return 1;
ret:
    return 0;
}
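The scaling step in __get_next_rand_offset, lastb * (r / (rand_max + 1.0)), can be tried out with the logged values. This is a sketch under stated assumptions: the function name scale_to_block is made up, the 32-bit generator is assumed (so rand_max is 4294967295), and the block count of 262144 used in the note below is simply 1 GiB divided by 4 KiB, not a value read out of fio.

```c
#include <stdint.h>

/*
 * Scale a raw 32-bit random value r into a block index in [0, nr_blocks),
 * following the expression lastb * (r / (rand_max + 1.0)).
 * nr_blocks stands in for lastb (an assumption for this sketch).
 */
static uint64_t scale_to_block(uint32_t r, uint64_t nr_blocks)
{
    /* rand_max + 1.0 for the 32-bit generator, i.e. 2^32 as a double */
    const double range = (double)UINT32_MAX + 1.0;

    return (uint64_t)(nr_blocks * (r / range));
}
```

With 262144 blocks, the first two logged values, 259043585 and 3179521932, map to blocks 15810 and 194062. Because the mapping is a pure function of r, identical random values in the two logs imply identical block offsets in the write and read runs.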
Find __rand. It is a static inline function defined in lib/rand.h:
$ grep -rHn " __rand("
backend.c:1012:        io_u->rand_seed = __rand(&td->verify_state);
backend.c:1014:            io_u->rand_seed *= __rand(&td->verify_state);
engines/rdma.c:715:        index = __rand(&rd->rand_state) % rd->rmt_nr;
engines/rdma.c:725:        index = __rand(&rd->rand_state) % rd->rmt_nr;
filesetup.c:337:    r = __rand(&td->file_size_state);
io_u.c:96:        r = __rand(&td->random_state);
io_u.c:548:        r = __rand(&td->bsrange_state[ddir]);
io_u.c:1165:        r = __rand(&td->next_file_state);
lib/gauss.c:16:    r = __rand(&gs->r);
lib/gauss.c:28:        sum += __rand(&gs->r) % (gs->nranges + 1);
lib/rand.c:128:        unsigned long r = __rand(fs);
lib/rand.c:131:            r *= (unsigned long) __rand(fs);
lib/rand.c:190:        unsigned long r = __rand(fs);
lib/rand.c:193:            r *= (unsigned long) __rand(fs);
lib/rand.h:96:static inline uint64_t __rand(struct frand_state *state)
lib/zipf.c:32:    zs->rand_off = __rand(&zs->rand);
lib/zipf.c:55:    rand_uni = (double) __rand(&zs->rand) / (double) FRAND32_MAX;
lib/zipf.c:82:    double rand = (double) __rand(&zs->rand) / (double) FRAND32_MAX;
trim.c:77:    r = __rand(&td->trim_state);
verify.c:1370:        io_u->rand_seed = __rand(&td->verify_state);
verify.c:1372:            io_u->rand_seed *= __rand(&td->verify_state);
Look at its implementation: the result depends only on the state passed in, so this is strictly pseudo-random:
struct taus88_state {
    unsigned int s1, s2, s3;
};

struct taus258_state {
    uint64_t s1, s2, s3, s4, s5;
};

struct frand_state {
    unsigned int use64;
    union {
        struct taus88_state state32;
        struct taus258_state state64;
    };
};

static inline unsigned int __rand32(struct taus88_state *state)
{
#define TAUSWORTHE(s,a,b,c,d) ((s&c)<<d) ^ (((s <<a) ^ s)>>b)

    state->s1 = TAUSWORTHE(state->s1, 13, 19, 4294967294UL, 12);
    state->s2 = TAUSWORTHE(state->s2, 2, 25, 4294967288UL, 4);
    state->s3 = TAUSWORTHE(state->s3, 3, 11, 4294967280UL, 17);

    return (state->s1 ^ state->s2 ^ state->s3);
}

static inline uint64_t __rand64(struct taus258_state *state)
{
    uint64_t xval;

    xval = ((state->s1 << 1) ^ state->s1) >> 53;
    state->s1 = ((state->s1 & 18446744073709551614ULL) << 10) ^ xval;

    xval = ((state->s2 << 24) ^ state->s2) >> 50;
    state->s2 = ((state->s2 & 18446744073709551104ULL) << 5) ^ xval;

    xval = ((state->s3 << 3) ^ state->s3) >> 23;
    state->s3 = ((state->s3 & 18446744073709547520ULL) << 29) ^ xval;

    xval = ((state->s4 << 5) ^ state->s4) >> 24;
    state->s4 = ((state->s4 & 18446744073709420544ULL) << 23) ^ xval;

    xval = ((state->s5 << 3) ^ state->s5) >> 33;
    state->s5 = ((state->s5 & 18446744073701163008ULL) << 8) ^ xval;

    return (state->s1 ^ state->s2 ^ state->s3 ^ state->s4 ^ state->s5);
}

static inline uint64_t __rand(struct frand_state *state)
{
    if (state->use64)
        return __rand64(&state->state64);
    else
        return __rand32(&state->state32);
}
Next, find the implementation of rand_max, which is also defined in lib/rand.h:
$ grep -rHn " rand_max("
filesetup.c:336:    frand_max = rand_max(&td->file_size_state);
io_u.c:546:    frand_max = rand_max(&td->bsrange_state[ddir]);
io_u.c:1162:    uint64_t frand_max = rand_max(&td->next_file_state);
lib/rand.h:27:static inline uint64_t rand_max(struct frand_state *state)
trim.c:76:    frand_max = rand_max(&td->trim_state);
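Its output is likewise fully determined by its input: it returns a fixed maximum selected by use64, with no randomness involved: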
#define FRAND32_MAX (-1U)
#define FRAND64_MAX (-1ULL)

static inline uint64_t rand_max(struct frand_state *state)
{
    if (state->use64)
        return FRAND64_MAX;
    else
        return FRAND32_MAX;
}
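As shown above, the whole computation depends only on random_state. So next, find where random_state is initialized and referenced. It is initialized by init_rand_seed; after that, only __rand advances it, while rand_max merely reads its use64 field: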
$ grep -rHn random_state
fio.h:357:    struct frand_state random_state;
init.c:1056:    init_rand_seed(&td->random_state, td->rand_seeds[FIO_RAND_BLOCK_OFF], use64);
io_u.c:96:        r = __rand(&td->random_state);
io_u.c:100:        *b = lastb * (r / (rand_max(&td->random_state) + 1.0));
verify.c:1623:        if (td->random_state.use64) {
verify.c:1624:            s->rand.state64.s[0] = cpu_to_le64(td->random_state.state64.s1);
verify.c:1625:            s->rand.state64.s[1] = cpu_to_le64(td->random_state.state64.s2);
verify.c:1626:            s->rand.state64.s[2] = cpu_to_le64(td->random_state.state64.s3);
verify.c:1627:            s->rand.state64.s[3] = cpu_to_le64(td->random_state.state64.s4);
verify.c:1628:            s->rand.state64.s[4] = cpu_to_le64(td->random_state.state64.s5);
verify.c:1632:            s->rand.state32.s[0] = cpu_to_le32(td->random_state.state32.s1);
verify.c:1633:            s->rand.state32.s[1] = cpu_to_le32(td->random_state.state32.s2);
verify.c:1634:            s->rand.state32.s[2] = cpu_to_le32(td->random_state.state32.s3);
Now look at the implementation of init_rand_seed. It is again a deterministic computation that depends only on the seed argument:
// lib/rand.h
static inline uint64_t __rand64(struct taus258_state *state)
{
    uint64_t xval;

    xval = ((state->s1 << 1) ^ state->s1) >> 53;
    state->s1 = ((state->s1 & 18446744073709551614ULL) << 10) ^ xval;

    xval = ((state->s2 << 24) ^ state->s2) >> 50;
    state->s2 = ((state->s2 & 18446744073709551104ULL) << 5) ^ xval;

    xval = ((state->s3 << 3) ^ state->s3) >> 23;
    state->s3 = ((state->s3 & 18446744073709547520ULL) << 29) ^ xval;

    xval = ((state->s4 << 5) ^ state->s4) >> 24;
    state->s4 = ((state->s4 & 18446744073709420544ULL) << 23) ^ xval;

    xval = ((state->s5 << 3) ^ state->s5) >> 33;
    state->s5 = ((state->s5 & 18446744073701163008ULL) << 8) ^ xval;

    return (state->s1 ^ state->s2 ^ state->s3 ^ state->s4 ^ state->s5);
}

// lib/rand.c
static inline uint64_t __seed(uint64_t x, uint64_t m)
{
    return (x < m) ? x + m : x;
}

static void __init_rand32(struct taus88_state *state, unsigned int seed)
{
    int cranks = 6;

#define LCG(x, seed) ((x) * 69069 ^ (seed))

    state->s1 = __seed(LCG((2^31) + (2^17) + (2^7), seed), 1);
    state->s2 = __seed(LCG(state->s1, seed), 7);
    state->s3 = __seed(LCG(state->s2, seed), 15);

    while (cranks--)
        __rand32(state);
}

static void __init_rand64(struct taus258_state *state, uint64_t seed)
{
    int cranks = 6;

#define LCG64(x, seed) ((x) * 6906969069ULL ^ (seed))

    state->s1 = __seed(LCG64((2^31) + (2^17) + (2^7), seed), 1);
    state->s2 = __seed(LCG64(state->s1, seed), 7);
    state->s3 = __seed(LCG64(state->s2, seed), 15);
    state->s4 = __seed(LCG64(state->s3, seed), 33);
    state->s5 = __seed(LCG64(state->s4, seed), 49);

    while (cranks--)
        __rand64(state);
}

void init_rand(struct frand_state *state, bool use64)
{
    state->use64 = use64;

    if (!use64)
        __init_rand32(&state->state32, 1);
    else
        __init_rand64(&state->state64, 1);
}

void init_rand_seed(struct frand_state *state, unsigned int seed, bool use64)
{
    state->use64 = use64;

    if (!use64)
        __init_rand32(&state->state32, seed);
    else
        __init_rand64(&state->state64, seed);
}
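The determinism can be checked with a small stand-alone generator. The sketch below is modeled on the 32-bit Tausworthe routines quoted above, but it is not a drop-in copy of fio's code: all demo_ names are made up, fixed-width types replace the original unsigned int and unsigned long, and the constant 53 is what (2^31) + (2^17) + (2^7) evaluates to in C, where ^ is XOR.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_taus88 {
    uint32_t s1, s2, s3;
};

/* One Tausworthe component step, same shape as the TAUSWORTHE macro. */
static uint32_t demo_step(uint32_t s, int a, int b, uint32_t c, int d)
{
    return ((s & c) << d) ^ (((s << a) ^ s) >> b);
}

/* Advance the state and return the next 32-bit value. */
static uint32_t demo_rand32(struct demo_taus88 *st)
{
    st->s1 = demo_step(st->s1, 13, 19, 4294967294U, 12);
    st->s2 = demo_step(st->s2, 2, 25, 4294967288U, 4);
    st->s3 = demo_step(st->s3, 3, 11, 4294967280U, 17);
    return st->s1 ^ st->s2 ^ st->s3;
}

/* Keep a state word from starting below its minimum, like __seed. */
static uint32_t demo_seed_min(uint32_t x, uint32_t m)
{
    return (x < m) ? x + m : x;
}

/* Derive the whole state from the seed, then discard six outputs. */
static void demo_init(struct demo_taus88 *st, uint32_t seed)
{
    int cranks = 6;

    st->s1 = demo_seed_min((53u * 69069u) ^ seed, 1);
    st->s2 = demo_seed_min((st->s1 * 69069u) ^ seed, 7);
    st->s3 = demo_seed_min((st->s2 * 69069u) ^ seed, 15);

    while (cranks--)
        demo_rand32(st);
}

/* Fill out[0..n) with the first n values produced from the given seed. */
static void demo_sequence(uint32_t seed, uint32_t *out, size_t n)
{
    struct demo_taus88 st;
    size_t i;

    demo_init(&st, seed);
    for (i = 0; i < n; i++)
        out[i] = demo_rand32(&st);
}
```

Calling demo_sequence twice with the same seed fills both arrays with the same values, which is the behaviour observed in the two debug logs: nothing other than the seed feeds into the sequence.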
Then look at where the seed argument, td->rand_seeds[FIO_RAND_BLOCK_OFF], is referenced. It is set on line 1054 of init.c:
$ grep -rHn "rand_seeds\[FIO_RAND_BLOCK_OFF\]"
filesetup.c:1292:        seed = td->rand_seeds[FIO_RAND_BLOCK_OFF];
filesetup.c:1882:        lfsr_reset(&f->lfsr, td->rand_seeds[FIO_RAND_BLOCK_OFF]);
init.c:1054:        td->rand_seeds[FIO_RAND_BLOCK_OFF] = FIO_RANDSEED * td->thread_number;
init.c:1056:    init_rand_seed(&td->random_state, td->rand_seeds[FIO_RAND_BLOCK_OFF], use64);
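Look at the function containing that line. When rand_repeatable is set (evidently the case in this test, since the offsets repeat), the seed is derived solely from the job's thread_number: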
// init.c
static void td_fill_rand_seeds_internal(struct thread_data *td, bool use64)
{
    unsigned int read_seed = td->rand_seeds[FIO_RAND_BS_OFF];
    unsigned int write_seed = td->rand_seeds[FIO_RAND_BS1_OFF];
    unsigned int trim_seed = td->rand_seeds[FIO_RAND_BS2_OFF];
    int i;

    /*
     * trimwrite is special in that we need to generate the same
     * offsets to get the "write after trim" effect. If we are
     * using bssplit to set buffer length distributions, ensure that
     * we seed the trim and write generators identically. Ditto for
     * verify, read and writes must have the same seed, if we are doing
     * read verify.
     */
    if (td->o.verify != VERIFY_NONE)
        write_seed = read_seed;
    if (td_trimwrite(td))
        trim_seed = write_seed;
    init_rand_seed(&td->bsrange_state[DDIR_READ], read_seed, use64);
    init_rand_seed(&td->bsrange_state[DDIR_WRITE], write_seed, use64);
    init_rand_seed(&td->bsrange_state[DDIR_TRIM], trim_seed, use64);

    td_fill_verify_state_seed(td);
    init_rand_seed(&td->rwmix_state, td->rand_seeds[FIO_RAND_MIX_OFF], false);

    if (td->o.file_service_type == FIO_FSERVICE_RANDOM)
        init_rand_seed(&td->next_file_state, td->rand_seeds[FIO_RAND_FILE_OFF], use64);
    else if (td->o.file_service_type & __FIO_FSERVICE_NONUNIFORM)
        init_rand_file_service(td);

    init_rand_seed(&td->file_size_state, td->rand_seeds[FIO_RAND_FILE_SIZE_OFF], use64);
    init_rand_seed(&td->trim_state, td->rand_seeds[FIO_RAND_TRIM_OFF], use64);
    init_rand_seed(&td->delay_state, td->rand_seeds[FIO_RAND_START_DELAY], use64);
    init_rand_seed(&td->poisson_state[0], td->rand_seeds[FIO_RAND_POISSON_OFF], 0);
    init_rand_seed(&td->poisson_state[1], td->rand_seeds[FIO_RAND_POISSON2_OFF], 0);
    init_rand_seed(&td->poisson_state[2], td->rand_seeds[FIO_RAND_POISSON3_OFF], 0);
    init_rand_seed(&td->dedupe_state, td->rand_seeds[FIO_DEDUPE_OFF], false);
    init_rand_seed(&td->zone_state, td->rand_seeds[FIO_RAND_ZONE_OFF], false);

    if (!td_random(td))
        return;

    if (td->o.rand_repeatable)
        td->rand_seeds[FIO_RAND_BLOCK_OFF] = FIO_RANDSEED * td->thread_number;

    init_rand_seed(&td->random_state, td->rand_seeds[FIO_RAND_BLOCK_OFF], use64);

    for (i = 0; i < DDIR_RWDIR_CNT; i++) {
        struct frand_state *s = &td->seq_rand_state[i];

        init_rand_seed(s, td->rand_seeds[FIO_RAND_SEQ_RAND_READ_OFF], false);
    }
}
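The seed selection at the end of this function can be reduced to a few lines. This is a sketch under stated assumptions: DEMO_RANDSEED is a placeholder, because the value of fio's FIO_RANDSEED is not shown in this article, and demo_block_seed with its other_seed parameter is a hypothetical stand-in for whatever was already stored in rand_seeds[FIO_RAND_BLOCK_OFF].

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder base seed; NOT the real value of FIO_RANDSEED. */
#define DEMO_RANDSEED 12345u

/*
 * Pick the seed for the block-offset generator.
 * With rand_repeatable set, the previously stored seed (other_seed)
 * is ignored and the result depends on thread_number alone.
 */
static uint32_t demo_block_seed(bool rand_repeatable, uint32_t thread_number,
                                uint32_t other_seed)
{
    if (rand_repeatable)
        return DEMO_RANDSEED * thread_number;
    return other_seed;
}
```

In the repeatable case, two separate fio invocations that each create one job end up with the same seed, regardless of which workload (randwrite or randread) the job runs.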
Next, find where thread_number is initialized and referenced. The HOWTO appears to contain an explanation:
$ grep -rHn thread_number
HOWTO:1245:    * thread_number`, where the thread number is a counter that starts at 0 and
backend.c:64:unsigned int thread_number = 0;
backend.c:1624:        ret = fio_cpus_split(&o->cpumask, td->thread_number - 1);
backend.c:1899:        verify_save_state(td->thread_number);
backend.c:2131:            td->thread_number - 1, &data);
backend.c:2248:    todo = thread_number;
backend.c:2254:        print_status_init(td->thread_number - 1);
backend.c:2488:    if (!thread_number)
client.c:923:    pdu.thread_number = cpu_to_le32(client->thread_number);
client.c:948:    dst->thread_number = le32_to_cpu(src->thread_number);
client.c:1078:    if (client->opt_lists && p->ts.thread_number <= client->jobs)
client.c:1079:        opt_list = &client->opt_lists[p->ts.thread_number - 1];
client.c:1095:    client_ts.thread_number = p->ts.thread_number;
client.c:1653:    ret->thread_number = le32_to_cpu(ret->thread_number);
client.c:1832:    client->thread_number = le32_to_cpu(pdu->thread_number);
client.h:60:    uint32_t thread_number;
eta.c:41:    char c = __run_str[td->thread_number - 1];
eta.c:118:    __run_str[td->thread_number - 1] = c;
eta.c:411:    eta_secs = malloc(thread_number * sizeof(uint64_t));
eta.c:412:    memset(eta_secs, 0, thread_number * sizeof(uint64_t));
eta.c:530:    je->nr_threads = thread_number;
eta.c:704:    if (!thread_number)
filesetup.c:1203:    seed = jhash(f->file_name, strlen(f->file_name), 0) * td->thread_number;
fio.1:1012:* thread_number', where the thread number is a counter that starts at 0 and
fio.h:183:    unsigned int thread_number;
fio.h:509:extern unsigned int thread_number;
fio.h:701:    for ((i) = 0, (td) = &threads[0]; (i) < (int) thread_number; (i)++, (td)++)
gclient.c:299:    client_ts.thread_number = p->ts.thread_number;
gclient.c:578:    p->thread_number = le32_to_cpu(p->thread_number);
init.c:480:    if (thread_number >= max_jobs) {
init.c:486:    td = &threads[thread_number++];
init.c:505:    td->thread_number = thread_number;
init.c:536:    memset(&threads[td->thread_number - 1], 0, sizeof(*td));
init.c:537:    thread_number--;
init.c:1054:        td->rand_seeds[FIO_RAND_BLOCK_OFF] = FIO_RANDSEED * td->thread_number;
init.c:1073:        td->rand_seeds[i] = FIO_RANDSEED * td->thread_number
init.c:1235:        td->rand_seeds[i] = seed * td->thread_number + i;
init.c:1565:            td->thread_number, suf, o->per_job_logs);
init.c:1569:            td->thread_number, suf, o->per_job_logs);
init.c:1573:            td->thread_number, suf, o->per_job_logs);
init.c:1605:            td->thread_number, suf, o->per_job_logs);
init.c:1637:            td->thread_number, suf, o->per_job_logs);
init.c:1668:            td->thread_number, suf, o->per_job_logs);
init.c:3004:    if (!thread_number) {
libfio.c:160:    thread_number = 0;
server.c:758:    spdu.jobs = cpu_to_le32(thread_number);
server.c:801:    spdu.jobs = cpu_to_le32(thread_number);
server.c:842:    spdu.jobs = cpu_to_le32(thread_number);
server.c:943:    tnumber = le32_to_cpu(pdu->thread_number);
server.c:947:    if (!tnumber || tnumber > thread_number) {
server.c:1478:    p.ts.thread_number = cpu_to_le32(ts->thread_number);
server.c:1958:        .thread_number = cpu_to_le32(td->thread_number),
server.c:2029:        .thread_number = cpu_to_le32(td->thread_number),
server.h:172:    uint32_t thread_number;
server.h:192:    uint32_t thread_number;
stat.c:1782:    ts->thread_number = td->thread_number;
stat.c:1998:    rt = malloc(thread_number * sizeof(unsigned long long));
stat.h:152:    uint32_t thread_number;
stat.h:365:#define THREAD_RUNSTR_SZ __THREAD_RUNSTR_SZ(thread_number)
verify.c:1168:    hdr->thread = td->thread_number;
verify.c:1797:    fd = open_state_file(td->o.name, prefix, td->thread_number - 1, 0);
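According to the HOWTO, this variable is simply a job counter, described there as starting at 0. Note, however, that get_new_job in init.c assigns td->thread_number only after incrementing the global counter, so the single job in this test gets the value 1 rather than 0. Either way, the value is the same on every single-job run: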
$ grep -rHn thread_number HOWTO -C5
HOWTO-1240-    offset is aligned to the minimum block size.
HOWTO-1241-
HOWTO-1242-.. option:: offset_increment=int
HOWTO-1243-
HOWTO-1244-    If this is provided, then the real offset becomes `offset + offset_increment
HOWTO:1245:    * thread_number`, where the thread number is a counter that starts at 0 and
HOWTO-1246-    is incremented for each sub-job (i.e. when :option:`numjobs` option is
HOWTO-1247-    specified). This option is useful if there are several jobs which are
HOWTO-1248-    intended to operate on a file in parallel disjoint segments, with even
HOWTO-1249-    spacing between the starting points.
HOWTO-1250-
Search init.c for its references. The statements around line 480 appear to be where the value is mainly modified:
$ grep -rHn thread_number init.c
init.c:480:    if (thread_number >= max_jobs) {
init.c:486:    td = &threads[thread_number++];
init.c:505:    td->thread_number = thread_number;
init.c:536:    memset(&threads[td->thread_number - 1], 0, sizeof(*td));
init.c:537:    thread_number--;
init.c:1054:        td->rand_seeds[FIO_RAND_BLOCK_OFF] = FIO_RANDSEED * td->thread_number;
init.c:1073:        td->rand_seeds[i] = FIO_RANDSEED * td->thread_number
init.c:1235:        td->rand_seeds[i] = seed * td->thread_number + i;
init.c:1565:            td->thread_number, suf, o->per_job_logs);
init.c:1569:            td->thread_number, suf, o->per_job_logs);
init.c:1573:            td->thread_number, suf, o->per_job_logs);
init.c:1605:            td->thread_number, suf, o->per_job_logs);
init.c:1637:            td->thread_number, suf, o->per_job_logs);
init.c:1668:            td->thread_number, suf, o->per_job_logs);
init.c:3004:    if (!thread_number) {
A closer look at init.c confirms that thread_number is incremented as jobs are added:
// init.c
/*
 * Return a free job structure.
 */
static struct thread_data *get_new_job(bool global, struct thread_data *parent,
                                       bool preserve_eo, const char *jobname)
{
    struct thread_data *td;

    if (global)
        return &def_thread;
    if (setup_thread_area()) {
        log_err("error: failed to setup shm segment\n");
        return NULL;
    }
    if (thread_number >= max_jobs) {
        log_err("error: maximum number of jobs (%d) reached.\n",
                max_jobs);
        return NULL;
    }

    td = &threads[thread_number++];
    *td = *parent;

    INIT_FLIST_HEAD(&td->opt_list);
    if (parent != &def_thread)
        copy_opt_list(td, parent);

    td->io_ops = NULL;
    td->io_ops_init = 0;
    if (!preserve_eo)
        td->eo = NULL;

    td->o.uid = td->o.gid = -1U;

    dup_files(td, parent);
    fio_options_mem_dupe(td);

    profile_add_hooks(td);

    td->thread_number = thread_number;
    td->subjob_number = 0;

    if (jobname)
        td->o.name = strdup(jobname);

    if (!parent->o.group_reporting || parent == &def_thread)
        stat_number++;

    return td;
}
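The numbering pattern in get_new_job (take slot thread_number++, then store the already incremented counter) is worth isolating, because it determines which value the first job receives. The sketch below uses made-up names (demo_job, demo_new_job, DEMO_MAX_JOBS) and a fixed-size array in place of fio's shared thread area.

```c
#include <stddef.h>

#define DEMO_MAX_JOBS 8

struct demo_job {
    unsigned int thread_number;
};

/* Stands in for the global thread_number counter, which starts at 0. */
static unsigned int demo_thread_count = 0;
static struct demo_job demo_jobs[DEMO_MAX_JOBS];

/*
 * Hand out the next free slot, then record the already incremented
 * counter in the job. Returns NULL once all slots are used.
 */
static struct demo_job *demo_new_job(void)
{
    struct demo_job *job;

    if (demo_thread_count >= DEMO_MAX_JOBS)
        return NULL;

    job = &demo_jobs[demo_thread_count++];
    job->thread_number = demo_thread_count;
    return job;
}
```

The first job occupies slot 0 but is numbered 1, and the second is numbered 2. A test that always defines exactly one job therefore always sees the same number, and with it the same seed.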
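The random sequence therefore depends only on the sequence number of the job that creates it, which in turn depends on how many jobs the test defines. This is strictly pseudo-random: two runs with the same job layout produce the same offsets. In the earlier --debug=random output, the number after the random tag is the thread ID, and the value after "off rand" is the raw random number that is then scaled to a block offset.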
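First, randwrite writes data into the file system. If the file system is empty and no other files are written during the test, the blocks are random in logical order but are allocated by the file system to contiguous physical sectors. (The writes themselves are not merged by the block device scheduler, so the number of random-write I/Os is not reduced.)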
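Then, when randread is used for the IOPS test, pseudo-randomness means it follows the same logical sequence as the earlier randwrite. By the time the requests pass through the file system and reach the block device, they target physically contiguous sectors, so the block device scheduler merges them into sequential reads. The I/O count drops, and the measured IOPS is inflated. (As for why reads are merged while writes are not, this comes down to the file system and scheduler handling reads and writes with different parameters, such as merge wait time and queue length.)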
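Before running a random-read test with fio, initialize the file with dd or with a sequential fio write, so that the write order and the read order differ. This avoids the problem.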
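For random-write tests, it is also best to perform a series of irregular file creations, deletions, reads, and writes on the file system beforehand, so that sector allocation is no longer contiguous and the results are more reliable.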
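The effect of merging on the I/O count can be illustrated with a toy model. The sketch below is a deliberate simplification and not how a real block scheduler works: the function name demo_count_after_merge is made up, each request is assumed to be one block long, and a request is merged only when it starts exactly one block after the previous request in submission order.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the requests left after merging every block that directly
 * follows the previous one (physical block == previous + 1).
 * Toy model: back merges against the immediately preceding request only.
 */
static size_t demo_count_after_merge(const uint64_t *phys_blocks, size_t n)
{
    size_t requests = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (i == 0 || phys_blocks[i] != phys_blocks[i - 1] + 1)
            requests++;
    }
    return requests;
}
```

Six reads that land on physically consecutive blocks collapse into a single request in this model, while four reads to scattered blocks stay as four requests. fio still counts every read it issued as completed, so the fewer requests the device actually serves, the higher the reported IOPS.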