Asked by: CasseroleBoi  Asked: 10/3/2023  Last edited by: Peter Cordes, CasseroleBoi  Updated: 10/5/2023  Views: 112
Can libc be faster than syscall?
Q:
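I'm in the process of heavily optimizing some code (thankfully, no platform independence is needed: Linux only). I've created a very simple framework for measuring elapsed time (in clock cycles). My first idea was to grab the low-hanging fruit of replacing libc functions with Linux system calls (important: I print at most one character at a time, with no formatting). However, my tests consistently show that putchar is roughly 2.5 times faster than syscall (clock cycles are obtained with rdtsc as 64-bit hex integers; their difference is also shown in decimal if it is small enough):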
Note: the tests contain no repetition: each snippet below is measured by executing it exactly once.
syscall:
# %rdi contains a string
movq $1 , %rax # system call 1: write
movq %rdi, %rsi # what: char pointer
movq $1 , %rdi # where: stdout
movq $1 , %rdx # how many bytes: 1
syscall
__________________________________
RDTSC Post: 0x0002c22fbac1c7b5
RDTSC Pre: 0x0002c22fbabf4d1f
Clock cycles: 0x0000000000027a96
Clock cycles: 162454
putchar:
# %rdi contains a string
mov %rdi, %rsi
xor %edi, %edi
movb (%rsi), %dil
call putchar
__________________________________
RDTSC Post: 0x0002c221713726cd
RDTSC Pre: 0x0002c2217136469f
Clock cycles: 0x000000000000e02e
Clock cycles: 57390
Note: although I've included only one result for each, the syscall method is consistently about 2.5 times slower.
This is especially strange given how much internal work putchar does, as can be seen by stepping through it with GDB:
IO_validate_vtable (vtable=0x7ffff7e16600 <_IO_file_jumps>) at ./libio/libioP.h:943
943 ./libio/libioP.h: No such file or directory.
(gdb)
__GI__IO_file_doallocate (fp=0x7ffff7e1a780 <_IO_2_1_stdout_>) at ./libio/libioP.h:947
947 in ./libio/libioP.h
(gdb)
__GI__IO_file_stat (fp=0x7ffff7e1a780 <_IO_2_1_stdout_>, st=0x7fffffffde90) at ./libio/fileops.c:1146
1146 ./libio/fileops.c: No such file or directory.
(gdb)
1147 in ./libio/fileops.c
(gdb)
__GI___fstat64 (fd=1, buf=0x7fffffffde90) at ../sysdeps/unix/sysv/linux/fstat64.c:29
29 ../sysdeps/unix/sysv/linux/fstat64.c: No such file or directory.
(gdb)
30 in ../sysdeps/unix/sysv/linux/fstat64.c
(gdb)
35 in ../sysdeps/unix/sysv/linux/fstat64.c
(gdb)
__GI___fstatat64 (fd=1, file=0x7ffff7dd846f "", buf=0x7fffffffde90, flag=4096) at ../sysdeps/unix/sysv/linux/fstatat64.c:153
153 ../sysdeps/unix/sysv/linux/fstatat64.c: No such file or directory.
(gdb)
163 in ../sysdeps/unix/sysv/linux/fstatat64.c
(gdb)
fstatat64_time64_stat (flag=4096, buf=0x7fffffffde90, file=0x7ffff7dd846f "", fd=1) at ../sysdeps/unix/sysv/linux/fstatat64.c:98
98 in ../sysdeps/unix/sysv/linux/fstatat64.c
(gdb)
__GI___fstatat64 (fd=1, file=0x7ffff7dd846f "", buf=0x7fffffffde90, flag=4096) at ../sysdeps/unix/sysv/linux/fstatat64.c:166
166 in ../sysdeps/unix/sysv/linux/fstatat64.c
(gdb)
__GI__IO_file_doallocate (fp=0x7ffff7e1a780 <_IO_2_1_stdout_>) at ./libio/filedoalloc.c:86
86 ./libio/filedoalloc.c: No such file or directory.
(gdb)
91 in ./libio/filedoalloc.c
(gdb)
__gnu_dev_major (__dev=34817) at ../include/sys/sysmacros.h:47
47 ../include/sys/sysmacros.h: No such file or directory.
(gdb)
91 ./libio/filedoalloc.c: No such file or directory.
(gdb)
94 in ./libio/filedoalloc.c
(gdb)
97 in ./libio/filedoalloc.c
(gdb)
101 in ./libio/filedoalloc.c
(gdb)
__GI___libc_malloc (bytes=bytes@entry=1024) at ./malloc/malloc.c:3287
3287 ./malloc/malloc.c: No such file or directory.
(gdb)
3294 in ./malloc/malloc.c
(gdb)
3295 in ./malloc/malloc.c
(gdb)
ptmalloc_init () at ./malloc/arena.c:315
315 ./malloc/arena.c: No such file or directory.
(gdb)
ptmalloc_init () at ./malloc/arena.c:313
313 in ./malloc/arena.c
(gdb)
321 in ./malloc/arena.c
(gdb)
0x00007ffff7ca1a31 in tcache_key_initialize () at ./malloc/malloc.c:3162
3162 ./malloc/malloc.c: No such file or directory.
(gdb)
318 ./malloc/arena.c: No such file or directory.
(gdb)
321 in ./malloc/arena.c
(gdb)
tcache_key_initialize () at ./malloc/malloc.c:3162
3162 ./malloc/malloc.c: No such file or directory.
(gdb)
__GI___getrandom (buffer=buffer@entry=0x7ffff7e204d8 <tcache_key>, length=length@entry=8, flags=flags@entry=1) at ../sysdeps/unix/sysv/linux/getrandom.c:28
A:
Can libc be faster than syscall?
The equivalent of syscall is call write, not call putchar.
And of course: a direct syscall will be faster than call write.
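To make the three call paths concrete, here is a minimal C sketch, assuming Linux with glibc. The helper names emit_raw, emit_write and emit_stdio are made up for illustration; only syscall(), write() and fputc() are real library calls.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Raw system call via libc's generic syscall() wrapper:
 * enters the kernel once per call. (Hypothetical helper name.) */
static long emit_raw(int fd, const char *s) {
    return syscall(SYS_write, fd, s, 1);
}

/* libc's write(): a thin wrapper that also enters the kernel
 * once per call. This is the fair counterpart of the raw syscall. */
static ssize_t emit_write(int fd, const char *s) {
    return write(fd, s, 1);
}

/* stdio: the byte is stored in the stream's user-space buffer;
 * the kernel is entered only when the stream is flushed. */
static int emit_stdio(FILE *stream, const char *s) {
    return fputc((unsigned char)*s, stream);
}
```

The first two helpers do comparable work per call. The third usually returns without touching the kernel, so timing it against the first compares two different things.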
"Not really, because I operate on strings that are 1 character long - buffering shouldn't come into play here."
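The opposite is the case:
When writing small amounts of data (in the case of putchar(): exactly one byte!), you save a lot of time by buffering;
when writing large amounts of data in each call (e.g. fwrite(x,1,10000,y)), you save hardly anything.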
write(x,y,1000) takes far less than 1000 times as long as write(x,y,1).
So you can save a lot of time by writing the data to a buffer and calling write(x,y,1000) once after 1000 characters, compared with calling write(x,y,1) 1000 times.
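The 1000-versus-1 comparison can be sketched in C as follows. This is an illustration, not a benchmark from the answer; the function names write_bytewise and write_bulk are hypothetical, and both return the number of write() calls they issued so the difference is visible even without a timer.

```c
#include <stddef.h>
#include <unistd.h>

/* Writes n bytes with one write() call per byte.
 * Returns the number of system calls issued, or -1 on error. */
static long write_bytewise(int fd, const char *buf, size_t n) {
    long calls = 0;
    for (size_t i = 0; i < n; i++) {
        if (write(fd, buf + i, 1) != 1)
            return -1;
        calls++;
    }
    return calls;
}

/* Writes n bytes with as few write() calls as the kernel allows
 * (normally one; the loop only handles short writes).
 * Returns the number of system calls issued, or -1 on error. */
static long write_bulk(int fd, const char *buf, size_t n) {
    long calls = 0;
    size_t done = 0;
    while (done < n) {
        ssize_t r = write(fd, buf + done, n - done);
        if (r <= 0)
            return -1;
        done += (size_t)r;
        calls++;
    }
    return calls;
}
```

To see the time difference, one could point fd at /dev/null and wrap each call in clock_gettime(CLOCK_MONOTONIC, ...); the exact figures depend on the machine and kernel.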
putchar() does exactly this: it stores each character in a buffer, and write() is called when the buffer is full.
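The buffering idea can be sketched in a few lines of C. This is not glibc's implementation: struct tiny_out, tiny_putchar, tiny_flush and the 1024-byte size are all hypothetical, and the sketch ignores details of real stdio such as line buffering (glibc also flushes a terminal-attached stdout at each newline) and flushing at exit.

```c
#include <stddef.h>
#include <unistd.h>

#define TINY_BUF_SIZE 1024  /* arbitrary size chosen for the sketch */

/* Hypothetical output stream: a file descriptor plus a user-space buffer. */
struct tiny_out {
    int fd;
    size_t used;        /* bytes currently held in buf */
    long write_calls;   /* how many write() calls were issued so far */
    char buf[TINY_BUF_SIZE];
};

/* Hands the buffered bytes to the kernel. Returns 0 on success, -1 on error. */
static int tiny_flush(struct tiny_out *o) {
    size_t done = 0;
    while (done < o->used) {
        ssize_t r = write(o->fd, o->buf + done, o->used - done);
        if (r <= 0)
            return -1;
        done += (size_t)r;
        o->write_calls++;
    }
    o->used = 0;
    return 0;
}

/* Stores one character; enters the kernel only when the buffer is full. */
static int tiny_putchar(struct tiny_out *o, int c) {
    if (o->used == TINY_BUF_SIZE && tiny_flush(o) != 0)
        return -1;
    o->buf[o->used++] = (char)c;
    return (unsigned char)c;
}
```

With this layout, most calls to tiny_putchar are a bounds check and a single store, which is why a lone call can be much cheaper than a system call.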
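The reason is that many lines of Linux kernel code are executed once per call to write() (or the corresponding syscall). When 1000 bytes are written with a single write() call, those lines are executed once; when write() is called 1000 times, they are executed 1000 times.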
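One rough way to picture this is a linear cost model. The symbols t_fixed (per-call overhead) and t_byte (per-byte cost) are illustrative assumptions, not measured quantities:

```latex
\[
T_{\text{one call}}(n) \approx t_{\text{fixed}} + n\,t_{\text{byte}},
\qquad
T_{n\text{ calls}}(n) \approx n\,\bigl(t_{\text{fixed}} + t_{\text{byte}}\bigr)
\]
```

For n = 1000 the second expression pays the fixed cost 1000 times and the first pays it once. The larger t_fixed is relative to t_byte, the closer the ratio between the two gets to n.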