Which Is More Efficient for Large Files on Linux: Redirection or a Pipe?


This article looks at whether redirection or a pipe is more efficient for feeding a large file to a program on Linux. It walks through a measurement first and then explains the result.

    # Command 1: import through a pipe
    $ cat huge_dump.sql | mysql -uroot

    # Command 2: import through redirection
    $ mysql -uroot < huge_dump.sql

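Look at the two commands above. Suppose huge_dump.sql is very large: which import method would you guess is more efficient?

It is an interesting question. My first reaction was: I have never compared them, but they should be about the same. In one case cat opens the file, in the other bash does.

This situation comes up often in MySQL operations work, so I spent some time comparing the two and analyzing the underlying mechanism.

First we set up the scenario.

We start with a program, b.out, that simulates mysql consuming the data: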

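    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        /* Buffer size is an assumption; the original declaration is missing. */
        char buf[4096];

        /* Read stdin block by block and discard the data until EOF. */
        while (fread(buf, sizeof(buf), 1, stdin) > 0)
            ;
        return 0;
    }

    $ gcc -o b.out b.c
    $ ls | ./b.out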

Next we write a SystemTap script that makes it easy to observe what the programs do.

    $ cat test.stp
    function should_log() {
      return (execname() == "cat" ||
              execname() == "b.out" ||
              execname() == "bash");
    }

    probe syscall.open, syscall.close, syscall.read, syscall.write,
          syscall.pipe, syscall.fork, syscall.execve, syscall.dup,
          syscall.wait4
    {
      if (!should_log()) next;
      printf("%s -> %s\n", thread_indent(0), probefunc());
    }

    probe kernel.function("pipe_read"), kernel.function("pipe_readv"),
          kernel.function("pipe_write"), kernel.function("pipe_writev")
    {
      if (!should_log()) next;
      printf("%s -> %s: file ino %d\n", thread_indent(0), probefunc(),
             __file_ino($filp));
    }

    probe begin { println(":~") }

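The script focuses on the order of a few system calls and on reads and writes to the pipe. Next we prepare a 419 MB file, huge_dump.sql, which fits easily in memory on our machine with tens of GB of RAM: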

    $ sudo dd if=/dev/urandom of=huge_dump.sql bs=4096 count=102400
    102400+0 records in
    102400+0 records out
    419430400 bytes (419 MB) copied, 63.9886 seconds, 6.6 MB/s

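Because the file was written with buffered I/O, its contents are all held in the page cache, so reading it back does not touch the disk.

The scenario is now complete. Let's compare the speed of the two approaches, starting with the pipe: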

    # Approach 1: pipe
    $ time (cat huge_dump.sql | ./b.out)
    real 0m0.596s
    user 0m0.001s
    sys  0m0.919s

    # Approach 2: redirection
    $ time (./b.out < huge_dump.sql)
    real 0m0.151s
    user 0m0.000s
    sys  0m0.147s

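The timings show a speed difference of roughly 3x or more (0.596 s versus 0.151 s of wall-clock time); the second approach is clearly much faster.

Does that seem a little odd? Let's analyze it from first principles and keep letting the data speak.

This time we prepare a very small data file, which is easier to observe, and run stap in one window: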

    $ echo hello > huge_dump.sql
    $ sudo stap test.stp
    :~
    0 bash(26570): -> sys_read
    0 bash(26570): -> sys_read
    0 bash(26570): -> sys_write
    0 bash(26570): -> sys_read
    0 bash(26570): -> sys_write
    0 bash(26570): -> sys_close
    0 bash(26570): -> sys_pipe
    0 bash(26570): -> sys_pipe
    0 bash(26570): -> do_fork
    0 bash(26570): -> sys_close
    0 bash(26570): -> sys_close
    0 bash(26570): -> do_fork
    0 bash(13775): -> sys_close
    0 bash(13775): -> sys_read
    0 bash(13775): -> pipe_read: file ino 20906911
    0 bash(13775): -> pipe_readv: file ino 20906911
    0 bash(13776): -> sys_close
    0 bash(13776): -> sys_close
    0 bash(13776): -> sys_close
    0 bash(13776): -> do_execve
    0 bash(26570): -> sys_close
    0 bash(26570): -> sys_close
    0 bash(26570): -> sys_close
    0 bash(13775): -> sys_close
    0 bash(26570): -> sys_wait4
    0 bash(13775): -> sys_close
    0 bash(13775): -> sys_close
    0 b.out(13776): -> sys_close
    0 b.out(13776): -> sys_close
    0 bash(13775): -> do_execve
    0 b.out(13776): -> sys_open
    0 b.out(13776): -> sys_close
    0 b.out(13776): -> sys_open
    0 b.out(13776): -> sys_read
    0 b.out(13776): -> sys_close
    0 cat(13775): -> sys_close
    0 cat(13775): -> sys_close
    0 b.out(13776): -> sys_read
    0 b.out(13776): -> pipe_read: file ino 20906910
    0 b.out(13776): -> pipe_readv: file ino 20906910
    0 cat(13775): -> sys_open
    0 cat(13775): -> sys_close
    0 cat(13775): -> sys_open
    0 cat(13775): -> sys_read
    0 cat(13775): -> sys_close
    0 cat(13775): -> sys_open
    0 cat(13775): -> sys_close
    0 cat(13775): -> sys_open
    0 cat(13775): -> sys_read
    0 cat(13775): -> sys_write
    0 cat(13775): -> pipe_write: file ino 20906910
    0 cat(13775): -> pipe_writev: file ino 20906910
    0 cat(13775): -> sys_read
    0 b.out(13776): -> sys_read
    0 b.out(13776): -> pipe_read: file ino 20906910
    0 b.out(13776): -> pipe_readv: file ino 20906910
    0 cat(13775): -> sys_close
    0 cat(13775): -> sys_close
    0 bash(26570): -> sys_wait4
    0 bash(26570): -> sys_close
    0 bash(26570): -> sys_wait4
    0 bash(26570): -> sys_write

While stap is collecting data, we run the pipe case in another window:

    $ cat huge_dump.sql | ./b.out

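From the SystemTap log we can see that:

bash forks two processes.

Each child then calls execve, one to run cat and the other to run b.out, and the two processes communicate through a pipe.

cat reads the data from huge_dump.sql and writes it to the pipe; b.out then reads it from the pipe and processes it.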
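The three steps above can be sketched in C. This is an illustrative sketch only, not bash's actual implementation: the function name run_pipeline and its return convention are assumptions made for the example, and error handling is kept minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run "producer | consumer" the way a shell roughly does.
 * One pipe, two forked children, each replaced by its program via exec.
 * Returns 0 if both children exit with status 0, otherwise -1.
 */
int run_pipeline(char *const producer[], char *const consumer[])
{
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */

    assert(producer && producer[0] && consumer && consumer[0]);
    if (pipe(fds) == -1) {
        perror("pipe");
        return -1;
    }

    pid_t left = fork();
    if (left == -1) {
        perror("fork");
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (left == 0) {                  /* child 1 becomes the producer (cat) */
        dup2(fds[1], STDOUT_FILENO);  /* stdout now writes into the pipe */
        close(fds[0]);
        close(fds[1]);
        execvp(producer[0], producer);
        _exit(127);                   /* reached only if exec failed */
    }

    pid_t right = fork();
    if (right == -1) {
        perror("fork");
        close(fds[0]);
        close(fds[1]);
        waitpid(left, NULL, 0);
        return -1;
    }
    if (right == 0) {                 /* child 2 becomes the consumer (b.out) */
        dup2(fds[0], STDIN_FILENO);   /* stdin now reads from the pipe */
        close(fds[0]);
        close(fds[1]);
        execvp(consumer[0], consumer);
        _exit(127);
    }

    /* The parent closes both ends; otherwise the consumer never sees EOF. */
    close(fds[0]);
    close(fds[1]);

    int s1 = 0, s2 = 0;
    waitpid(left, &s1, 0);
    waitpid(right, &s2, 0);
    int ok = WIFEXITED(s1) && WEXITSTATUS(s1) == 0 &&
             WIFEXITED(s2) && WEXITSTATUS(s2) == 0;
    return ok ? 0 : -1;
}
```

Calling it with the argument vectors {"cat", "huge_dump.sql", NULL} and {"./b.out", NULL} would mirror the command above. Every byte passes through cat's buffer and the kernel pipe buffer before b.out sees it.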

Now let's look at command 2, the redirection case:

    $ ./b.out < huge_dump.sql

stap output:

    0 bash(26570): -> sys_read
    0 bash(26570): -> sys_read
    0 bash(26570): -> sys_write
    0 bash(26570): -> sys_read
    0 bash(26570): -> sys_write
    0 bash(26570): -> sys_close
    0 bash(26570): -> sys_pipe
    0 bash(26570): -> do_fork
    0 bash(28926): -> sys_close
    0 bash(28926): -> sys_read
    0 bash(28926): -> pipe_read: file ino 20920902
    0 bash(28926): -> pipe_readv: file ino 20920902
    0 bash(26570): -> sys_close
    0 bash(26570): -> sys_close
    0 bash(26570): -> sys_wait4
    0 bash(28926): -> sys_close
    0 bash(28926): -> sys_open
    0 bash(28926): -> sys_close
    0 bash(28926): -> do_execve
    0 b.out(28926): -> sys_close
    0 b.out(28926): -> sys_close
    0 b.out(28926): -> sys_open
    0 b.out(28926): -> sys_close
    0 b.out(28926): -> sys_open
    0 b.out(28926): -> sys_read
    0 b.out(28926): -> sys_close
    0 b.out(28926): -> sys_read
    0 b.out(28926): -> sys_read
    0 bash(26570): -> sys_wait4
    0 bash(26570): -> sys_write
    0 bash(26570): -> sys_read

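bash forks a single process, which opens the data file.

That process then moves the file descriptor onto descriptor 0 (standard input) and calls execve to run b.out.

b.out then reads the data directly from the file.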
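The redirection steps can be sketched the same way. Again this is only an illustration under stated assumptions, not bash's real code: the name run_with_stdin_from and the exit codes 126 and 127 used to signal open and exec failures are choices made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run "argv[0] ... < path".
 * One fork; the child opens the file, makes it descriptor 0, then execs.
 * Returns the child's exit status, or -1 if it did not exit normally.
 */
int run_with_stdin_from(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        int fd = open(path, O_RDONLY);
        if (fd == -1)
            _exit(126);               /* could not open the input file */
        if (fd != STDIN_FILENO) {
            dup2(fd, STDIN_FILENO);   /* the file itself becomes stdin */
            close(fd);
        }
        execvp(argv[0], argv);
        _exit(127);                   /* reached only if exec failed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

There is no second process and no pipe here: after the exec, the program's reads on descriptor 0 go straight to the file, which is why the trace shows no pipe_read calls from b.out.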

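Now it is clear why the two scenarios differ in speed by a factor of about 3:

Command 1, the pipe: the data is read twice and written once, plus context switches between the two processes.

Command 2, redirection: the data is read only once.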
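A program can check which of these two data paths it is on by asking the kernel what kind of object its standard input is. The sketch below uses fstat with the standard S_ISFIFO and S_ISREG macros; the names classify_fd and report_stdin are hypothetical helpers introduced for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

enum fd_kind { FD_PIPE, FD_REGULAR, FD_OTHER };

/* Hypothetical helper: classify the object behind a file descriptor. */
enum fd_kind classify_fd(int fd)
{
    struct stat st;

    assert(fd >= 0);
    if (fstat(fd, &st) == -1)
        return FD_OTHER;
    if (S_ISFIFO(st.st_mode))
        return FD_PIPE;               /* e.g. started as "cat file | prog" */
    if (S_ISREG(st.st_mode))
        return FD_REGULAR;            /* e.g. started as "prog < file" */
    return FD_OTHER;                  /* terminal, socket, device, ... */
}

/* Print what standard input is connected to. */
void report_stdin(void)
{
    static const char *const names[] = { "pipe", "regular file", "other" };
    printf("stdin: %s\n", names[classify_fd(STDIN_FILENO)]);
}
```

If b.out called report_stdin(), it should report a pipe when started as command 1 and a regular file when started as command 2.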

Conclusion: on Linux, redirection is the more efficient way to feed a large file to a program.

Thanks for reading. That covers which is more efficient for large files on Linux, redirection or a pipe. The exact numbers will depend on your environment, so verify them in practice.


Posted in: Computer Operations

2023-08-04
