Avoid Directly Manipulating File Descriptors in Shell Scripts

[Category: System (Linux) | Posted by: 店小二04 | Date: 2017 | Author: 红领巾]


2017-08-12

Here are two examples of directly manipulating file descriptors in shell. Do they look familiar to you?

# Naming file descriptors: from the Nix package repo
exec {fd}< "$fn"
# ... read from $fd ...
exec {fd}<&-

# Explicit save/restore of file descriptor: from Apache Yetus
exec 6>&1 1>"${LOG_FILE}"
# ... do work and write to stdout ...
exec 1>&6 6>&-

I wasn't familiar with these constructs before I started writing a shell, and I suspect many others aren't either.

Now that I know what they do, I argue that they're not worth filling your head with. They could go in a book called Shell: The Bad Parts.

In short, there are simpler ways of accomplishing these tasks, like:

read myvar < "$fn" # read a variable from a line of a file

and

myfunc > "${LOG_FILE}" # redirect stdout of a function to a file

If you're not convinced, read on for details.

Deciphering Line Noise

In both cases, the exec builtin is used to manipulate the file descriptor table of the current process. This is unrelated to the usual use of exec, which is to replace the executable that a process is running. Type help exec in bash for details.

The two examples come from real shell scripts in "the wild". I copied the relevant parts into demo.sh in my oilshell/blog-code repo. Then I tested and rewrote them.

Example 1: from the Nix Package Repo

The first example came up on issue 26, when a user tried to run Nix build scripts with OSH.

It involves the file setup.sh in the nixpkgs repo. nixpkgs defines all packages for the Nix package manager.

I've rewritten the isElf function as isElfSimple.

These three lines:

exec {fd}< "$fn"
read -r -n 4 -u "$fd" magic
exec {fd}<&-

can be replaced with the single line:

# read 4 bytes from $path, without escaping, into the $magic var
read -r -n 4 magic < "$path"

We simply redirect stdin from a file while the read builtin is running.
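To check that the two forms agree, here is a minimal sketch that runs both against the same temporary file. The function names `magicWithExec` and `magicWithRedirect` and the file contents are made up for illustration (plain text stands in for a real ELF header), and the `{fd}<` syntax assumes bash 4.1 or later.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper names, plain text instead of a real ELF file.

# Original style: open a descriptor, read from it, then close it.
magicWithExec() {
  local fn=$1 fd magic
  exec {fd}< "$fn"              # open "$fn", store the descriptor number in $fd
  read -r -n 4 -u "$fd" magic   # read 4 characters from that descriptor
  exec {fd}<&-                  # close the descriptor
  printf '%s' "$magic"
}

# Simplified style: stdin is redirected only while read is running.
magicWithRedirect() {
  local path=$1 magic
  read -r -n 4 magic < "$path"
  printf '%s' "$magic"
}

tmp=$(mktemp)
printf 'ABCDEFGH\n' > "$tmp"

a=$(magicWithExec "$tmp")
b=$(magicWithRedirect "$tmp")
rm -f "$tmp"

echo "exec form: $a, redirect form: $b"
```

Both variables end up holding the first four characters of the file, so the shorter form loses nothing here.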

For another way of looking at it, the three lines of tortured syntax are saying the same thing as this simple Python/C-like pseudocode:

int fd = open(path)          # return a file descriptor from a path
bytes magic = read(fd, 4)    # read four bytes from that descriptor
close(fd)                    # close the descriptor

Really, that's it! It's hard to imagine a worse syntax than exec {fd}< "$fn" for opening a file and exec {fd}<&- for closing it.

Example 2: from Apache Yetus

The second example was pointed out in comments to OSH Runs Real Shell Programs. The Apache Yetus project is a set of tools and libraries for release automation. It makes extensive use of shell scripts.

In the file builtin-bugsystem.sh, these lines:

if [[ -n "${CONSOLE_REPORT_FILE}" ]]; then
  exec 6>&1 1>"${CONSOLE_REPORT_FILE}"
fi

echo FOO
echo BAR

if [[ -n "${CONSOLE_REPORT_FILE}" ]]; then
  exec 1>&6 6>&-
fi

can be replaced with:

doWork() {
  echo FOO
  echo BAR
}

if [[ -n "${CONSOLE_REPORT_FILE}" ]]; then
  doWork > "${CONSOLE_REPORT_FILE}"
else
  doWork
fi

By extracting a function doWork, you avoid the duplicate conditionals, as well as the need to explicitly save stdout as descriptor 6.

Redirects to file system paths implicitly save and restore the file descriptor state, so there's no need to do it yourself.

For example, consider doWork > out.txt. This means:

1. fd = open("out.txt")
2. Redirect stdout (descriptor 1) to fd.
3. Run the doWork function, which may call both builtins and external programs.
   - Builtins like echo FOO write to stdout, but stdout is now connected to a disk file out.txt.
   - External programs like ls inherit stdout from the shell, so their output also goes to out.txt.
4. After all of doWork, restore stdout to whatever it was before, e.g. the terminal.
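The steps above can be watched in action with a short sketch. The body of `doWork` and the temporary file are assumptions made for the demo; `cat` stands in for any external program.

```shell
#!/usr/bin/env bash
# Sketch only: doWork's body and the temp file are made up for this demo.

doWork() {
  echo "from a builtin"                       # builtin writing to the shell's stdout
  printf 'from an external program\n' | cat   # cat inherits stdout from the shell
}

out=$(mktemp)

doWork > "$out"        # stdout points at the file only for the duration of the call
echo "after doWork"    # stdout is back to whatever it was before

captured=$(cat "$out")
rm -f "$out"

echo "--- contents of the file ---"
echo "$captured"
```

The file receives both lines written inside doWork, while "after doWork" goes to the original stdout, with no explicit save or restore in the script.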

Also consider read myvar < in.txt. What does that do?
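One way to explore that question is the sketch below. The file name and variable names are hypothetical. It shows that each redirect opens the file afresh, and that redirecting a whole group lets several reads share one open file without naming a descriptor.

```shell
#!/usr/bin/env bash
# Sketch only: infile and the variable names are made up for this demo.

infile=$(mktemp)
printf 'first\nsecond\n' > "$infile"

# Each redirect opens the file again, so both reads see the first line.
read -r a < "$infile"
read -r b < "$infile"

# Redirecting a group opens the file once; the reads share its position.
{ read -r c; read -r d; } < "$infile"

rm -f "$infile"
echo "a=$a b=$b c=$c d=$d"   # prints: a=first b=first c=first d=second
```

In both cases stdin is restored as soon as the redirected command or group finishes.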

Under the Hood

Before implementing a shell, I didn't realize that a shell needs to save and restore file descriptors. There's a long discussion on this topic in the comment thread mentioned above.

If this concept is unfamiliar to you, it might help to think of file descriptors as pointers to data structures in the kernel. Those data structures could represent a pipe() or an open() file.

With this viewpoint, shell redirection is mutating process-wide globals in the kernel.

File descriptors aren't literally pointers; they're small integers, because the kernel is in a different address space. And you can't copy them with a C assignment statement; you have to use the dup2(1, 2) system call.

But otherwise the analogy holds: copying a file descriptor to a different position in the table is like copying a pointer, so it's not permanently lost when overwritten. This is more or less what a shell redirect like 1>&2 does: it makes descriptor 1 a copy of descriptor 2, as in dup2(2, 1).
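To make the analogy concrete, here is a sketch of the save, redirect, and restore sequence written out by hand, with each line annotated with the roughly equivalent system call. This is the bookkeeping a shell does implicitly for a redirect like doWork > file. It is an approximation: a real shell may use different calls internally, and descriptor 6 and the variable names are arbitrary choices.

```shell
#!/usr/bin/env bash
# Sketch only: approximate system calls in comments; fd 6 is an arbitrary pick.

log_file=$(mktemp)

exec 6>&1             # dup2(1, 6): keep a copy of the current stdout
exec 1> "$log_file"   # open(log_file), then dup2(new_fd, 1): stdout is now the file
echo "goes to the file"
exec 1>&6             # dup2(6, 1): put the saved stdout back
exec 6>&-             # close(6): drop the spare copy

echo "goes to the original stdout"

logged=$(cat "$log_file")
rm -f "$log_file"
echo "file contained: $logged"
```

Because descriptor 6 held a copy, the original stdout was never lost when descriptor 1 was overwritten.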

Summary

I showed that advanced file descriptor manipulation in bash can be replaced with simpler constructs.

If you're not convinced, clone demo.sh and play with it. The tests show that the original code and my rewrites behave the same way:

$ ./demo.sh testIsElf
$ ./demo.sh testDoWorkAndLog

If you know of a use case where directly manipulating file descriptors is essential or preferable, please leave a comment. I'm collecting feedback for the design of the Oil language.

A Rule for Style Guides

Let me propose a more aggressive style rule:

The only file descriptor that should appear explicitly in a shell script is 2, for stderr.

For example, I often use this log function:

log() {
  echo "$@" >&2
}

log "hello $name"  # goes to stderr

It explicitly mentions descriptor 2 so that no other part of the program needs to.
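A small sketch of why this matters: since log writes to stderr, diagnostics don't leak into a value captured from stdout. The `compute` function is a hypothetical example.

```shell
#!/usr/bin/env bash
# Sketch only: compute is a made-up function for this demo.

log() {
  echo "$@" >&2
}

compute() {
  log "computing..."   # diagnostic: goes to stderr
  echo 42              # result: goes to stdout
}

result=$(compute)      # captures stdout only; the log line is not captured
echo "result=$result"  # prints: result=42
```

The caller never mentions a descriptor number; only log does.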

I've never found a need for any other file descriptor. Descriptors 0 and 1 for stdin and stdout are the defaults for many shell constructs, like echo and read, so they don't need to be explicitly mentioned.

If you disagree with this rule, let me know.

Reminder

If you like writing shell scripts, please try the first OSH release on your programs, and file bugs on GitHub.

If you'd like to run the development version, see the Contributing page. Right now I'm knocking off shell features needed to run Nix's setup.sh.
