Extra Cheese

By Gary Bernhardt, Creator & Destroyer of Software

17 May 2010

This is a tar pipe:

(cd src && tar -cf - .) | (cd dest && tar -xpf -)

It basically means "copy the src directory to dest, preserving permissions and other special stuff." It does this by firing up two tars – one tarring up src, the other untarring to dest – and wiring them together.

You can learn a whole lot about Unix from that one little command.

The Subshell

The ( starts a subshell. This is actually spawning a process using fork(2) – everything inside the parentheses is in a separate instance of bash. The subshell, by virtue of being a separate process, is a natural namespacing mechanism: the parent bash won't see the child's changes to variables and – important for our case – the working directory. This is why we used a subshell: it isolates the cd to only the subshell doing the tar.

The cd changes the subshell process's working directory to src. After that comes &&. Logically, this means "do the next thing only if the previous one worked; otherwise, fail." Under the hood, it's testing $?, the previous command's exit code variable, and failing if it's not 0. The weird thing about && is that it means "only continue if the return code was 0", which is the opposite of what you'd expect from a real programming language. But Unix commands (and most C functions, for that matter) return 0 on success, so it all works out.

The Tar

Now that we've cded to src, we start a tar. -c means "create a tarball", as opposed to extracting or listing. -f tells tar what file to create, and it's getting an argument of -, which is the Unix convention for stdout. The final argument, ., means "the current directory", so the whole command together means "tar up the current directory and dump the tar data to stdout."

Stdout is one of three special file descriptors in Unix: stdin, stdout, and stderr. At a terminal, the keys you type are going into the shell's stdin, and the output it shows you is coming from stdout. Stderr is basically another stdout, but used for errors. If our tar command failed, the error messages would show up on our terminal via stderr even though the stdout is being piped away.

stdin, stdout, and stderr are file descriptors 0, 1, and 2. Always. This is why you append 2>&1 to a command to say "combine the stdout and stderr streams": it means "send stderr (descriptor 2) to stdout (descriptor 1)".

The Pipe

The original command contained two subshells. We'll get to the second later, but what we care about now is that they're joined with a |. This makes bash create a pipe using pipe(2). A pipe is a unidirectional file-like object: it has a reading end and a writing end.

Bash fork(2)s, making a copy of its own running process. Through a series of changes, this newly forked bash process will become the tar process that will feed the pipe.

When piping commands together, the stdin and stdout file descriptors are used, but they write to and read from the pipe instead of the terminal. Bash uses dup2(2) to duplicate the writing end of the pipe to stdout. This means that any data the newly-forked process writes will go into the pipe. Under the covers, dup2(2) is saying "forget about what used to be at file descriptor 1; make this other file descriptor the new 1."

At this point the process is still bash. It has to exec(3) the tar binary before it's ready to do real work. This replaces the running bash process with a copy of tar, but doesn't close the file descriptors. There's now a running tar process with its stdout glued to the writing end of the pipe.

Moving Data

After bash exec(3)s tar, the tar process looks at its arguments and sees that it's supposed to be tarring up the current directory (which is src because the subshell it came from cded to it), and that it's supposed to emit the tarred data on stdout (which is the writing end of the pipe that bash set up). It starts reading files, generating tar data, and spewing it to stdout.

But wait – the reading tar isn't even there yet! Doesn't matter. Both sides of the pipe existed from the time it was created, and pipes are buffered. The writing tar will be allowed to shove data into the pipe until it's full. Eventually, when the pipe is full, the write(2) will block. The kernel's CPU scheduler will kick in, notice that bash is waiting for the CPU, and context switch to it.

The Reading

Bash starts to execute the other side of the pipe. It cds into the dest directory and starts a separate tar process with -xpf -. -x means "extract", -p means "preserve permissions" – usually a good thing – and -f - now means "read from stdin". You know, I never really thought about that until just now: sometimes - means stdin; sometimes it means stdout. It was so natural that I never considered it. Anyway...

The second tar process is started in basically the same way as the first. It fork(2)s, sets its stdin to the reading side of the pipe (using the dup2(2) trick again), and execs tar. Both ends of the pipe are now connected. The writing tar has its stdout hooked to the writing end of the pipe, and the reading tar has its stdin hooked to the reading end.

The parent bash process calls wait(2) on the subshell bash processes, which blocks until the subprocesses finish. The subshell bash processes likewise wait(2) on their forked tar processes. Because the bashes are all blocked, a context switch happens and the newly-spawned reading tar process gets the CPU.

The reading tar process, being freshly forked and execed, starts up, processes its arguments, and sees that it's supposed to read from stdin (which is the pipe, not that it cares). The blob of data that the writing tar wrote into the pipe's buffer is sitting there, so the reading tar pulls it out and starts to decode it. There may be enough data that it can actually reconstruct a file or two. But pretty soon, it's going to exhaust the buffer.

The Context Switch

The reading tar will never know when the pipe's buffer is empty; it just keeps calling read(2). At the beginning, read(2) will keep returning the data that the writer wrote. Eventually, it'll empty the pipe's buffer and the read(2) will block. The kernel's scheduler will kick in again and switch back to the writer. It gets woken up, the write(2) call that was blocked completes, and the writer continues filling the pipe until it's full again.

This repeats again and again: the writing tar writes until the pipe is full, the reading tar reads until it's empty, on and on.

The Exhausted Pipe

Eventually, the writing tar will finish tarring up everything and sending it over the pipe. When this happens, it'll clean up and exit. Exiting implicitly closes its stdout, which means the writing end of the pipe closes. The reading tar, who's blocking on the empty pipe, sees its call to read(2) return 0, which means it's reached the end of the file.

Since the tar stream has ended, the reader cleans up and exits as well. The subshell processes exit because they've finished their commands. Finally, the parent bash process's two wait(2) calls return. The prompt comes back. From beginning to end, about 10ms have elapsed.

And that's Unix!


  • I don't know how Bash actually implements any of this; I've just made up a conveniently simple implementation. The same goes for tar.
  • This is all from memory; the only things I looked up were man page numbers. Caveat emptor.
  • The -f arguments to tar aren't actually needed, but are illustrative.
  • I assume the tar file format doesn't indicate when the file ends, so the reader must wait for read(2) to return 0. I doubt this is actually the case.
  • I drastically simplify the CPU scheduling, process management, and execution order, approximating them for simplicity.
  • Tar pipes are mostly obsolete, but far too awesome to be forgotten!