Capsicum


I spent a couple of years evangelizing Capsicum and wrote many articles about it. So it is only natural that I would use this blog to keep you updated on the progress of the Capsicum project in FreeBSD, because this is what I do in my free time. That said, I feel this blog would not be complete without an introduction to what Capsicum is, and this post should fill that gap. Over the next weeks and months we will extend this topic and discuss different parts of Capsicum. Without further introduction, let’s jump into the topic.

Imagine an application that can do anything with your data. Literally anything. Imagine an application that can take your private photos and send them over the internet to some external server. In UNIX-like operating systems almost all applications can do that. If you had an exploitable bug in grep(1), an attacker would be able to do so. If you had an exploitable bug in cat(1), an attacker would be able to do so. When an application has access to everything its user can access, we are talking about ambient authority.

What if you could do things another way? What if your application had only the capabilities to use the things it really needs? What if your grep(1) had only read-only rights to the file it should parse, and could not create a network connection or send signals to other processes? This is the capability world that Capsicum implements.

The first part of Capsicum is a very tight sandbox, which is entered using the cap_enter(2) syscall. In this sandbox you no longer have access to any global namespace, such as file paths or process IDs (PIDs), but your program can still use capabilities to accomplish its tasks. It turns out that descriptors map very well onto the idea of capabilities. You can clone them (with dup(2)), send them over a UNIX domain socket, remove them (by closing them), etc. Beyond that, you have a set of functions that operate on descriptors and give you new ones, like openat(2), which takes a descriptor to a directory and returns a new one for a given file name. Using this syscall, a single directory descriptor gives you a capability to all files in that directory. The list of all allowed syscalls is kept in the sys/kern/capabilities.conf file. When you examine it you will notice that most of those syscalls either require a descriptor or do not change the global state of the machine, like umask(2), which only changes a setting for the current process.
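To make the idea of a directory descriptor acting as a capability more concrete, here is a minimal sketch. The helper name read_from_dir is hypothetical and is not part of Capsicum or FreeBSD; it uses only standard openat(2), read(2) and close(2). It resolves a file name relative to a directory descriptor instead of the global file namespace, which is the style of lookup that keeps working inside the sandbox.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to len - 1 bytes of the file `name`,
 * looked up relative to the directory descriptor `dirfd`, into `buf`
 * and NUL-terminate the result.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t
read_from_dir(int dirfd, const char *name, char *buf, size_t len)
{
    int fd;
    ssize_t n;

    assert(buf != NULL && len > 0);

    /* The lookup starts at dirfd, not at the root or the current directory. */
    fd = openat(dirfd, name, O_RDONLY);
    if (fd < 0)
        return (-1);

    n = read(fd, buf, len - 1);
    if (n >= 0)
        buf[n] = '\0';
    close(fd);
    return (n);
}
```

A caller would open the directory once, before entering the sandbox, and afterwards pass names relative to it, for example read_from_dir(dirfd, "note.txt", buf, sizeof(buf)). As far as I know, in capability mode such lookups have to stay beneath the directory descriptor, so absolute paths and paths that escape through ".." are refused.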

Capsicum additionally implements capability rights. This simply means that you can limit a descriptor even further. Using the cap_rights_limit(2) syscall you can set particular rights on particular descriptors. For example, you can say that some descriptors should be read-only and others append-only. Currently there are 50-ish rights. For more details please refer to the rights(4) man page.
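As an illustration of capability rights, the sketch below limits a single descriptor to reading, seeking and fstat(2). The helper name limit_read_only is hypothetical. The #ifdef guard is my own assumption so that the snippet also compiles on systems without Capsicum, where it simply does nothing; the ENOSYS check mirrors the tolerance for kernels built without Capsicum support.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#ifdef __FreeBSD__
#include <sys/capsicum.h>
#endif

/*
 * Hypothetical helper: restrict `fd` to read, seek and fstat operations.
 * Returns 0 on success or when Capsicum is unavailable, -1 on failure.
 */
static int
limit_read_only(int fd)
{
    assert(fd >= 0);
#ifdef __FreeBSD__
    cap_rights_t rights;

    /* Build the set of rights we want to keep on this descriptor. */
    cap_rights_init(&rights, CAP_READ, CAP_SEEK, CAP_FSTAT);

    /* Rights can only be reduced; ENOSYS means the kernel lacks Capsicum. */
    if (cap_rights_limit(fd, &rights) < 0 && errno != ENOSYS)
        return (-1);
#else
    /* Assumption: no Capsicum on this platform, so there is nothing to limit. */
    (void)fd;
#endif
    return (0);
}
```

After a successful limit_read_only(fd) on FreeBSD, a write(2) on that descriptor is expected to fail with ENOTCAPABLE, even if the file was originally opened with O_RDWR, while read(2) keeps working.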

Many Unix-like applications can be divided into two parts. The first part, the so-called initialization phase, is where we prepare our program, parse options, etc. The second part is the main phase, where we do the complicated calculations, parsing, etc. In the simplest scenario we can split our sandboxed application along exactly those phases. In the initialization phase we open all the files we need, create all the sockets, etc. After that we enter capability mode. The complicated phase, the main phase, is sandboxed. If our complicated algorithm has an exploitable bug, the attacker will not gain access to any data beyond what the process already had. We can go one step further and limit those descriptors to be read-only or write-only, to limit the damage an attacker can do.
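The two-phase structure described above can be sketched roughly as follows. The function names enter_sandbox and count_lines are hypothetical and only serve the illustration; the platform guard is again my assumption so that the code also builds where Capsicum does not exist.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#ifdef __FreeBSD__
#include <sys/capsicum.h>
#endif

/* Enter capability mode where available; tolerate kernels without it. */
static int
enter_sandbox(void)
{
#ifdef __FreeBSD__
    if (cap_enter() < 0 && errno != ENOSYS)
        return (-1);
#endif
    return (0);
}

/*
 * Hypothetical example: count newline characters in the file at `path`.
 * Returns the count, or -1 on error.
 */
static long
count_lines(const char *path)
{
    FILE *fp;
    long lines = 0;
    int ch;

    assert(path != NULL);

    /* Initialization phase: acquire every resource while paths still work. */
    if ((fp = fopen(path, "r")) == NULL)
        return (-1);

    /* Give up ambient authority before touching the untrusted input. */
    if (enter_sandbox() < 0) {
        fclose(fp);
        return (-1);
    }

    /* Main phase: only the already-open descriptor is reachable. */
    while ((ch = getc(fp)) != EOF)
        if (ch == '\n')
            lines++;
    fclose(fp);
    return (lines);
}
```

Note that entering capability mode cannot be undone for the process, so on FreeBSD a second call to count_lines() in the same process would fail at fopen(3). A real program would therefore open everything it needs up front and enter the sandbox exactly once.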

One of the programs sandboxed using this method is bspatch(1). Allan Jude sandboxed it in commit r304807. Let’s look at two parts of it. In the first one, all the opens were simply moved into the initialization phase.

@@ -87,7 +102,44 @@
     /* Open patch file */
     if ((f = fopen(argv[3], "rb")) == NULL)
         err(1, "fopen(%s)", argv[3]);
+    /* Open patch file for control block */
+    if ((cpf = fopen(argv[3], "rb")) == NULL)
+        err(1, "fopen(%s)", argv[3]);
+    /* open patch file for diff block */
+    if ((dpf = fopen(argv[3], "rb")) == NULL)
+        err(1, "fopen(%s)", argv[3]);
+    /* open patch file for extra block */
+    if ((epf = fopen(argv[3], "rb")) == NULL)
+        err(1, "fopen(%s)", argv[3]);
+    /* open oldfile */
+    if ((oldfd = open(argv[1], O_RDONLY | O_BINARY, 0)) < 0)
+        err(1, "open(%s)", argv[1]);
+    /* open newfile */
+    if ((newfd = open(argv[2], O_CREAT | O_TRUNC | O_WRONLY | O_BINARY,
+       0666)) < 0)
+        err(1, "open(%s)", argv[2]);
@@ -123,22 +175,16 @@
     /* Close patch file and re-open it via libbzip2 at the right places */
     if (fclose(f))
         err(1, "fclose(%s)", argv[3]);
-    if ((cpf = fopen(argv[3], "rb")) == NULL)
-        err(1, "fopen(%s)", argv[3]);
     if (fseeko(cpf, 32, SEEK_SET))
         err(1, "fseeko(%s, %lld)", argv[3],
             (long long)32);
     if ((cpfbz2 = BZ2_bzReadOpen(&cbz2err, cpf, 0, 0, NULL, 0)) == NULL)
         errx(1, "BZ2_bzReadOpen, bz2err = %d", cbz2err);
-    if ((dpf = fopen(argv[3], "rb")) == NULL)
-        err(1, "fopen(%s)", argv[3]);
     if (fseeko(dpf, 32 + bzctrllen, SEEK_SET))
         err(1, "fseeko(%s, %lld)", argv[3],
             (long long)(32 + bzctrllen));
     if ((dpfbz2 = BZ2_bzReadOpen(&dbz2err, dpf, 0, 0, NULL, 0)) == NULL)
         errx(1, "BZ2_bzReadOpen, bz2err = %d", dbz2err);
-    if ((epf = fopen(argv[3], "rb")) == NULL)
-        err(1, "fopen(%s)", argv[3]);
     if (fseeko(epf, 32 + bzctrllen + bzdatalen, SEEK_SET))
         err(1, "fseeko(%s, %lld)", argv[3],
             (long long)(32 + bzctrllen + bzdatalen));
@@ -145,9 +191,6 @@
     if ((epfbz2 = BZ2_bzReadOpen(&ebz2err, epf, 0, 0, NULL, 0)) == NULL)
         errx(1, "BZ2_bzReadOpen, bz2err = %d", ebz2err);
-    oldfd = open(argv[1], O_RDONLY | O_BINARY, 0);
-    if (oldfd < 0)
-        err(1, "%s", argv[1]);
     if ((oldsize = lseek(oldfd, 0, SEEK_END)) == -1 ||
         (old = malloc(oldsize+1)) == NULL ||
         lseek(oldfd, 0, SEEK_SET) != 0 ||
@@ -215,9 +258,6 @@
         err(1, "fclose(%s)", argv[3]);

     /* Write the new file */
-    newfd = open(argv[2], O_CREAT | O_TRUNC | O_WRONLY | O_BINARY, 0666);
-    if (newfd < 0)
-        err(1, "%s", argv[2]);
     if (write(newfd, new, newsize) != newsize || close(newfd) == -1)
         err(1, "%s", argv[2]);

After this sequence of open(2) calls we can simply enter capability mode. That is all you need to do to sandbox it. We can also extend the sandbox to use capability rights. This is done in the part below.

+    if (cap_enter() < 0) {
+        /* Failed to sandbox, fatal if CAPABILITY_MODE enabled */
+        if (errno != ENOSYS)
+            err(1, "failed to enter security sandbox");
+    } else {
+        /* Capsicum Available */
+        cap_rights_init(&rights_ro, CAP_READ, CAP_FSTAT, CAP_SEEK);
+        cap_rights_init(&rights_wr, CAP_WRITE);
+
+        if (cap_rights_limit(fileno(f), &rights_ro) < 0 ||
+           cap_rights_limit(fileno(cpf), &rights_ro) < 0 ||
+           cap_rights_limit(fileno(dpf), &rights_ro) < 0 ||
+           cap_rights_limit(fileno(epf), &rights_ro) < 0 ||
+           cap_rights_limit(oldfd, &rights_ro) < 0 ||
+           cap_rights_limit(newfd, &rights_wr) < 0)
+            err(1, "cap_rights_limit() failed, could not restrict"
+               " capabilities");
+    }

The descriptors for the patch file and the old file are limited to reading, fstat(2), and seeking; the last descriptor, for the new file, is limited to writing. That's all we really needed to do for a basic sandbox of bspatch(1). The hardest part is understanding the code rather than sandboxing it.

In this post we discussed the basic idea behind Capsicum and one of the simplest ways of sandboxing an application: separating it into two phases, the initialization phase and the main phase. Sometimes separating the program into two phases is not enough and we need more extensive privilege separation and compartmentalization, but let’s leave that for another blog post.