Fully Capable - The Ancient Sendmail Capabilities Issue


The bug, at its heart, was a misfire of the original capability implementation, which lacked filesystem capabilities.

Working backwards from the observed bug: sendmail, running with [e]uid=0, called setuid(something other than 0) "expecting" (without checking) that it would return success and, by implication, drop every remnant of the process having privilege.

With capabilities trying to implement setuid-0-ness, it was possible for an unprivileged user to block capable(CAP_SETUID) from sendmail and cause this, and only this, system call to fail. Since sendmail contained the (historically valid) assumption that this couldn't fail, it blindly continued executing, believing it had indeed dropped privilege when it actually hadn't even changed uid. Using this hole, an unprivileged user could gain root privilege...
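The unchecked-call pattern can be sketched with a toy model. Everything named here (`Proc`, `toy_setuid`, the boolean `cap_setuid` flag) is hypothetical and only stands in for the behaviour described above; it is not sendmail's code, and the real setuid() rules are more involved.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    uid: int
    cap_setuid: bool  # stands in for capable(CAP_SETUID)


def toy_setuid(proc: Proc, new_uid: int) -> int:
    """Toy stand-in for setuid(): 0 on success, -1 on failure."""
    if not proc.cap_setuid and new_uid != proc.uid:
        return -1  # call fails and the uid is left untouched
    proc.uid = new_uid
    return 0


def drop_unchecked(proc: Proc, new_uid: int) -> int:
    """Ignore the result, as the historical assumption allowed."""
    toy_setuid(proc, new_uid)
    return proc.uid  # may still be 0


def drop_checked(proc: Proc, new_uid: int) -> int:
    """Refuse to continue if the drop did not happen."""
    if toy_setuid(proc, new_uid) != 0:
        raise PermissionError("could not drop privilege")
    return proc.uid
```

With `cap_setuid` blocked, `drop_unchecked` carries on with uid 0, while `drop_checked` stops.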

If that had been the whole thing it might have truly been only a sendmail bug, one that the sendmail folk very promptly fixed, end of story. What caused us to change the kernel was the observation (1) that this kind of denial of syscall-service attack was so easy for a regular user to perform. A more subtle issue (2) also existed with the set*uid() (and other) system calls which added weight to wanting to "cripple" capabilities a bit before moving forward again.

To understand (1), we need to review the capability model. You can read about it in depth in the last draft of the POSIX.1e spec, but here is a quick summary:

Capabilities come in 3 flavors (Inheritable, Permitted, and Effective) and 2 sets: process capabilities and file capabilities. I'll abbreviate them as (pI,pP,pE) and (fI,fP,fE).

A process can change its own capabilities directly using the capset() system call - but generally wrapped, as per POSIX.1e, in the API of libcap. The basic model is that a process can drop anything from any of its three p? capability sets at any time, but only add to its pI or pE set capabilities that are already present in pP.
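One reading of that basic model, as a toy check on bitmasks. `Caps` and `toy_capset` are hypothetical names, and the decision to test pE against the new pP (so that pE never exceeds pP) is an assumption of this sketch, not a quotation from the spec.

```python
from typing import NamedTuple


class Caps(NamedTuple):
    pI: int  # inheritable
    pP: int  # permitted
    pE: int  # effective


def toy_capset(old: Caps, new: Caps) -> Caps:
    """Return `new` if the transition follows the summary above, else raise."""
    if new.pP & ~old.pP:
        raise PermissionError("pP can only shrink")
    if new.pI & ~(old.pI | old.pP):
        raise PermissionError("pI may only gain bits present in pP")
    if new.pE & ~new.pP:
        raise PermissionError("pE may only hold bits present in pP")
    return new
```

Note that a process with pP=pE=0 can still clear bits in pI under these rules, which matters later in the story.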

Privileged operations are permitted, capable(CAP_FOO) != 0, if pE has the CAP_FOO capability raised. Capability-aware applications can, and should, exercise privilege carefully by raising pE bits only around the critical sections of code that need privilege.

The capability rules for evolving state as processes evolve are: fork() lets each of the p? capabilities get duplicated exactly; and exec() convolutes the capabilities as follows:

  • pI' = pI (i.e., unchanged by exec())

  • pP' = (X & fP) | (pI & fI)

  • pE' = fE & pP'

where p?' signifies the post-exec() capabilities owned by the process. X was not specified by the POSIX.1e doc - it was a detail left to the discretion of each implementation [X has subsequently become cap_bset, but we will treat it as ~0 in this abbreviated history].
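The three exec() rules above translate directly into bit operations. In this sketch `toy_exec` is a hypothetical helper and `ALL` is an assumed 32-bit mask standing in for ~0; the pre-exec pP and pE are not parameters because they do not appear in the rules.

```python
ALL = 0xFFFFFFFF  # assumed 32-bit stand-in for ~0


def toy_exec(pI: int, fI: int, fP: int, fE: int, X: int = ALL):
    """Apply the exec() rules and return (pI', pP', pE')."""
    new_pI = pI                      # pI' = pI
    new_pP = (X & fP) | (pI & fI)    # pP' = (X & fP) | (pI & fI)
    new_pE = fE & new_pP             # pE' = fE & pP'
    return new_pI, new_pP, new_pE


# Example: one forced bit (fP) plus one inheritable bit that is set
# in both pI and fI.
result = toy_exec(pI=0b0110, fI=0b0100, fP=0b0001, fE=ALL)
```

Here `result` is `(0b0110, 0b0101, 0b0101)`: bit 0 is forced by fP, and bit 2 is inherited because both pI and fI carry it.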

At a high level the pP' rule is where privilege comes from: the union of forced capabilities, fP, and optionally inheritable fI bits.

The kernel's initial implementation of capabilities had no support for any of the f?s, which meant that nothing could ever get pE' != 0. For most users executing non-privileged applications this was fine: they were supposed to have pP'=pE'=0! However, legacy setuid-0 programs and processes run by root needed a mechanism to obtain pE' != 0.

To emulate the superuser, the fateful decision (mine) was that the kernel gave all users pI=~0 (that is, the potential to gain all privilege). Ignoring the special case of init (which had its capabilities specified in the kernel), the superuser concept could then be mapped by the following rules:

    if (uid==0 or file-to-exec-is-setuid0) {
        fE = fI = ~0;
        fP = 0;
    } else {
        fI = fP = fE = 0;
    }

If you follow the rules above you will see that this gives pE'=pP'=~0 to privileged programs. What this didn't do was offer any way for non-uid=0 processes to gain pE bits - they could never be capable() of anything. Non-root users getting capabilities is a key feature of the POSIX.1e model, but there was no need to hack in such a feature because we had an active effort underway to implement filesystem capabilities leveraging Andreas Grünbacher's extended attributes [and in my kernel sandbox filesystem capabilities worked fine: http://www.kernel.org/pub/linux/libs/security/linux-privs/old/ *-fcap but that is getting off-track].
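To follow the rules above by hand, it can help to run them. This is a toy model of the original mapping: `exec_old` and `ALL` (an assumed 32-bit stand-in for ~0) are hypothetical names, and X is treated as ~0 as in the text.

```python
ALL = 0xFFFFFFFF  # assumed 32-bit stand-in for ~0


def exec_old(pI: int, uid: int, setuid0: bool):
    """Original mapping: privilege flows through pI & fI."""
    if uid == 0 or setuid0:
        fE = fI = ALL
        fP = 0
    else:
        fI = fP = fE = 0
    pP = (ALL & fP) | (pI & fI)
    pE = fE & pP
    return pI, pP, pE


privileged = exec_old(pI=ALL, uid=1000, setuid0=True)
ordinary = exec_old(pI=ALL, uid=1000, setuid0=False)
```

With every user holding pI=~0, `privileged` comes out as `(ALL, ALL, ALL)` and `ordinary` as `(ALL, 0, 0)`.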

Getting back to (1), the problem with this simple model was that, as per POSIX.1e, capset() allowed any unprivileged (pE=0) user to drop any pI bit. That meant any subsequently invoked setuid-0 program could be forced to get access to only a subset of all of the capabilities: pE' = pP' = (pI & fI)... This is how an unprivileged user could invoke setuid-0 sendmail and have its privilege-dropping setuid() call fail: drop CAP_SETUID from pI first, and sendmail starts with every capability except the one that setuid() needs.
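The attack reduces to a single cleared bit. In this sketch `exec_old_setuid0` is a hypothetical helper applying the original setuid-0 mapping, `ALL` is an assumed 32-bit stand-in for ~0, and `CAP_SETUID` uses bit 7, its number in Linux's capability list.

```python
ALL = 0xFFFFFFFF      # assumed 32-bit stand-in for ~0
CAP_SETUID = 1 << 7   # CAP_SETUID is capability number 7 on Linux


def exec_old_setuid0(pI: int):
    """Original mapping for a setuid-0 file: fE = fI = ~0, fP = 0."""
    fI, fP, fE = ALL, 0, ALL
    pP = (ALL & fP) | (pI & fI)
    pE = fE & pP
    return pP, pE


# The unprivileged user clears one inheritable bit before exec().
attacker_pI = ALL & ~CAP_SETUID
pP, pE = exec_old_setuid0(attacker_pI)

# The setuid-0 program is otherwise fully privileged, but cannot setuid().
can_setuid = bool(pE & CAP_SETUID)
```

`can_setuid` is false while every other bit of pE is set, which is exactly the state in which an unchecked setuid() leaves the program running as root.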

One could argue that clearly every legacy privileged program had the same potential problem, and it was judged (I particularly recall Ted Tso being a strong advocate) that we should avoid this ASAP! It turned out that with a simple change - a few lines of kernel code - one could eradicate this issue, and this was the change that "crippled capabilities".

The change was as follows. Init would give all processes pI=0 (that is *no* inheritable potential for gaining any privilege, and nothing for an unprivileged user to drop). And the rules for uid-0 exec() mapping would be:

    if (uid==0 or file-to-exec-is-setuid0) {
        fE = fP = ~0;
        fI = 0;
    } else {
        fI = fP = fE = 0;
    }

If you follow the rules above you will see that this gives pE'=pP'=~0 to privileged programs just like the old mapping, but has no mechanism for an unprivileged user to influence it.
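A toy check of the revised mapping, showing that the result no longer depends on pI. As before, `exec_new`, `ALL` (an assumed 32-bit stand-in for ~0) and the bit-7 `CAP_SETUID` constant are illustrative names for this sketch, with X treated as ~0.

```python
ALL = 0xFFFFFFFF      # assumed 32-bit stand-in for ~0
CAP_SETUID = 1 << 7   # CAP_SETUID is capability number 7 on Linux


def exec_new(pI: int, uid: int, setuid0: bool):
    """Revised mapping: privilege is forced through fP, and fI is empty."""
    if uid == 0 or setuid0:
        fE = fP = ALL
        fI = 0
    else:
        fI = fP = fE = 0
    pP = (ALL & fP) | (pI & fI)
    pE = fE & pP
    return pI, pP, pE


# Whatever the caller's pI was, a setuid-0 program gets the full set.
outcomes = [exec_new(pI, uid=1000, setuid0=True)[1:]
            for pI in (0, ALL, ALL & ~CAP_SETUID)]
```

Every entry of `outcomes` is `(ALL, ALL)`, so clearing bits in pI before exec() has no effect on the privileged program.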

The more subtle issue (2) was that setuid(), and all of its very many variants, have very strange semantics with respect to capabilities. Nominally, for transitions not involving *uid=0, this class of system call simply changes the *uid of the current process or fails to. Its return value is a clear indication of whether the transition succeeded or not. If a transition involves *uid=0 then all sorts of subtle and hard to follow other things also occur...

With respect to capabilities, this system call can only return one status - success or not. For a process that is capable(CAP_SETUID), how does the calling application differentiate between 'just changed uid' and 'just changed uid, and dropped all privilege'? In a pure capability mode you want the former and in a legacy mode you want the latter. But in neither mode is any status value from the system call available to tell you that you got what you wanted. By suppressing the sendmail class of bugs we put off this subtle issue for another day... And I was still happy because filesystem capabilities worked fine in my sandbox, etc. Hopefully that explains what 'the sendmail bug' was.