A proposed API for full-memory encryption


Hardware memory encryption is, or will soon be, available on multiple general-purpose CPUs. Without it, data is stored in memory, and passes between the memory chips and the processor, in the clear. Attackers may be able to access it by using hardware probes or by directly accessing the chips, which is especially problematic with persistent memory. One new memory-encryption offering is Intel's Multi-Key Total Memory Encryption (MKTME) [PDF]; AMD's equivalent is called Secure Encrypted Virtualization (SEV). Support for this feature in the Linux kernel is a work in progress. Recently, Alison Schofield proposed a user-space API for MKTME, provoking a long discussion on how memory encryption should be exposed to users, if at all.

Filesystem encryption offers a wide range of options; its use is now standard practice in a number of settings, protecting user data at rest. Data stored in main memory, on the other hand, is kept in the clear, as are exchanges between the memory chips and the processors. In a virtualized environment, attackers who find a way to read memory belonging to neighboring virtual machines can access those machines' data. Physical attacks are also possible, by removing memory chips or snooping on the memory buses. The threat becomes more serious with persistent-memory technologies, where data remains in the clear even after power is removed. Memory-encryption technologies aim to address some of those attacks.

Intel chips have offered memory encryption for some time in the form of Total Memory Encryption (TME), which uses a single, CPU-generated key for all of memory; users control TME through the boot-level firmware. MKTME, which will be available in upcoming chips, extends TME with more keys and with encryption settings (including disabling encryption) that can be chosen per page, so different memory regions can use different keys at the same time. The main use case for MKTME appears to be adding protection in systems running multiple virtual machines (see these slides from LinuxCon China [PDF]). The supported encryption algorithm is AES-XTS-128, with the physical address taken into account as a kind of nonce (the XTS tweak).
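To see why the physical address matters, consider a toy model. The C sketch below is not AES-XTS: a hypothetical `toy_keystream()` built from simple integer mixing stands in for the cipher, and the four-entry `toy_keys` table is an invented stand-in for the hardware's per-KeyID keys, with slot 0 playing the part of the single TME key. It illustrates two properties described above: identical plaintext stored at two physical addresses encrypts differently, and each KeyID yields a different ciphertext for the same data.

```c
#include <stdint.h>

/* Toy keystream mixing a key with the physical address (the "tweak").
 * A stand-in for AES-XTS-128 for illustration only; it has no real
 * security. Every step is a bijection, so distinct (key ^ scaled
 * address) inputs always give distinct outputs. */
static uint64_t toy_keystream(uint64_t key, uint64_t phys_addr)
{
    uint64_t x = key ^ (phys_addr * 0x9E3779B97F4A7C15ULL);
    x ^= x >> 33;
    x *= 0xFF51AFD7ED558CCDULL;
    x ^= x >> 33;
    return x;
}

/* Hypothetical per-KeyID key table: slot 0 models the single TME key,
 * the other slots model additional MKTME keys. */
static const uint64_t toy_keys[4] = {
    0x0123456789ABCDEFULL, 0x1111111111111111ULL,
    0x2222222222222222ULL, 0x3333333333333333ULL,
};

/* "Write" 8 bytes of plaintext at phys_addr under the given KeyID. */
static uint64_t toy_encrypt(unsigned keyid, uint64_t phys_addr, uint64_t plain)
{
    return plain ^ toy_keystream(toy_keys[keyid & 3], phys_addr);
}

/* XOR is its own inverse, so decryption is the same operation. */
static uint64_t toy_decrypt(unsigned keyid, uint64_t phys_addr, uint64_t cipher)
{
    return cipher ^ toy_keystream(toy_keys[keyid & 3], phys_addr);
}
```

In this model, ciphertext copied from one address to another no longer decrypts to the original data, because the address is part of the computation.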

Lower-level support for MKTME in the Linux kernel was submitted in September 2018, and memory encryption was one of the subjects discussed at the 2018 Linux Storage, Filesystem, and Memory-Management Summit. Schofield's recent patch set goes further: it adds the user interface for setting up encryption and (optionally) keys and for assigning key identifiers to memory regions, along with a key store to support CPU hotplug.

encrypt_mprotect()

Setting up MKTME takes three steps: create a key, map a region of anonymous memory, and enable encryption on it. The key is created and added to the kernel keyring with add_key(), a helper function from the keyutils library; the call takes the key type, the key material (unless the key is to be generated by the CPU), and additional MKTME-specific options. The user then maps a region of anonymous memory with mmap() and calls a new system call to enable protection.
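The key options are passed to add_key() as a text payload. The sketch below builds that payload for the two cases mentioned above: a CPU-generated key, which needs no key material, and a user-supplied key. The field names (`type=`, `algorithm=`, `key=`, `tweak=`) are modeled on the demo that accompanied the patch set and should be treated as assumptions, as should the helper name `mktme_build_options()`, which is hypothetical.

```c
#include <stdio.h>

/* HYPOTHETICAL helper: build the add_key() payload for an MKTME key.
 * The option syntax is an assumption, not a stable ABI.
 * Returns the payload length, or -1 on bad arguments or a short buffer. */
static int mktme_build_options(char *buf, size_t size,
                               const char *key_hex, const char *tweak_hex)
{
    int n;

    if (key_hex == NULL) {
        /* CPU-generated key: no key material is passed in. */
        n = snprintf(buf, size, "type=cpu algorithm=aes-xts-128");
    } else {
        /* User-supplied key: both key material and tweak are required. */
        if (tweak_hex == NULL)
            return -1;
        n = snprintf(buf, size,
                     "type=user algorithm=aes-xts-128 key=%s tweak=%s",
                     key_hex, tweak_hex);
    }
    if (n < 0 || (size_t)n >= size)
        return -1;   /* output was truncated */
    return n;
}
```

For example, `mktme_build_options(buf, sizeof buf, NULL, NULL)` produces `type=cpu algorithm=aes-xts-128`.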

The new system call, encrypt_mprotect(), takes the same parameters as mprotect() with the addition of a key serial number. Its prototype is:

 int encrypt_mprotect(unsigned long start, size_t len, unsigned long prot, key_serial_t serial);

An example showing the use of the new system call was submitted with the patch set.
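That example is not reproduced here. As an independent sketch of the same three steps, the following C code calls add_key() and the new system call through raw syscall() so that it does not depend on libkeyutils. Several details are assumptions: the `"mktme"` key type name, the use of the user keyring, and especially `__NR_encrypt_mprotect`, which would exist only in headers from a kernel carrying the patches. When that macro is absent, the hypothetical wrapper fails with `ENOSYS`, so on a mainline kernel `mktme_alloc()` simply returns NULL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same definition as in keyutils.h; declared here to avoid linking
 * against libkeyutils. */
typedef int32_t key_serial_t;

#ifndef KEY_SPEC_USER_KEYRING
#define KEY_SPEC_USER_KEYRING (-4)   /* value from <linux/keyctl.h> */
#endif

/* HYPOTHETICAL wrapper for the proposed system call. Its number is
 * assumed to come from patched headers as __NR_encrypt_mprotect;
 * mainline headers do not define it, so we fail with ENOSYS. */
static int encrypt_mprotect(unsigned long start, size_t len,
                            unsigned long prot, key_serial_t serial)
{
#ifdef __NR_encrypt_mprotect
    return (int)syscall(__NR_encrypt_mprotect, start, len, prot, serial);
#else
    (void)start; (void)len; (void)prot; (void)serial;
    errno = ENOSYS;
    return -1;
#endif
}

/* Create a key, map anonymous memory, and attach the key to it.
 * Returns the mapping, or NULL with errno set. For brevity, the key
 * is left in the keyring if a later step fails. */
static void *mktme_alloc(size_t len, const char *options)
{
    /* Step 1: add an "mktme" key (type name assumed from the patch set). */
    long serial = syscall(SYS_add_key, "mktme", "demo-key", options,
                          strlen(options), KEY_SPEC_USER_KEYRING);
    if (serial < 0)
        return NULL;

    /* Step 2: map a region of anonymous memory. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Step 3: enable encryption on the region using the key's serial. */
    if (encrypt_mprotect((unsigned long)p, len, PROT_READ | PROT_WRITE,
                         (key_serial_t)serial) != 0) {
        int saved = errno;
        munmap(p, len);
        errno = saved;
        return NULL;
    }
    return p;
}
```

On a patched kernel, a caller would pass the options payload for the desired key type and, on success, use the returned region like any other anonymous mapping, releasing it with munmap() when done.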

API alternatives, key changes and cache state

Andy Lutomirski raised a number of objections to the API in its proposed form. The first concerned the new system call, which he described as "an incomplete version of mprotect()" because it lacks support for memory protection keys. Its only function is to change the encryption key whereas, he said, the most secure usage is to stick with the CPU-generated key.

He also had doubts about the safety of swapping encrypted memory. The kernel's direct mapping, which maps all of physical memory into the kernel's address space, can be a source of cache-coherency problems as well: the user's mapping and the kernel's direct mapping will use different keys for the same memory, so data corruption may occur. He doubted that MKTME should be used with anonymous memory (memory not backed by a file or a device) at all. Instead of a generic API, he proposed specific interfaces for persistent memory and for virtual-machine hardening.
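The corruption scenario can be illustrated with a toy model of a single memory word reached through two mappings: a user mapping carrying one key and the kernel's direct mapping carrying another. Everything here is hypothetical and simplified: a plain XOR pad (the invented `toy_pad()`) replaces the real cipher, and caches are ignored entirely, even though the concern raised in the discussion is largely about cache coherency. The model shows that data written through one mapping reads back as garbage through the other, and that a write through the direct mapping (say, the kernel zeroing a page) corrupts what the user mapping sees.

```c
#include <stdint.h>

/* One physical 8-byte word as stored in the memory chip. */
struct toy_dram {
    uint64_t cell;
};

/* Toy stand-in for the cipher: depends on the key and the address. */
static uint64_t toy_pad(uint64_t key, uint64_t phys_addr)
{
    return key ^ (phys_addr * 0x9E3779B97F4A7C15ULL);
}

/* The "memory controller" encrypts on write with the mapping's key... */
static void toy_write(struct toy_dram *m, uint64_t key, uint64_t addr,
                      uint64_t value)
{
    m->cell = value ^ toy_pad(key, addr);
}

/* ...and decrypts on read with whatever key the reading mapping has. */
static uint64_t toy_read(const struct toy_dram *m, uint64_t key,
                         uint64_t addr)
{
    return m->cell ^ toy_pad(key, addr);
}
```

Reading back with the matching key returns the stored value; reading with any other key returns the value XORed with the difference between the two keys' pads.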

Dave Hansen responded by explaining the logic behind the API proposal. The new system call was meant to stack with the generic mprotect() and pkey_mprotect(), not to replace them. Cache-coherency issues are expected to be avoided by careful reference counting in the VMAs before the PCONFIG instruction is issued to change the key. He also promised to find out why user-provided keys had been included.
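The rule Hansen described, that a key may only be changed once nothing still uses it, can be sketched as a small user-space model. It is not the patch set's code: the table, the `toy_keyid_*()` helpers, and the slot count are hypothetical, and `toy_keyid_program()` merely stands in for what PCONFIG does in hardware.

```c
#include <errno.h>

#define TOY_NR_KEYIDS 8   /* hypothetical number of hardware KeyID slots */

/* Per-KeyID bookkeeping: how many VMAs use each slot, and which
 * (stand-in) key value is currently programmed into it. */
struct toy_keyid_table {
    unsigned refcount[TOY_NR_KEYIDS];
    unsigned long programmed[TOY_NR_KEYIDS];
};

/* A VMA starts using a KeyID (e.g. after encrypt_mprotect()). */
static void toy_keyid_get(struct toy_keyid_table *t, unsigned id)
{
    if (id < TOY_NR_KEYIDS)
        t->refcount[id]++;
}

/* A VMA stops using it (munmap(), or protection changed back). */
static void toy_keyid_put(struct toy_keyid_table *t, unsigned id)
{
    if (id < TOY_NR_KEYIDS && t->refcount[id] > 0)
        t->refcount[id]--;
}

/* Reprogram a slot. Refused while any mapping still references it,
 * since data written under the old key would be read with the new one. */
static int toy_keyid_program(struct toy_keyid_table *t, unsigned id,
                             unsigned long new_key)
{
    if (id >= TOY_NR_KEYIDS)
        return -EINVAL;
    if (t->refcount[id] != 0)
        return -EBUSY;
    t->programmed[id] = new_key;
    return 0;
}
```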

Dan Williams pointed out that the persistent-memory code only needs to access the encrypted version of the data, so it never uses the direct mapping and can safely move blocks without considering the keys.

Later in the discussion, Hansen noted that the persistent-memory use case, which requires a user-supplied key, is reasonable but not covered by the current patch set; he proposed postponing it until that part is done. Other developers, including Jarkko Sakkinen, also asked questions, for example about what happens if a key changes suddenly. The answer is that data corruption may result if the right cache flushes are missing. The discussion ended, for now, without a clear conclusion on either the API or the main use case for this feature.

User keys and CPU hotplug

The MKTME code tries not to keep key material any longer than needed, so the kernel destroys user-supplied key data once the hardware has been programmed. That leads to a potential problem, though: the kernel will need those keys if a new CPU comes online. The patch set addresses this with optional storage for key data in kernel memory, which is used when the mktme_savekeys kernel command-line option is enabled. Otherwise, new CPUs are not allowed to come online if any user-supplied keys are in use.
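The hotplug policy just described can be captured in a short model. The structure and function names are hypothetical, and the per-key `saved` flag simply records whether the mktme_savekeys option was in effect when the key was programmed; the real patch set's data structures may look quite different.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_KEY_BYTES 16   /* AES-128 key size */

/* Hypothetical record for one user-supplied key. */
struct toy_user_key {
    bool in_use;
    bool saved;                               /* copy kept in kernel memory */
    unsigned char material[TOY_KEY_BYTES];
};

/* Called once all online CPUs have been programmed with the key. */
static void toy_key_programmed(struct toy_user_key *k, bool savekeys)
{
    k->in_use = true;
    k->saved = savekeys;
    if (!savekeys) {
        /* Destroy the key material as soon as it is no longer needed.
         * Real code would use a clearing primitive that the compiler
         * cannot optimize away. */
        memset(k->material, 0, sizeof k->material);
    }
}

/* A new CPU needs every in-use user key, so it may only come online
 * if each such key was saved. */
static bool toy_cpu_may_online(const struct toy_user_key *keys, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (keys[i].in_use && !keys[i].saved)
            return false;
    return true;
}
```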

Saving encryption keys raised questions: Kai Huang asked whether CPU-hotplug support is that important, since storing the keys can make them susceptible to cold-boot attacks. He noted that there are configurations where the kernel does not support CPU hotplug, and suggested that a per-socket key ID might be a solution. Kirill Shutemov disliked the idea, as it would add complexity to the MKTME code, which would then need to keep track of nodes. It would also complicate memory management, especially in the case of memory migration. No solution has been found yet; the next version of the patch set will have to address the issue.

The security model for virtual-machine isolation

There have been multiple discussions around the security model of MKTME and how the feature is expected to be used, especially in comparison with TME. The developers focused on exploits and malicious code that might try to circumvent the protection.

Lutomirski noted that MKTME does not protect against malicious accesses between virtual machines, as the memory controller does not know where any given access comes from. Sakkinen agreed; he does not see TME making virtual-machine isolation any better. Hansen responded that MKTME provides no protection once an attacker can execute code inside the hypervisor. Also, when the kernel keeps non-encrypted mappings of memory that also has encrypted mappings, the attacker may be able to read that memory through the non-encrypted mappings. To avoid those problems, Lutomirski proposed reusing the exclusive page-frame ownership mechanism, so that a page's direct-mapping entry is removed when the page is allocated to user space.
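The idea behind the exclusive page-frame ownership proposal can be shown with a tiny model that tracks, for each page frame, whether it belongs to user space and whether the kernel's direct map still covers it. This is a hypothetical illustration with invented names, not the kernel's implementation, which operates on the kernel's page tables.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_FRAMES 16   /* hypothetical number of page frames */

struct toy_frames {
    bool user_owned[TOY_NR_FRAMES];
    bool direct_mapped[TOY_NR_FRAMES];
};

/* At boot, all memory is covered by the kernel's direct map. */
static void toy_frames_init(struct toy_frames *f)
{
    for (size_t i = 0; i < TOY_NR_FRAMES; i++) {
        f->user_owned[i] = false;
        f->direct_mapped[i] = true;
    }
}

/* Handing a frame to user space removes the kernel's alias of it, so
 * there is no second, differently keyed view to read it through. */
static void toy_alloc_user(struct toy_frames *f, size_t pfn)
{
    if (pfn >= TOY_NR_FRAMES)
        return;
    f->user_owned[pfn] = true;
    f->direct_mapped[pfn] = false;
}

/* On free, the kernel gets its mapping back (a real kernel would also
 * clear the page first). */
static void toy_free_user(struct toy_frames *f, size_t pfn)
{
    if (pfn >= TOY_NR_FRAMES)
        return;
    f->user_owned[pfn] = false;
    f->direct_mapped[pfn] = true;
}

/* Can kernel code reach this frame through the direct map? */
static bool toy_kernel_can_alias(const struct toy_frames *f, size_t pfn)
{
    return pfn < TOY_NR_FRAMES && f->direct_mapped[pfn];
}
```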

The discussion of the security model covered both virtual machines and CPUs. Interested readers may also refer to a research paper [PDF] on SEV subversion.

Conclusions

The addition of MKTME support provoked a range of opinions on how the feature should be supported. Consensus has not been reached yet, and the final implementation may turn out to be different from what has been proposed so far. The discussion shows how difficult it can be to create a good API. The main task for the developers now is to understand the use cases better and agree on an interface that covers those needs. We are likely to see more iterations of this patch set, and more discussion, in the near future.
