I'm attending the Linux Plumbers Conference in Lisbon from Monday to Wednesday this week. This morning I followed the "Distribution kernels" track, organised by Laura Abbott.
I took notes, included below, mostly with a view to what could be relevant to Debian. Other people took notes in Etherpad. There should also be video recordings available at some point.
Upstream 1st: Tools and workflows for multi kernel version juggling of short term fixes, long term support, board enablement and features with the upstream kernel
Speaker: Bruce Ashfield, working on Yocto at Xilinx.
Yocto's kernel build recipes need to support multiple active kernel versions (3+ supported streams), multiple architectures, and many different boards. Many patches are required for hardware and other feature support, including -rt and aufs.
Goals for maintenance:
- Changes w.r.t. upstream are visible as discrete patches, so rebased rather than merged
- Common feature set and configuration
- Different feature enablements
- Use as few custom tools as possible
Other distributions have similar goals but very few tools in common. So there is a lot of duplicated effort.
Supporting developers, distro builds and end users is challenging. E.g. developers complained about Yocto having separate git repos for different kernel versions, as this led to them needing more disk space.
- Config fragments, patch tracking repo, generated tree(s)
- Branched repository with all patches applied
- Custom change management tools
Using Yocto to build a distro and maintain a kernel tree
Speaker: Senthil Rajaram & Anatoly ? from Microsoft.
Microsoft chose Yocto as the build tool for maintaining Linux distros for different internal customers. They wanted to use a single kernel branch for different products, but it was difficult to support all hardware this way.
Maintaining config fragments and a sensible inheritance tree is difficult (?). It might be helpful to put config fragments upstream.
Laura Abbott said that the upstream kconfig system had some support for fragments now, and asked what sort of config fragments would be useful. There seemed to be consensus on adding fragments for specific applications and use cases like "what Docker needs".
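Such a fragment is just a partial list of CONFIG_* settings that gets merged into a base config, e.g. with the kernel's scripts/kconfig/merge_config.sh. A sketch of what a "what Docker needs" fragment might contain (the option list here is illustrative, not complete):

```
# docker.config (illustrative) - options container runtimes depend on
CONFIG_NAMESPACES=y
CONFIG_CGROUPS=y
CONFIG_MEMCG=y
CONFIG_VETH=y
CONFIG_BRIDGE=y
CONFIG_OVERLAY_FS=y
```

It would then be merged into a full configuration with something like `scripts/kconfig/merge_config.sh .config docker.config`.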
Kernel build should be decoupled from image build, to reduce unnecessary rebuilding.
The initramfs is unpacked from a cpio archive, a format which doesn't carry the extended attributes needed for SELinux labels. So they build a minimal initramfs into the kernel, and add a separate initramfs containing a squashfs image, which the built-in initramfs code switches to as the real root.
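The kernel side of that layout can be expressed as a config fragment; this sketch assumes CONFIG_INITRAMFS_SOURCE for the built-in first stage (the cpio file name is hypothetical):

```
# Embed the first-stage initramfs in the kernel image (file name hypothetical)
CONFIG_INITRAMFS_SOURCE="first-stage.cpio"
# The real root is a squashfs image, which preserves SELinux labels (xattrs)
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_XATTR=y
```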
Making it easier for distros to package kernel source
Speaker: Don Zickus, working on RHEL at Red Hat.
The proposal:
- Makefile includes Makefile.distro
- Other distro stuff goes under distro sub-directory (merge or copy)
- Add targets like fedora-configs, fedora-srpm
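The details weren't shown, but the hook-up described might look something like this (all names here are guesses based on the bullet points above):

```make
# In the top-level Makefile (hypothetical):
#   -include Makefile.distro

# Makefile.distro delegates to per-distro directories:
fedora-configs:
	$(MAKE) -C distro/fedora configs

fedora-srpm:
	$(MAKE) -C distro/fedora srpm
```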
Lots of discussion about whether config can be shared upstream, but no agreement on that.
Kyle McMartin(?): Everyone does the hierarchical config layout - like generic, x86, x86-64 - can we at least put this upstream?
Monitoring and Stabilizing the In-Kernel ABI
Speaker: Matthias Männich, working on Android kernel at Google.
Why does Android need it?
- Decouple kernel vs module development
- Provide single ABI/API for vendor modules
- Reduce fragmentation (multiple kernel versions for same Android version; one kernel per device)
Project Treble made most of Android user-space independent of device. Now they want to make the kernel and in-tree modules independent too. For each kernel version and architecture there should be a single ABI. Currently they accept one ABI bump per year. Requires single kernel configuration and toolchain. (Vendors would still be allowed to change configuration so long as it didn't change ABI - presumably to enable additional drivers.)
ABI stability is scoped - i.e. they include/exclude which symbols need to be stable.
ABI is compared using libabigail, not genksyms. (Looks like they were using it for libraries already, so now using it for kernel too.)
Q: How can we ignore compatible struct extensions with libabigail?
A: (from Dodji Seketeli, main author) You can add specific "suppressions" for such additions.
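Such suppressions are INI-style files passed to abidiff via --suppressions. For instance, to ignore new members appended at the end of a particular struct (the struct name here is illustrative):

```
# suppress.ini - tolerate fields added at the end of this struct
[suppress_type]
  name = vendor_hook_data
  has_data_member_inserted_at = end
```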
KernelCI applied to distributions
Speaker: Guillaume Tucker from Collabora.
Can KernelCI be used to build distro kernels?
KernelCI currently builds an arbitrary branch with an in-tree defconfig or a small config fragment.
Additional steps needed for distro kernels:
- Preparation steps to apply patches, generate config
- Package result
- Track OS image version that kernel should be installed in
Some in audience questioned whether building a package was necessary.
Possible further improvements:
- Enable testing based on user-space changes
- Product-oriented features, like running installer
Should KernelCI be used to build distro kernels?
Seems like a pretty close match. Adding support for different use-cases is healthy for KernelCI project. It will help distro kernels stay close to upstream, and distro vendors will then want to contribute to KernelCI.
Someone pointed out that this is not only useful for distributions. Distro kernels are sometimes used in embedded systems, and the system builders also want to check for regressions on their specific hardware.
Q: (from Takashi Iwai) How long does testing typically take? SUSE's full automated tests take ~1 week.
A: A few hours to build, depending on system load, and up to 12 hours to complete boot tests.
Automatically testing distribution kernel packages
Speaker: Alice Ferrazzi of Gentoo.
Gentoo wants to provide safe, tested kernel packages. Currently testing gentoo-sources and derived packages. gentoo-sources combines upstream kernel source and "genpatches", which contains patches for bug fixes and target-specific features.
Testing multiple kernel configurations - allyesconfig, defconfig, other reasonable configurations. Building with different toolchains.
Tests are implemented using buildbot. Kernel is installed on top of a Gentoo image and then booted in QEMU.
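The boot-test step essentially amounts to assembling a QEMU command line and watching the serial console; a minimal sketch (the paths and options are assumptions on my part, not Gentoo's actual setup):

```python
def qemu_boot_cmd(kernel, rootfs, arch="x86_64", memory="1G"):
    """Build a QEMU command line that boots `kernel` with `rootfs` as a
    virtio root disk, with the serial console on stdout, and rebooting
    disabled so a panic terminates the VM instead of hanging the test."""
    return [
        f"qemu-system-{arch}",
        "-m", memory,
        "-kernel", kernel,
        "-drive", f"file={rootfs},format=raw,if=virtio",
        "-append", "root=/dev/vda console=ttyS0 panic=-1",
        "-nographic", "-no-reboot",
    ]

# A harness would run this with subprocess and scan the console output
# for a login prompt (success) or a panic message (failure).
cmd = qemu_boot_cmd("bzImage", "gentoo.img")
```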
Generalising for discussion:
- Jenkins vs buildbot vs other
- Beyond boot testing, like LTP and kselftest
- LAVA integration
- Supporting other configurations
- Any other Gentoo or meta-distro topic
Don Zickus talked briefly about Red Hat's experience. They eventually settled on Gitlab CI for RHEL.
Some discussion of what test suites to run, and whether they are reliable. Varying opinions on LTP.
There is some useful scripting for different test suites at https://github.com/linaro/test-definitions.
Tim Bird talked about his experience testing with Fuego. A lot of the test definitions there aren't reusable. kselftest is currently hard to integrate because tests are supposed to follow the TAP13 protocol for reporting, but not all of them do!
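TAP13 itself is simple to consume, which is part of its appeal for integration; a minimal sketch of parsing the result lines kselftest is supposed to emit:

```python
import re

# TAP13 result lines look like "ok 1 - description" / "not ok 2 - description"
TAP_RESULT = re.compile(r"^(not )?ok (\d+)(?: - (.*))?$")

def parse_tap(lines):
    """Return (passed, test_number, description) tuples for TAP result lines,
    ignoring the version header, the plan ("1..N") and "#" diagnostics."""
    results = []
    for line in lines:
        m = TAP_RESULT.match(line.strip())
        if m:
            results.append((m.group(1) is None, int(m.group(2)), m.group(3) or ""))
    return results

report = parse_tap([
    "TAP version 13",
    "1..2",
    "ok 1 - first test",
    "not ok 2 - second test",
])
```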
Distros and Syzkaller - Why bother?
Speaker: George Kennedy, working on virtualisation at Oracle.
Which distros are using syzkaller? Apparently Google uses it for Android, ChromeOS, and internal kernels.
Oracle is using syzkaller as part of CI for Oracle Linux. "syz-manager" schedules jobs on dedicated servers. There is a cron job that automatically creates bug reports based on crashes triggered by syzkaller.
Google's syzbot currently runs syzkaller on GCE. Planning to also run on QEMU with a wider range of emulated devices.
How to make syzkaller part of distro release process? Need to rebuild the distro kernel with config changes to make syzkaller work better (KASAN, KCOV, etc.) and then install kernel in test VM image.
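The syzkaller documentation recommends enabling roughly this set of debug options (the fragment is indicative; the full recommended list is longer):

```
# Coverage collection for the fuzzer's feedback loop
CONFIG_KCOV=y
CONFIG_KCOV_INSTRUMENT_ALL=y
# Detect memory errors and keep crash reports symbolised
CONFIG_KASAN=y
CONFIG_KASAN_INLINE=y
CONFIG_DEBUG_INFO=y
```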
How to correlate crashes detected on distro kernel with those known and fixed upstream?
Example of benefit: syzkaller found regression in rds_sendmsg, fixed upstream and backported into the distro, but then regressed in Oracle Linux. It turned out that patches to upgrade rds had undone the fix.
syzkaller can generate test cases that fail to build on old kernel versions due to symbols missing from UAPI headers. How to avoid this?
Q: How often does this catch bugs in the distro kernel?
A: It doesn't often catch new bugs but does catch missing fixes and regressions.
Q: Is anyone checking the syzkaller test cases against backported fixes?
A: Yes [but it wasn't clear who or when]
Google has a public database of reproducers for all the crashes found by syzbot.
Possible ways to address this:
- Syzkaller repo tag indicating which version is suitable for a given kernel version's UAPI
- tarball of syzbot reproducers
Other possible types of fuzzing (mostly concentrated on KVM):
- They fuzz MSRs, control & debug regs with "nano-VM"
- Missing QEMU and PCI fuzzing
- Intel and AMD virtualisation work differently, and AMD may not be covered well
- Missing support for architectures other than x86