~ Updated 2019-01-10 ~

BPF, as in Berkeley Packet Filter, was initially conceived in 1992 to provide a way to filter packets and to avoid useless packet copies from kernel to userspace. It initially consisted of a simple bytecode that is injected from userspace into the kernel, where it is checked by a verifier—to prevent kernel crashes or security issues—and attached to a socket, then run on each received packet. It was ported to Linux a couple of years later, and used for a small number of applications (tcpdump for example). The simplicity of the language as well as the existence of an in-kernel Just-In-Time (JIT) compiler for BPF were factors in the excellent performance of this tool.

Then in 2013, Alexei Starovoitov completely reshaped it, started to add new
functionalities and to improve the performance of BPF. This new version is designated as eBPF (for “extended BPF”), while the former became cBPF (“classic” BPF). New features such as maps and tail calls appeared. The JIT machines were rewritten. The new language is even closer to native machine language than cBPF was. And new attach points in the kernel have been created.

Thanks to those new hooks, eBPF programs can be designed for a variety of use
cases, falling into two main fields of application. One of them is the domain of kernel tracing and event monitoring: BPF programs can be attached to kprobes, and they compare with other tracing methods, with many advantages (and sometimes some drawbacks).

The other application domain remains network programming. In addition to socket
filters, eBPF programs can be attached to tc (the Linux traffic control tool) ingress or egress interfaces and perform a variety of packet processing tasks in an efficient way. This opens new perspectives in the domain.

And eBPF performance is further leveraged through the technologies developed for the IO Visor project: new hooks have also been added for XDP (“eXpress Data Path”), a fast path recently added to the kernel. XDP works in conjunction with the Linux stack and relies on BPF to perform very fast packet processing.

Even some projects, such as P4 and Open vSwitch, consider or have started to approach BPF. Some others, such as CETH and Cilium, are entirely based on it. BPF is buzzing, so we can expect a lot of tools and projects to orbit around it soon…

As for me: some of my work (including for
BEBA) is closely related to eBPF, and several future articles on this site will focus on this topic. Logically, I wanted to somehow introduce BPF on this blog before going down into the details—I mean, a real introduction, more developed on BPF functionalities than the brief abstract provided in the first section: What are BPF maps? Tail calls? What do the internals look like? And so on. But there are a lot of presentations on this topic available on the web already, and I do not wish to create “yet another BPF introduction” that would come as a duplicate of existing documents.

So instead, here is what we will do. After all, I spent some time reading and
learning about BPF, and while doing so, I gathered a fair amount of material about BPF: introductions, documentation, but also tutorials and examples. There is a lot to read, but in order to read it, one has to find it first. Therefore, as an attempt to help people who wish to learn and use BPF, the present article introduces a list of resources. These are various kinds of readings that will hopefully help you dive into the mechanics of this kernel bytecode.

The documents linked below provide a generic overview of BPF, or of some closely related topics. If you are very new to BPF, you can try picking a couple of presentations among the first ones and reading the ones you like most. If you know eBPF already, you probably want to target specific topics instead, lower down in the list.

Generic presentations about eBPF:

A brief introduction to XDP and eBPF
(Diego Pino García, January 2019): Introduction to eBPF in Red Hat Enterprise Linux 7
(Stanislav Kozina, January 2019): Toward Flexible and Efficient In-Kernel Network Function Chaining with IO Visor
(Fulvio Risso, HPSR 2018, Bucharest, June 2018): A thorough introduction to eBPF
(Matt Fleming, on LWN.net, December 2017): Making the Kernel’s Networking Data Path Programmable with BPF and XDP
(Daniel Borkmann, OSSNA17, Los Angeles, September 2017): The BSD Packet Filter
(Suchakra Sharma, June 2017): BPF: tracing and more
(Brendan Gregg, January 2017): Linux BPF Superpowers
(Brendan Gregg, March 2016): IO Visor
(Brenden Blanco, SCaLE 14x, January 2016): eBPF on the Mainframe
(Michael Holzheu, LinuxCon, Dublin, October 2015) New (and Exciting!) Developments in Linux Tracing
(Elena Zannoni, LinuxCon, Japan, 2015) BPF — in-kernel virtual machine
(Alexei Starovoitov, February 2015): Extending extended BPF
(Jonathan Corbet, July 2014)

BPF internals:

These presentations are probably one of the best sources of documentation to
understand the design and implementation of internal mechanisms of eBPF. The IO Visor blog has some
interesting technical articles about BPF. Some of them contain a bit of marketing talk.

As of early 2019, there are more and more presentations being given around
multiple aspects of BPF. One nice example is
the BPF track that was held in parallel
to the Linux Plumbers Conference in late 2018 (and should be held again in coming years), where lots of topics related to eBPF development or use cases
were presented. Kernel tracing: summing up all existing methods, including BPF: Meet-cute between eBPF and Kerne Tracing
(Viller Hsiao, July 2016): Linux Kernel Tracing
(Viller Hsiao, July 2016): Regarding event tracing and monitoring, Brendan Gregg uses eBPF a lot and
does an excellent job at documenting some of his use cases. If you are into kernel tracing, you should see his blog articles related to eBPF or to flame graphs. Most of them are accessible
from this article
or by browsing his blog.

Introducing BPF, but also presenting generic concepts of Linux networking:

Hardware offload:

About cBPF:

The eXpress Data Path
(Diego Pino García, January 2019): XDP overview on the IO Visor
website. eXpress Data Path (XDP)
(Tom Herbert, Alexei Starovoitov, March 2016): BoF - What Can BPF Do For You?
(Brenden Blanco, LinuxCon, Toronto, August 2016). (Tests performed with the mlx4 driver). (Jesper also created and tries to extend some documentation about eBPF and
XDP, see related section.) XDP workshop — Introduction, experience, and future development
(Tom Herbert, netdev 1.2, Tokyo, October 2016) — as of this writing, only the
video is available; I don’t know if the slides will be added. High Speed Packet Filtering on Linux
(Gilberto Bertin, DEF CON 25, Las Vegas, July 2017) — an excellent
introduction to state-of-the-art packet filtering on Linux, oriented towards
DDoS protection, talking about packet processing in the kernel, kernel
bypass, XDP and eBPF. AF_XDP is a new Linux socket type using eBPF filters to drive packets
to user space at really high speed. Some of it is already in the kernel.
There are a couple of presentations about the mechanism, such as
Fast Packet Processing in Linux with AF_XDP
(Björn Töpel and Magnus Karlsson, FOSDEM 2018, Brussels, February 2018). A full-length article describing the details of XDP is available, dating from
December 2018. It is called
The eXpress Data Path: Fast Programmable Packet Processing in the Operating System Kernel
and was written by Toke Høiland-Jørgensen, Jesper Dangaard Brouer, Daniel
Borkmann, John Fastabend, Tom Herbert, David Ahern and David Miller, all
being essential eBPF and XDP contributors. bpfilter is a new Linux mechanism trying to leverage eBPF programs to
offer a replacement for netfilter, while remaining compatible with the
iptables user utility.
Here is a high-level post
by Thomas Graf about the motivations behind this project, and
there is my own presentation
on the topic. Are you wondering why your fresh Linux install has BPF programs running,
although you do not remember attaching any? Starting with version 235 (think
Ubuntu 18.04),
systemd itself
uses BPF programs, in particular for IP traffic accounting and access
control. P4 on the Edge
(John Fastabend, May 2016): If you like audio presentations, there is an associated
OvS Orbit episode (#11), called P4 on the Edge,
dating from August 2016. OvS Orbit episodes are interviews conducted by Ben Pfaff, who
is one of the core maintainers of Open vSwitch. In this case, John Fastabend
is interviewed. P4, EBPF and Linux TC Offload
(Dinan Gunawardena and Jakub Kicinski, August 2016): A good deal of content is repeated between the different presentations; if
in doubt, just pick the most recent one. Daniel Borkmann has also written
a generic introduction to Cilium
as a guest author on Google Open Source blog. There are also podcasts about Cilium: an
OvS Orbit episode (#4),
in which Ben Pfaff interviews Thomas Graf (May 2016), and
another podcast by Ivan Pepelnjak,
still with Thomas Graf about eBPF, P4, XDP and Cilium (October 2016). These use cases for eBPF seem to be only at the stage of proposals (nothing merged to the OvS main branch) as far as I know, but it will be very interesting
to see what comes out of it. XDP is envisioned to be of great help for protection against Distributed
Denial-of-Service (DDoS) attacks. More and more presentations focus on this.
For example, the talks from people from Cloudflare
(XDP in practice: integrating XDP in our DDoS mitigation pipeline)
or from Facebook
(Droplet: DDoS countermeasures powered by BPF + XDP)
at the netdev 2.1 conference in Montreal, Canada, in April 2017, present such
use cases. Katran is an open source layer four (L4) load-balancer built by Facebook
on top of XDP. There is a presentation
in this post,
and the code is available
on GitHub. Kubernetes can interact in a number of ways with eBPF. There is an interesting article about Using eBPF in
Kubernetes
that explains how existing products (Cilium, Weave Scope) leverage eBPF to
work with Kubernetes, or more generically describing what interactions with
eBPF are interesting in the context of container deployment. CETH for XDP
(Yan Chan and Yunsong Lu, Linux Meetup, Santa Clara, July 2016): The VALE switch, another virtual
switch that can be used in conjunction with the netmap framework, has a BPF
extension module. The project claims to attain excellent performance when using driver-native
XDP. InKeV: In-Kernel Distributed Network Virtualization for DCN
(Z. Ahmed, M. H. Alizai and A. A. Syed, SIGCOMM, August 2016): gobpf - utilizing eBPF from Go
(Michael Schubert, fosdem17, Brussels, Belgium, February 2017): ply is a small but flexible open source
dynamic tracer for Linux, with some features similar to the bcc tools,
but with a simpler language inspired by awk and DTrace, written by Tobias
Waldekranz. BPFtrace is also a tool for
tracing, again with its own DSL. It is flexible enough to be envisioned as a
Linux replacement for DTrace and SystemTap. It was created by Alastair
Robertson and Brendan Gregg. BPFd
is a project trying to leverage the flexibility of the bcc tools to trace and
debug remote targets, and in particular devices running with Android.
adeb is related, and provides a
Linux shell environment for that purpose. It is not to be confused with
bpfd, small letters, which claims
to be a container-aware framework for running BPF tracers with rules on Linux
as a daemon. Could DPDK one day work in concert with BPF? It looks likely that the
AF_XDP mechanism introduced in the kernel will be used to drive packets to
user space and to feed them to applications using the framework. However,
there were also some
discussions for replicating the eBPF interpreter and JIT compiler in DPDK itself.
They did not seem to lead to the inclusion of the feature at this time. Even if it does not make it to the core of DPDK, eBPF, and in particular
AF_XDP, using XDP programs to redirect packets to user space sockets, can be
used to create
a poll-mode driver (PMD) for DPDK. Sysdig, a tool for universal system
visibility with native support for containers, now supports eBPF
as an instrumentation back end. The user file system FUSE is also considering using eBPF for improved
performance. This was the topic of
a presentation at the Linux Foundation Open Source Summit 2017,
and a related page on the ExtFUSE project is
available. In order to help with measuring power consumption for servers, the
DEEP-mon
tool uses eBPF programs for in-kernel aggregation of data.

Once you have managed to get a broad idea of what BPF is, you can put aside generic
presentations and start diving into the documentation. Below are the most
complete documents about BPF specifications and functioning. Pick the one you
need and read them carefully! The specification of BPF (both classic and extended versions) can be
found within the documentation of the Linux kernel, and in particular in file
linux/Documentation/networking/filter.txt.
The use of BPF as well as its internals are documented there. Also, this is
where you can find information about errors thrown by the verifier when
loading BPF code fails; this can be helpful for troubleshooting obscure error
messages. Also in the kernel tree, there is a document about frequent Questions &
Answers on eBPF design in file
linux/Documentation/bpf/bpf_design_QA.rst.

But the kernel documentation is dense and not especially easy to read. If you look for a simple description of the eBPF language, head for
its summarized description
on the IO Visor GitHub repository instead. By the way, the IO Visor project gathered a lot of resources about BPF.
Mostly, it is split between
the documentation directory
of its bcc repository, and the whole content of
the bpf-docs repository, both on
GitHub. Note the existence of this excellent
BPF reference guide
containing a detailed description of BPF C and bcc Python helpers. To hack with BPF, there are some essential Linux manual pages. The first
one is the bpf(2) man page. Jesper Dangaard Brouer initiated an attempt to update the eBPF Linux
documentation, including the different kinds of maps.
He has a draft
to which contributions are welcome. Once ready, this document should be
merged into the man pages and into kernel documentation. The Cilium project also has an excellent BPF and XDP Reference
Guide, written by core eBPF
developers, that should prove immensely useful to any eBPF developer. The last one is possibly the best existing summary about the verifier at this
date. Ferris Ellis started
a blog post series about eBPF.
As I write this paragraph, the first article is out, with some historical
background and future expectations for eBPF. Next posts should be more
technical, and look promising. When using BPF for networking purposes in conjunction with tc, the Linux tool
for traffic control, one may wish to gather information about tc’s
generic functioning. Here are a couple of resources about it. It is difficult to find simple tutorials about QoS on Linux. The two
links I have are long and quite dense, but if you can find the time to read them, you will learn nearly everything there is to know about tc (nothing about
BPF, though). There they are:
Traffic Control HOWTO (Martin A. Brown, 2006),
and the
Linux Advanced Routing & Traffic Control HOWTO (“LARTC”) (Bert Hubert & al., 2002). tc manual pages may not be up-to-date on your system, since several of
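Since the end goal of learning about tc here is to attach BPF programs to it, it may help to see what the skeleton of such a program looks like. Below is a hedged sketch of a minimal, do-nothing classifier in the restricted C subset used for eBPF; the TC_ACT_OK value and the reduced stand-in for struct __sk_buff are declared locally only as an assumption to keep the snippet self-contained (a real program would include the kernel UAPI headers and be compiled with clang -O2 -target bpf):

```c
/* Minimal tc classifier sketch: accept every packet.
 * The constant mirrors TC_ACT_OK from <linux/pkt_cls.h>, and the
 * struct below is a reduced stand-in for the real struct __sk_buff
 * from <linux/bpf.h>; both are declared locally only to keep this
 * sketch self-contained. */

#define TC_ACT_OK 0 /* let the packet continue through the stack */

struct __sk_buff {
    unsigned int len;      /* packet length */
    unsigned int protocol; /* layer-3 protocol */
};

__attribute__((section("classifier"), used))
int cls_accept_all(struct __sk_buff *skb)
{
    (void)skb; /* a real classifier would inspect the packet here */
    return TC_ACT_OK;
}
```

Once compiled to an object file, such a classifier would typically be attached to an interface as a tc filter, as documented in the tc-bpf(8) manual page mentioned further down.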
them have been added lately. If you cannot find the documentation for a
particular queuing discipline (qdisc), class or filter, it may be worth
checking the latest
manual pages for tc components. Some additional material can be found within the files of the iproute2 package itself: the package contains some documentation,
including some files that helped me understand better
the functioning of tc’s actions. Not exactly documentation: there was
a workshop about several tc features
(including filtering, BPF, tc offload, …) organized by Jamal Hadi Salim
during the netdev 1.2 conference (October 2016).

P4 is a language used to specify the behavior of a switch. It
can be compiled for a number of hardware or software targets. As you may have
guessed, one of these targets is BPF… The support is only partial: some P4
features cannot be translated towards BPF, and in a similar way there are
things that BPF can do but that would not be possible to express with P4.
Anyway, the documentation related to P4 use with BPF
used to be hidden in bcc repository.
This changed with P4_16 version, the p4c reference compiler including
a backend for eBPF. There is also an interesting presentation from Jamal Hadi Salim, presenting a
number of points from tc from which P4 could maybe get some inspiration:
What P4 Can Learn From Linux Traffic Control Architecture. Brendan Gregg has initiated excellent tutorials intended for people who want
to use bcc tools for tracing and monitoring events in the kernel.
The first tutorial about using bcc itself
comes with many steps to understand how to use the existing tools, while
the one intended for Python developers
focuses on developing new tools, across seventeen “lessons”. Lorenza Fontana has made a tutorial to explain how to
Load XDP programs using the ip (iproute2) command. If you are unfamiliar with kernel compiling, Diego Pino García has a blog entry
on
How to build a kernel with [AF-]XDP support. Sasha Goldshtein also has some
Linux Tracing Workshops Materials
involving the use of several BPF tools for tracing. Another post by Jean-Tiare Le Bigot provides a detailed (and instructive!)
example of
using perf and eBPF to setup a low-level tracer
for ping requests and replies. Few tutorials exist for network-related eBPF use cases. There are some
interesting documents, including an eBPF Offload Starting Guide, on the
Open NFP platform
operated by Netronome. Other than these, the talks from Jesper and Andy,
XDP for the Rest of Us
(and
its second edition),
are probably one of the best ways to get started with XDP. If you really focus on hardware offload for eBPF, Netronome (my employer as I
edit this text) is the only vendor to propose it at the moment. Besides their
Open-NFP platform, the best source of information is their support platform:
https://help.netronome.com. You will find there video tutorials from David
Beckett explaining how to run and offload XDP programs, user guides, and other
materials… including the firmware for the Agilio SmartNICs required to perform
eBPF offload!

It is always nice to have examples, to see how things really work. But BPF
program samples are scattered across several projects, so I listed all the ones
I know of. The examples do not always use the same helpers (for instance, tc
and bcc both have their own set of helpers to make it easier to write BPF programs in C).

The kernel contains examples for most types of programs: filters to bind to
sockets or to tc interfaces, event tracing/monitoring, and even XDP. You can
find these examples under the
linux/samples/bpf/
directory. Nowadays, most examples are added under
linux/tools/testing/selftests/bpf
as unit tests. This includes tests for hardware offload or for libbpf. Some additional tests regarding BPF with tc can be found in the kernel suite of
tests for tc itself, under
linux/tools/testing/selftests/tc-tests. Jesper Dangaard Brouer also maintains a specific set of samples in his
prototype-kernel
repository. They are very similar to those from the kernel, but can be compiled
outside of the kernel infrastructure (Makefiles and headers).

Also do not forget to have a look at the logs related to the (git) commits that introduced a particular feature; they may contain a detailed example of the feature.

The iproute2 package provides several examples as well. They are obviously
oriented towards network programming, since the programs are to be attached to
tc ingress or egress interfaces. The examples dwell under the
iproute2/examples/bpf/
directory. Many examples are provided with bcc: some are networking example programs, under the associated directory. They include socket filters, tc filters, and an XDP program. There are also some examples using Lua as a different BPF back-end (that is, BPF programs are written with Lua instead of a subset of C, making it possible to use the same language for front-end and back-end), in the third directory. Of course, the bcc tools
themselves are interesting example use cases for eBPF programs. Some other BPF programs are emerging here and there. Have a look at the
different projects based on or using eBPF, mentioned above, and search their
code to find how they inject programs into the kernel. Netronome also has
a GitHub repository with some sample XDP demo applications,
some of them for hardware offload only, others for both driver and offloaded
XDP. While bcc is generally the easiest way to inject and run a BPF program in the
kernel, attaching programs to tc interfaces can also be performed by the tc tool itself.

Sometimes, BPF documentation or examples are not enough, and you may have no other solution than to display the code in your favorite text editor (which
should be Vim of course) and to read it. Or you may want to hack into the code
so as to patch or add features to the machine. So here are a few pointers to
the relevant files; finding the functions you want is up to you!

The file
linux/include/linux/bpf.h
and its counterpart
linux/include/uapi/linux/bpf.h
contain definitions related to eBPF, to be used respectively in the
kernel and to interface with userspace programs. On the same pattern, files
linux/include/linux/filter.h
and
linux/include/uapi/linux/filter.h
contain information used to run the BPF programs. The main pieces of code related to BPF are under
linux/kernel/bpf/
directory. The different operations permitted by the system call, such as
program loading or map management, are implemented in file syscall.c. Several functions as well as the helpers related to networking (with
tc, XDP…) and available to the user, are implemented in
linux/net/core/filter.c.
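On the program side, these helpers are reached through a fixed numbering: the headers shipped with the kernel samples declare each helper as a function pointer whose value is the helper ID, and clang’s BPF back-end turns calls through such pointers into BPF call instructions. Here is a hedged sketch of that pattern (the two IDs mirror enum bpf_func_id from the UAPI header; declaring them locally is only an assumption made to keep the snippet self-contained):

```c
/* Sketch of the helper-call convention used by eBPF programs written
 * in restricted C (the pattern of bpf_helpers.h from the kernel
 * samples). Each helper is declared as a function pointer whose value
 * is its helper ID; the BPF back-end of clang compiles calls through
 * these pointers into BPF call instructions carrying that ID. The IDs
 * below mirror enum bpf_func_id from <linux/bpf.h>. */

static void *(*bpf_map_lookup_elem)(void *map, const void *key) =
        (void *) 1; /* BPF_FUNC_map_lookup_elem */

static long (*bpf_map_update_elem)(void *map, const void *key,
                                   const void *value,
                                   unsigned long flags) =
        (void *) 2; /* BPF_FUNC_map_update_elem */
```

The kernel-side implementations that these numbers resolve to include, precisely, the networking helpers defined in linux/net/core/filter.c.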
It also contains the code to migrate cBPF bytecode to eBPF (since all cBPF
programs are now translated to eBPF in the kernel before being run). Functions and helpers related to event tracing are in
linux/kernel/trace/bpf_trace.c
instead. The JIT compilers are under the directory of their respective
architectures, such as file
linux/arch/x86/net/bpf_jit_comp.c
for x86. An exception is made for JIT compilers used for hardware offload; they sit in their drivers, see for instance
linux/drivers/net/ethernet/netronome/nfp/bpf/jit.c
for Netronome NFP cards. You will find the code related to the BPF components of tc in the
linux/net/sched/
directory, and in particular in files act_bpf.c and cls_bpf.c. I have not used seccomp-BPF much, but you should find the code in
linux/kernel/seccomp.c,
and some example use cases can be found in
linux/tools/testing/selftests/seccomp/seccomp_bpf.c. Once loaded into the in-kernel BPF virtual machine, XDP programs are hooked
from userspace into the kernel network path thanks to a Netlink command. On reception, the function dev_change_xdp_fd() is called and sets the XDP hook.

One can find the code for the bcc set of tools
on the bcc GitHub repository.
The Python code, including the BPF class, is initiated in file bcc/src/python/bcc/__init__.py.

The code related to BPF in tc comes with the iproute2 package, of course.
Some of it is under the
iproute2/tc/
directory. The files f_bpf.c and m_bpf.c (and e_bpf.c) are used respectively
to handle BPF filters and actions (and the tc exec command). The kernel also ships the sources of three tools related to BPF. Read the comments at the top of the source files to get an overview of their
usage. Other essential files to work with eBPF are the two userspace libraries
from the kernel tree, that can be used to manage eBPF programs or maps from
external programs. The functions are accessible through dedicated headers. If you are interested in the use of less common languages with BPF, bcc contains
a P4 compiler for BPF targets
as well as
a Lua front-end that
can be used as alternatives to the C subset and (in the case of Lua) to the
Python tools. The BPF backend used by clang / LLVM for compiling C into eBPF was added to the
LLVM sources in
this commit
(and can also be accessed on
the GitHub mirror). As far as I know there are at least two eBPF userspace implementations. The
first one, uBPF, is written in C. It
contains an interpreter, a JIT compiler for x86_64 architecture, an assembler
and a disassembler. The code of uBPF seems to have been reused to produce a
generic implementation,
that claims to support FreeBSD kernel, FreeBSD userspace, Linux kernel,
Linux userspace and MacOSX userspace. It is used for the BPF extension module
for VALE switch. The other userspace implementation is my own work:
rbpf, based
on uBPF, but written in Rust. The interpreter and JIT-compiler work (both under
Linux, only the interpreter for MacOSX and Windows); there may
be more in the future. As stated earlier, do not hesitate to have a look at the commit log that
introduced a particular BPF feature if you want to have more information about
it. You can search the logs in many places, such as on
git.kernel.org,
on GitHub, or on your local
repository if you have cloned it. If you are not familiar with git, try commands such as git log or git blame on the files you are interested in.

The enthusiasm about eBPF is quite recent, and so far I have not found a lot of
resources intending to help with troubleshooting. So here are the few I have,
augmented with my own recollection of pitfalls encountered while working with
BPF. Make sure you have a recent enough version of the Linux kernel (see also
this document). If you compiled the kernel yourself: make sure you installed correctly all
components, including kernel image, headers and libc. When using the (seems fixed as of today). For other problems with If you downloaded the examples from the iproute2 package in a version that
does not exactly match your kernel, some errors can be triggered by the
headers included in the files. The example snippets indeed assume that the
same version of iproute2 package and kernel headers are installed on the
system. If this is not the case, download the correct version of iproute2, or
edit the path of included files in the examples to point to the headers
included in iproute2 (some problems may or may not occur at runtime,
depending on the features in use). To load a program with tc, make sure you use a tc binary coming from an
iproute2 version equivalent to the kernel in use. To load a program with bcc, make sure you have bcc installed on the system
(just downloading the sources to run the Python script is not enough). With tc, if the BPF program does not return the expected values, check that
you called it in the correct fashion: filter, or action, or filter with
“direct-action” mode. With tc still, note that actions cannot be attached directly to qdiscs or
interfaces without the use of a filter. The errors thrown by the in-kernel verifier may be hard to interpret. The
kernel documentation may help, so may the reference
guide or, as a last resort, the source code (see above) (good
luck!). For this kind of error, it is also important to keep in mind that the
verifier does not run the program. If you get an error about an invalid
memory access or about uninitialized data, it does not mean that these
problems actually occurred (or sometimes, that they can possibly occur at
all). It means that your program is written in such a way that the verifier
estimates that such errors could happen, and therefore it rejects the
program.

Note that bcc also has verbose options. LLVM v4.0+ also embeds a disassembler for eBPF programs, so you can inspect what clang produced.

Working with maps? You want to have a look at
bpf-map, a very useful tool in Go
created for the Cilium project, that can be used to dump the contents of
kernel eBPF maps. There also exists
a clone in Rust.

And come back to this blog from time to time to see if there are new articles about BPF!

Special thanks to Daniel Borkmann for the numerous
additional documents
he pointed me to so that I could complete this collection.

What is BPF?
Dive into the bytecode
Resources
About BPF
An excellent and accessible introduction providing context, history, and
details about the functioning of eBPF.
Focusing on the eBPF features arriving in Red Hat.
A generic introduction to BPF, XDP, IO Visor, bcc and other components.
A well-written and accessible introduction providing an overview of eBPF
subsystem components.
One of the best set of slides available to understand quickly all the basics about eBPF and XDP (mostly for network processing).
A very nice introduction, mostly about the tracing aspects.
Mostly about the tracing use cases.
With a first part on the use of flame graphs.
Also introduces IO Visor project.
Presentation by the author of eBPF.
Daniel provides details on eBPF, its use for tunneling and encapsulation,
direct packet access, and other features.
After introducing eBPF, this presentation provides insights on many
internal BPF mechanisms (map management, tail calls, verifier). A
must-read! For the most ambitious,
the full paper is available here.
Kprobes, uprobes, ftrace
Systemtap, Kernelshark, trace-cmd, LTTng, perf-tool, ftrace, hist-trigger,
perf, function tracer, tracepoint, kprobe/uprobe…
eBPF/XDP hardware offload to SmartNICs
(Jakub Kicinski and Nic Viljoen, netdev 1.2, Tokyo, October 2016)
Comprehensive XDP offload—Handling the edge cases
(Jakub Kicinski and Nic Viljoen, netdev 2.2, Seoul, November 2017)
The Challenges of XDP Hardware Offload
(Quentin Monnet, FOSDEM’18, Brussels, February 2018)

About XDP
Probably one of the most accessible introductions to XDP, providing sample
code to show how one can easily process packets.
The first presentation about XDP.
Contains some (somewhat marketing?) benchmark results! With a single core:
“Linux Kernel’s fight against DPDK”. Future plans (as of this
writing) for XDP and comparison with DPDK.
Additional hints about XDP internals and expected evolution.
Contains details and use cases about XDP, with benchmark results, and
code snippets for benchmarking as well as for basic DDoS
protection with eBPF/XDP (based on an IP blacklisting scheme).
Provides a lot of details about current memory issues faced by XDP
developers. Do not start with this one, but if you already know XDP and
want to see how it really works on the page allocation side, this is a very
helpful resource.
How to get started with eBPF and XDP for normal humans. This presentation
was also summarized by Julia Evans on
her blog.
Revised version of the talk, with new contents.
Update on XDP, and in particular on the redirect actions (redirecting
packets to other interfaces or other CPUs, with or without the use of eBPF
maps for better performance).
Presents the use of P4, a description language for packet processing,
with BPF to create high-performance programmable switches.
Another presentation on P4, with some elements related to eBPF hardware
offload on Netronome’s NFP (Network Flow Processor) architecture.
CETH stands for Common Ethernet Driver Framework for faster network I/O,
a technology initiated by Mellanox.
InKeV is an eBPF-based datapath architecture for virtual networks,
targeting data center networks. It was initiated by PLUMgrid, and claims to
achieve better performance than OvS-based OpenStack solutions.
A “library to create, load and use eBPF programs from Go”
Documentation
About BPF
bpf(2) man page about the bpf() system call, which is used to manage BPF programs and
maps from userspace. It also contains a description of BPF advanced features
(program types, maps and so on). The second one is mostly addressed to people
wanting to attach BPF programs to tc interfaces: it is the tc-bpf(8) man page, which is a reference for using BPF with tc, and includes some example commands and
samples of code. The eBPF helper functions, those white-listed functions that
can be called from within an eBPF program, have been documented in the kernel
source file that can be automatically converted into a bpf-helpers(7)
manual page (see
the relevant Makefile).

About tc
Edit: While still available from the Git history, these files have been
deleted from iproute2 in October 2017.
If you use tc a lot, here is some good news: I wrote a bash completion function
for this tool, and it is now shipped with package iproute2 coming with
kernel version 4.6 and higher!
About XDP
About flow dissectors
About P4 and BPF
Tutorials
Examples
From the kernel
From package iproute2
Programs under the tracing
directory include a lot of example tracing programs. The
tutorials mentioned earlier are based on these. These programs cover a wide
range of event monitoring functions, and some of them are
production-oriented. Note that on certain Linux distributions (at least for
Debian, Ubuntu, Fedora, Arch Linux), these programs have been
packaged and can be
“easily” installed by typing e.g. # apt install bcc-tools, but as of this
writing (and except for Arch Linux), this first requires setting up IO Visor’s
own package repository.
Other examples
Manual pages
Manual pages exist for the tc
tool itself. So if you intend to use BPF with tc, you can find some example
invocations in the tc-bpf(8) manual page.
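To give an idea of what such invocations look like, here is a sketch of the usual sequence, following the examples in tc-bpf(8). The interface name (eth0), object file (bpf.o) and ELF section name ("classifier") are placeholders, and the real commands require root privileges, so this sketch only prints the sequence instead of executing it:

```shell
# Typical tc/eBPF workflow, after tc-bpf(8): attach the clsact qdisc, then
# load an eBPF classifier from an ELF object in direct-action ("da") mode.
# All names are placeholders; root is required to actually run these, so the
# commands are only printed here.
cmds='tc qdisc add dev eth0 clsact
tc filter add dev eth0 ingress bpf da obj bpf.o sec classifier
tc filter show dev eth0 ingress'
printf '%s\n' "$cmds"
```

The clsact qdisc (mentioned below in the section about iproute2 code) provides the ingress and egress attach points used here.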
The code
BPF code in the kernel
The system call is implemented in syscall.c, while
core.c
contains the interpreter. The other files have self-explanatory
names: verifier.c
contains the verifier (no kidding), arraymap.c
the code used to interact with maps of type array, and so on.
The tc-related pieces live in act_bpf.c
(action) and cls_bpf.c
(filter).
XDP hooks code
Function dev_change_xdp_fd()
in file
linux/net/core/dev.c
is called and sets an XDP hook. Such hooks are located in the drivers of
supported NICs. For example, the nfp driver used for Netronome hardware has
hooks implemented in files under the
drivers/net/ethernet/netronome/nfp/
directory. File nfp_net_common.c receives Netlink commands and calls
nfp_net_xdp_setup(), which in turn calls for instance
nfp_net_xdp_setup_drv()
to install the program.
BPF logic in bcc
The Python BPF
class is initiated in file
bcc/src/python/bcc/__init__.py.
But most of the interesting stuff, in my opinion, such as loading the BPF
program into the kernel, happens
in the libbcc C library.
Code to manage BPF with tc
the exec
command, whatever this may
be). File q_clsact.c defines the clsact
qdisc especially created for BPF.
But most of the BPF userspace logic is implemented in the
iproute2/lib/bpf.c
library, so this is probably where you should head if you want to mess
with BPF and tc (it was moved from file iproute2/tc/tc_bpf.c, where you may
find the same code in older versions of the package).
BPF utilities
The kernel tree ships the sources of three tools (bpf_asm.c, bpf_dbg.c,
bpf_jit_disasm.c) related to BPF, under the
linux/tools/net/ (until Linux 4.14)
or
linux/tools/bpf/
directory depending on your version:
bpf_asm is a minimal cBPF assembler.
bpf_dbg is a small debugger for cBPF programs.
bpf_jit_disasm is generic for both BPF flavors and could be highly useful
for JIT debugging.
bpftool is a generic utility written by Jakub Kicinski that can be
used to interact with eBPF programs and maps from userspace, for example to
show, dump, load, pin programs, or to show, create, pin, update, delete maps.
It can also attach and detach programs to cgroups, and it has JSON support. It
keeps getting more features, and is expected to become the go-to tool
for eBPF introspection and simple management.
For managing BPF objects from your own programs, the kernel tree also provides
the headers bpf.h
and
libbpf.h
(higher level) under directory
linux/tools/lib/bpf/.
The tool bpftool
heavily relies on those libraries, for example.
Other interesting chunks
LLVM backend
Running in userspace
Commit logs
Use git blame <file>
to see what commit introduced a particular line of
code, then git show <commit>
to get the details (or search by keyword in git
log
results, but this may be tedious). See also the list of eBPF features per
kernel version on the bcc repository, which links to the relevant
commits.
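To see that workflow in action, here is a small self-contained sketch; the repository, file name and commit messages are invented for the example:

```shell
# Toy repository demonstrating "git blame" then "git show", plus a keyword
# search with "git log -S". All names and messages below are made up.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

# First version of the file.
printf 'int action = 2; /* XDP_PASS */\n' > prog.c
git add prog.c
git commit -q -m 'prog: pass packets by default'

# Second version, changing the line we will investigate.
printf 'int action = 1; /* XDP_DROP */\n' > prog.c
git commit -q -a -m 'prog: switch default action to XDP_DROP'

# Which commit introduced the current line 1 of prog.c?
commit=$(git blame -l -L 1,1 prog.c | awk '{print $1}')

# Inspect that commit in detail (the "git show <commit>" step):
git show --stat --oneline "$commit"

# Or search the history for the commit that added a given string:
git log --oneline -S XDP_DROP
```

Pointed at a file under linux/kernel/bpf/ in an actual kernel tree, the same two steps reveal when a given line, for example a verifier check, was introduced.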
Troubleshooting
Errors at compilation time
When using the __bcc
shell function provided by the tc-bpf
man page (to compile
C code into BPF), I once had to add include paths for the clang
call:

__bcc() {
    clang -O2 \
        -I "/usr/src/linux-headers-$(uname -r)/include/" \
        -I "/usr/src/linux-headers-$(uname -r)/arch/x86/include/" \
        -emit-llvm -c $1 -o - | \
    llc -march=bpf -filetype=obj -o "`basename $1 .c`.o"
}

If you run into other problems with bcc, do not forget to have a look at
the FAQ
of the tool set.
Errors at load and run time
The tc
tool has a verbose mode, and it works well with BPF: try
appending verbose
at the end of your command line.
In bcc, the BPF
class has a debug
argument that can
take any combination of the three flags DEBUG_LLVM_IR, DEBUG_BPF
and
DEBUG_PREPROCESSOR
(see details in
the source
file).
It even embeds some facilities to print output
messages
for debugging the code.
Compiling with the -g
flag enables you to later dump your program in the rather
human-friendly format used by the kernel verifier. To proceed with the dump,
use:

$ llvm-objdump -S -no-show-raw-insn bpf_program.o
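Put together, the full cycle from C source to verifier-style dump looks like this. The file names are placeholders and clang/llvm with the BPF target are required for the real thing, so this sketch only prints the two steps:

```shell
# From C source to a human-friendly dump: compile with clang's BPF target and
# the -g flag, then disassemble with llvm-objdump. File names are placeholders;
# the steps are printed rather than executed, since they need clang/llvm.
steps='clang -O2 -g -target bpf -c bpf_program.c -o bpf_program.o
llvm-objdump -S -no-show-raw-insn bpf_program.o'
printf '%s\n' "$steps"
```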
There is a bpf
tag on StackOverflow,
but as of this writing it has hardly ever been used (and there is nearly
nothing related to the new eBPF version). If you are a reader from the Future,
though, you may want to check whether there has been more activity on this
side.
And still more!