eBPF Essentials: A Glimpse Into Kernel Magic

Allan John
10 min read · Mar 16, 2023


Image courtesy of unsplash.com

Introduction

In this blog, I’ll be introducing eBPF, a versatile and efficient technology that has made significant strides within the Linux kernel ecosystem. As we discuss eBPF’s core concepts and a bit of its internals, please note that we’ll touch upon some Linux kernel specifics. My aim is to provide a learning experience that enables you to grasp eBPF’s fundamentals and appreciate its powerful capabilities. Let’s get started!

This will be a bit long to read, as the content wasn't quite big enough to split into two posts. I hope it's worth it!

What is BPF?

Berkeley Packet Filter (BPF) is a technology that originated in 1992, designed to efficiently filter and capture packets within UNIX-based operating systems. It functions as a virtual machine, complete with a distinct instruction set, storage objects, and supportive helper functions. The Linux kernel features a BPF runtime responsible for executing these instructions, using both an interpreter and a Just-In-Time (JIT) compiler that transforms BPF instructions into native executable code. To ensure kernel integrity and prevent crashes, a dedicated verifier rigorously checks BPF instructions for safety before they are executed, regardless of whether they are interpreted or JIT-compiled.

BPF offers an efficient and secure method for interacting with the Linux Kernel, enabling system customization and control through programmable kernel enhancements.

In layman’s terms, BPF is much like a mini-computer nestled within the larger computer system, seamlessly executing small programs without impacting the main computer’s operations. This concept can be compared to JavaScript, which runs miniature programs on a webpage, such as tracking mouse clicks or scrolling positions. Despite running in the background, JavaScript ensures that the webpage and its users remain unaffected.
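To make the "mini-computer" idea concrete, here is a minimal sketch of the original classic BPF mechanism, assuming Linux and root privileges: we hand-assemble a four-instruction filter and attach it to a raw socket with the SO_ATTACH_FILTER socket option, so the kernel's BPF machine runs the filter on every packet and our process only ever sees IPv4 frames.

```python
# A minimal sketch of classic BPF (Linux-only, needs root): hand-assemble a
# four-instruction filter and attach it to a raw socket, so the in-kernel
# BPF machine discards everything except IPv4 frames.
import ctypes
import socket
import struct

def bpf_insn(code, jt, jf, k):
    # struct sock_filter { __u16 code; __u8 jt; __u8 jf; __u32 k; }
    return struct.pack("HBBI", code, jt, jf, k)

insns = b"".join([
    bpf_insn(0x28, 0, 0, 12),      # ldh [12]      ; load the EtherType field
    bpf_insn(0x15, 0, 1, 0x0800),  # jeq #0x0800   ; IPv4? continue : skip one
    bpf_insn(0x06, 0, 0, 0xFFFF),  # ret #0xFFFF   ; accept the packet
    bpf_insn(0x06, 0, 0, 0x0000),  # ret #0        ; drop the packet
])
buf = ctypes.create_string_buffer(insns, len(insns))

# struct sock_fprog { unsigned short len; struct sock_filter *filter; }
fprog = struct.pack("HL", 4, ctypes.addressof(buf))

ETH_P_ALL = 0x0003
SO_ATTACH_FILTER = 26  # Linux socket option number
sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)

frame = sock.recv(65535)  # only IPv4 frames make it past the kernel filter
print(f"received an IPv4 frame of {len(frame)} bytes")
```

Everything before `recv()` returns happens inside the kernel: non-IPv4 traffic is filtered out without ever crossing into our program, which is exactly the efficiency trick eBPF generalizes.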

eBPF: Addressing BPF’s Limitations

While BPF proved to be efficient and useful for packet filtering and capturing, it had certain limitations that restricted its scope and adaptability in modern systems:

  1. Limited instruction set: BPF’s original instruction set was relatively simple and tailored for packet filtering, making it insufficient for more complex tasks and use cases.
  2. Lack of extensibility: The original BPF framework wasn’t designed for easy extensibility, making it challenging to add new features or adapt it for different applications beyond packet filtering.
  3. Inflexible data structures: BPF’s initial data structures were quite rigid, limiting the flexibility needed to handle complex data manipulation and analysis in modern systems.

These limitations prompted the development of eBPF (Extended Berkeley Packet Filter), which addresses these issues and greatly expands BPF’s capabilities:

  1. Expanded instruction set: eBPF features an enhanced instruction set that allows for more complex operations and supports a broader range of use cases.
  2. Improved extensibility: The eBPF framework is designed to be more easily extended, enabling the addition of new features and applications beyond packet filtering, such as performance monitoring, security, and observability.
  3. Flexible data structures: eBPF introduces new, versatile data structures like maps and arrays, providing the flexibility needed for sophisticated data manipulation and analysis.

In short, eBPF was developed to overcome the limitations of the original BPF, offering a more powerful, flexible, and extensible solution that caters to the demands of modern systems.

Ok, Why not Kernel Modules?

Kernel modules have traditionally been used to extend the functionality of the Linux kernel, so it's a valid question: why don't we use them instead of eBPF?

Well, that's where eBPF stands out.

  • Changing a kernel module means unloading and reloading it in the running kernel, and a buggy module can take the whole machine down. An eBPF program can be added, updated, or removed on a live system without rebooting.
  • eBPF programs are checked by a verifier before they are loaded into the kernel. Kernel modules have no such safety net, so a bug can cause a kernel panic or, worse, introduce security vulnerabilities.
  • eBPF provides rich, ready-made data structures (maps such as hash tables, arrays, and histograms) that a kernel module would have to implement and maintain itself.
  • eBPF programming is easier than kernel engineering thanks to its safety guarantees, flexibility, gentler learning curve, and mature tooling and libraries.

Short history

eBPF was conceived by Alexei Starovoitov, who was exploring innovative approaches to develop software-defined networking (SDN) solutions. At its proposal stage, Daniel Borkmann, a kernel engineer at Red Hat, collaborated with Alexei to refine eBPF for integration into the kernel as a versatile virtual machine, ultimately replacing the existing BPF. Since its successful inclusion, eBPF has garnered contributions from numerous developers, further expanding its capabilities and applications.

eBPF Applications in the Real World

eBPF has found numerous real-world applications, enabling developers and operators to gain deeper insights into system performance, improve security, and customize system behavior. Some of these applications include:

  1. Networking: eBPF is widely used for implementing software-defined networking (SDN) solutions, load balancing, and network function virtualization (NFV). Projects like Cilium leverage eBPF to create high-performance networking and security solutions for containerized environments, such as Kubernetes.
  2. Performance monitoring: eBPF enables the development of advanced performance monitoring and observability tools like BCC (BPF Compiler Collection) and BPFtrace. These tools provide fine-grained visibility into system and application behavior, helping developers identify bottlenecks, optimize performance, and troubleshoot issues.
  3. Security: eBPF can be employed for real-time security monitoring and enforcement. It allows for the implementation of custom security policies, intrusion detection systems, and sandboxing solutions that enhance system protection without compromising performance.
  4. Tracing and profiling: eBPF can be used to create powerful tracing and profiling tools for various layers of the system, from the kernel to user-space applications. This enables developers to understand how their code interacts with the kernel, identify performance issues, and optimize resource usage.

These real-world scenarios demonstrate the versatility and potential of eBPF in improving various aspects of modern computing systems, from networking and performance to security and observability.

BPF Runtime

BPF Runtime Internals

This is an illustration of how eBPF programs are executed: the architecture of the BPF runtime. Its main components are described below, and a short example after the list shows several of them working together.

  1. BPF custom program: The custom code written by developers using eBPF’s instruction set to perform specific tasks within the kernel.
  2. BPF Verifier: A kernel component that checks the eBPF program for safety and correctness before allowing it to execute.
  3. BPF JIT compiler: A Just-In-Time compiler that translates the eBPF bytecode into native machine code for improved performance, used when JIT compilation is enabled (the net.core.bpf_jit_enable sysctl).
  4. Interpreter: An alternative execution mode for eBPF programs, which interprets the bytecode directly without JIT compilation. This is used when the JIT is disabled.
  5. BPF Context: A data structure passed to an eBPF program when it is executed. The context provides the eBPF program with the necessary information about the event or data it is processing. The contents of the context depend on the type of eBPF program being executed and the specific use case it is designed for. For example, if an eBPF program is attached to a network socket for packet processing, the context would contain information about the packet, such as the source and destination addresses, the protocol used, and the payload.
  6. Maps: Key-value data structures that allow eBPF programs to store and share data between the kernel and user space.
  7. Helper functions: Kernel-provided functions that eBPF programs can call to perform specific operations, such as accessing network packets or interacting with system resources.
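To show several of these pieces working together, here is a hedged bcc sketch (it assumes bcc is installed and runs as root): loading the program exercises the verifier and, if enabled, the JIT; ctx is the context; counts is a map; and bpf_get_current_pid_tgid() is a helper function.

```python
# Count clone() calls per process using a BPF map updated in-kernel.
from time import sleep
from bcc import BPF

prog = r"""
BPF_HASH(counts, u32, u64);       // map: pid -> number of clone() calls

int count_clone(void *ctx) {      // ctx: the BPF context for this probe type
    u32 pid = bpf_get_current_pid_tgid() >> 32;   // kernel helper function
    counts.increment(pid);        // update the map without leaving the kernel
    return 0;
}
"""

b = BPF(text=prog)  # the verifier checks the program here; the JIT compiles it if enabled
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="count_clone")

sleep(5)  # let some events accumulate
for pid, count in b["counts"].items():
    print(f"pid {pid.value}: {count.value} clone() calls")
```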

How is eBPF efficient?

Let's take the example of how an observability tool generates a CLI histogram.

Generating a histogram before BPF vs with BPF

Before BPF, to produce a histogram, the steps would be:

  1. In the kernel:
    - Enable instrumentation for CPU events.
    - For each event, write a record to the perf buffer. If tracepoints are used (preferred), the record includes various metadata fields about the CPU.
  2. In user-space:
    - Copy the event buffer from the kernel to user space periodically.
    - Iterate over each event, extracting the ‘bytes’ field from the metadata. Other fields are ignored.
    - Generate a histogram summary based on the ‘bytes’ field.

Writing events to the perf buffer and copying them to user space introduces significant overhead on busy systems, such as transferring ten thousand CPU trace records to user space for parsing and summarization every second.

With BPF, the steps for the byte size program are simplified:

  1. In the kernel:
    - Enable instrumentation for CPU events and attach the custom BPF program defined by the tool.
    - For each event, execute the BPF program. It retrieves only the 'bytes' field and stores it in a custom BPF map histogram.
  2. In user-space:
    - Read the BPF map histogram once and display the output.

This approach avoids the cost of copying events to user space and reprocessing them, as well as the transfer of unused metadata fields. The only data copied to user space is the histogram shown in the output below, essentially the "count" column, which holds the per-bucket sums. A condensed sketch of this in-kernel pattern follows.
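The sketch below is condensed from bcc's bitehist example, which applies exactly this pattern to block I/O sizes (rather than the CPU events described above): the histogram is aggregated entirely in-kernel in a BPF map, and user space reads it once at the end. The kprobe symbol blk_account_io_done varies across kernel versions, so treat this as illustrative.

```python
# In-kernel log2 histogram of block I/O sizes, condensed from bcc's bitehist.
from time import sleep
from bcc import BPF

prog = r"""
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

BPF_HISTOGRAM(dist);

// Runs in the kernel for every completed block I/O; only the size is kept.
int kprobe__blk_account_io_done(struct pt_regs *ctx, struct request *req)
{
    dist.increment(bpf_log2l(req->__data_len / 1024));
    return 0;
}
"""

b = BPF(text=prog)
print("Tracing block I/O sizes... hit Ctrl-C to print the histogram.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("kbytes")  # the only data copied to user space
```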

This is sample output from one of the eBPF tools:

Sample output for an eBPF tool

Traditional tools vs eBPF tools

Traditional tools and eBPF tools both serve to monitor, troubleshoot, and analyze system performance. However, there are significant differences between the two approaches:

1. Scope of information:

  • Traditional tools often rely on pre-existing kernel counters, logs, and interfaces (such as procfs, sysfs, or perf) to gather information. They may provide limited visibility into the system’s internals and may not be customizable to suit specific needs.
  • eBPF tools allow for more fine-grained and customizable data collection, enabling developers to create custom programs that access kernel data structures, monitor system events, and even modify kernel behavior.

2. Performance impact:

  • Traditional tools can sometimes impose a significant performance overhead when collecting data, particularly if they involve frequent context switches between kernel and user space or require substantial data processing.
  • eBPF tools generally have lower performance overhead, as they run directly in the kernel and can process data more efficiently. eBPF programs are also verified for safety and optimized by the JIT compiler for better performance.

3. Flexibility and extensibility:

  • Traditional tools are often limited by the capabilities provided by the kernel interfaces they rely on. Customizing or extending these tools can be challenging and may require kernel modifications.
  • eBPF tools are highly flexible and extensible, allowing developers to create custom eBPF programs tailored to their specific requirements. The eBPF ecosystem also offers various libraries and frameworks that simplify the development of new eBPF tools.

4. Applicability:

  • Traditional tools typically focus on specific tasks or areas of the system (e.g., networking, storage, or CPU performance) and may not be suitable for cross-domain analysis or correlation.
  • eBPF tools can be applied to a wide range of use cases, including networking, security, observability, and performance monitoring, making it easier to perform cross-domain analysis and gain deeper insights into system behavior.

While traditional tools can still be useful for specific tasks or in certain situations, eBPF tools offer greater visibility, flexibility, and performance, making them a preferred choice for many modern monitoring and analysis needs.

Real life use-case: Tracing

As system admins, we all know what an amazing tool strace is. It's very good for checking all the system calls a process makes, and I have personally used it a lot, even on production servers. One thing I came to know after learning about BPF is that strace creates a lot of overhead to generate its output (it uses ptrace, stopping the traced process at every system call), so running it on a production server can be catastrophic! It might be okay if there is redundancy.

Let's take a look at strace and trace, where trace is an eBPF tool from the BCC collection.

This is an output of strace:

Output from strace

And this is an output from trace:

Output from trace

As you can see, the way trace presents the data is much easier to read than strace's output, although strace is still a good tool.
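For a feel of the difference, here is a minimal bcc sketch in the spirit of trace, assuming a kernel recent enough to provide bpf_probe_read_user_str (older setups use bpf_probe_read_str): it prints every openat() system call with its file name, without ever stopping the traced processes the way ptrace does.

```python
# Print every openat() on the system, eBPF-style: no process is ever stopped.
from bcc import BPF

prog = r"""
TRACEPOINT_PROBE(syscalls, sys_enter_openat) {
    char fname[128];
    // Copy the file name from user memory into a kernel buffer.
    bpf_probe_read_user_str(fname, sizeof(fname), args->filename);
    bpf_trace_printk("openat: %s\n", fname);
    return 0;
}
"""

b = BPF(text=prog)
print("Tracing openat()... Ctrl-C to stop.")
b.trace_print()  # stream lines from the kernel trace pipe
```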

eBPF in Observability

Some of the common tools used in observability:

Additional eBPF tools that can be used for drill-down analysis:

These tools can be installed on Linux systems with kernel version 4.7 or above, either by cloning the Git repositories or by installing from package managers. For more information on the tools, check the documentation pages for bcc tools and bpftrace.

Some popular projects that are worth mentioning:

  • Cilium for networking
  • BCC for tracing
  • XDP for high-performance packet processing (see the sketch below)
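As a small taste of XDP, here is a hedged bcc sketch that counts packets on one interface at the earliest possible hook point and passes every packet up the stack unmodified. The interface name eth0 is an assumption; adjust it to your system.

```python
# Count packets at the XDP hook; observe only, never drop.
import ctypes
import time
from bcc import BPF

DEVICE = "eth0"  # assumption: change to your interface name

prog = r"""
#include <uapi/linux/bpf.h>

BPF_ARRAY(pkt_count, u64, 1);

int count_packets(struct xdp_md *ctx) {
    int key = 0;
    u64 *value = pkt_count.lookup(&key);
    if (value)
        __sync_fetch_and_add(value, 1);
    return XDP_PASS;  // let every packet continue up the network stack
}
"""

b = BPF(text=prog)
b.attach_xdp(DEVICE, b.load_func("count_packets", BPF.XDP), 0)
try:
    for _ in range(5):
        time.sleep(1)
        total = b["pkt_count"][ctypes.c_int(0)].value
        print(f"{total} packets seen on {DEVICE}")
finally:
    b.remove_xdp(DEVICE, 0)  # detach cleanly, no reboot needed
```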

Building your own eBPF tool

You can build your own eBPF tools if you want. Currently, you can leverage BCC (BPF Compiler Collection) or bpftrace, the two popular front ends used to create custom tools. You can refer to the pages linked above to understand them better and build more complex or sophisticated tools.
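To give a flavor of how small a first tool can be, here is the canonical "hello world" pattern, closely following bcc's own example: it prints a line each time any process on the system calls clone().

```python
# bcc "hello world": fire on every clone() syscall and print a message.
from bcc import BPF

prog = r"""
int hello(void *ctx) {
    bpf_trace_printk("Hello, eBPF!\n");
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
print("Run a command in another shell and watch (Ctrl-C to stop):")
b.trace_print()
```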

Limitations and Challenges

As with every new technology on the market, eBPF has some limitations. Some of them are:

  • eBPF support was added to the Linux kernel incrementally between versions 4.1 and 4.7, so 4.7 is roughly the minimum needed for a usable set of eBPF tools. Even then, only a limited number of tools are available; the latest kernel versions support far more.
  • The learning curve can be a bit steep, as developing a new tool requires some knowledge of front ends like bcc and bpftrace. Compared to kernel engineering, though, the learning curve is small.

Future of eBPF

The future of eBPF is promising, as its adoption is expected to grow across industries and platforms, including cloud providers and container orchestration systems like Kubernetes. The ongoing development of the eBPF ecosystem, with improved tooling, libraries, and enhanced capabilities, will simplify eBPF programming and enable more sophisticated applications. Additionally, cross-platform support, such as Microsoft’s eBPF for Windows project, will broaden eBPF’s applicability. As eBPF gains momentum, we can anticipate deeper integration with existing monitoring, security, and management tools, allowing users to benefit from eBPF without abandoning or relearning their current toolsets. Overall, eBPF’s future is poised for growth, solidifying its position as a powerful and versatile technology in the Linux kernel and beyond.

Conclusion

In summary, eBPF is a powerful tool that has made a big difference in how we work with the Linux kernel. It has improved upon traditional BPF and opened up new possibilities in areas like networking, monitoring, security, and more. As eBPF continues to grow and more people start using it, its impact will only become greater. eBPF is an important technology to learn about because it helps us create better and more efficient solutions for Linux systems.

Hope you enjoyed reading. Thank you!

