Integrating Cilium with Gateway API, IPv6, and BGP for Advanced Networking Solutions

Allan John
12 min read · Jun 27, 2024


Introduction

This document is similar to a previous one I already published, with the same configuration but IPv4. So if you already read that article, you will notice that a lot of the text is copied from there ;)

The setup architecture is similar to the previous document: a k3d cluster with two agents.

Setup

The setup consists of:

  • a K3d cluster without a CNI and with kube-proxy disabled.
  • Cilium installed as the CNI, which also takes over the routing normally handled by kube-proxy. The BGP control plane is enabled, and a pool of IPs is configured for LoadBalancers and advertised over BGP.
  • Gateway API deployed instead of an Ingress controller, tested with a single shared Gateway.
  • FRR deployed on the host machine to connect to these LoadBalancer IPs.

Our setup looks like this:

As in the previous document, Cilium is configured to act as a BGP peer and to advertise any new LoadBalancer IPs, while FRR on the host is configured to peer with the k3d node IPs, using the Docker bridge gateway (172.50.0.1) as its router ID. Once the session between FRR and Cilium BGP is established, FRR publishes the learned routes into the system routing table, so the LoadBalancer IPs become reachable from the host. We then use HAProxy to make the LoadBalancer IPs reachable from the internet via the public IP of the server, because these IPs are not publicly routable.

Due to limitations of a pure IPv6 setup, either in Cilium or in the underlying host configuration (I tested with IPv6), I could not get IPv6-only routing working between Docker and FRR. This is why the setup keeps an IPv4 stack around and falls back to IPv4 where needed, for example for the BGP router ID.

It took a lot of research on Reddit and other tech forums, reading documentation, and plenty of trial and error to end up with a configuration that works. You can use this configuration as a baseline and adapt it to your own use case.

Prerequisites

Enable IPv6 on Docker

Modify the /etc/docker/daemon.json file and add the following information:
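A minimal sketch of what this can look like; the IPv6 range in fixed-cidr-v6 is just an example (pick your own), and the ip6tables and experimental flags may not be needed on newer Docker releases:

{
  "ipv6": true,
  "fixed-cidr-v6": "2001:3200::/64",
  "experimental": true,
  "ip6tables": true
}

Restart Docker afterwards (sudo systemctl restart docker) so the change takes effect.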

Since IPv6 addresses are meant to be globally routable, using them as private addresses comes with some limitations, and Docker is also not configured to work properly with IPv6 right out of the box. So, for testing purposes, I configured an arbitrary range. If you are just testing, you can use the same range or a different one of your choice.

Enable IPv6 kernel modules

Since Cilium uses iptables/ip6tables to program rules in the kernel, we need to load some additional kernel modules:

sudo modprobe -v ip6table_filter
sudo modprobe -v ip6_tables
sudo modprobe -v ip6table_mangle
sudo modprobe -v ip6table_raw
sudo modprobe -v iptable_nat
sudo modprobe -v ip6table_nat
sudo modprobe -v iptable_filter
sudo modprobe -v xt_socket

Add the modules to the modules-load configuration file so they are loaded again after a reboot:

echo "ip6table_filter
ip6_tables
ip6table_mangle
ip6table_raw
iptable_nat
ip6table_nat
iptable_filter
xt_socket" | sudo tee /etc/modules-load.d/modules.conf

Check if the modules are loaded:

lsmod | grep xt_socket

xt_socket 16384 0
nf_socket_ipv4 16384 1 xt_socket
nf_socket_ipv6 20480 1 xt_socket
nf_defrag_ipv6 24576 2 nf_conntrack,xt_socket
nf_defrag_ipv4 16384 2 nf_conntrack,xt_socket
x_tables 57344 9 ip6table_filter,ip6table_raw,iptable_filter,ip6table_nat,xt_socket,ip6_tables,ip_tables,iptable_nat,ip6table_mangle

Ensure the required sysctl parameters are set:

echo "net.core.devconf_inherit_init_net=1
net.netfilter.nf_conntrack_max=196608
net.ipv4.conf.all.forwarding = 1
net.ipv6.conf.all.forwarding = 1" | sudo tee /etc/sysctl.d/01-sysctl.conf > /dev/null

sudo sysctl -p

Tools required before starting

  • helm
  • k3d
  • docker
  • cilium-cli (version >1.14.1)
  • frr (kube-router or BIRD should also work, but I haven't tried them yet)
  • haproxy

Let's deploy:

All configs and scripts can be found in this GitHub repo.

Create a Docker network to start with, making sure IPv6 is enabled. We also keep IPv4 addressing on it, so the nodes get IPv4 addresses as well.

docker network create \
--driver bridge \
--subnet "172.50.0.0/16" \
--gateway "172.50.0.1" \
--ip-range "172.50.0.0/16" \
--ipv6 \
--subnet "2001:3200:3200::/64" \
"cilium"

Since K3d runs on Docker, and Docker has issues with IPv6, we create the network with an IPv4 gateway, which means the containers run a dual IPv4/IPv6 stack. We will use IPv6 as much as possible and fall back to IPv4 only where there is a limitation.

To create a k3d cluster ready for Cilium, we first need to run a few commands inside each node to mount the BPF and cgroup filesystems. Rather than running them after the k3d containers are up, it is cleaner to put them into an entrypoint script and mount it into the nodes; k3d executes any script mounted as /bin/k3d-entrypoint-*.sh before starting k3s.

k3d-entrypoint-cilium.sh
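A sketch of the script, based on the usual BPF and cgroup v2 mounts Cilium needs inside the nodes:

#!/bin/sh
# Mount the BPF filesystem and a dedicated cgroup v2 hierarchy inside the node
# before k3s starts; Cilium expects both to be present and shared.
set -e
mount bpffs -t bpf /sys/fs/bpf
mount --make-shared /sys/fs/bpf
mkdir -p /run/cilium/cgroupv2
mount -t cgroup2 none /run/cilium/cgroupv2
mount --make-shared /run/cilium/cgroupv2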

Create a k3d config file

Note: the k3d config does not support relative paths, so replace the volume mount path with an absolute path.
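A sketch of k3d-ipv6-config.yaml; the pod/service CIDRs and the script path below are examples (the service CIDR is inferred from the ClusterIPs shown later), so adjust them to your environment:

apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
  name: cilium-cluster
image: rancher/k3s:v1.30.1-k3s1
servers: 1
agents: 2
network: cilium
volumes:
  # absolute path required, see the note above
  - volume: /absolute/path/to/k3d-entrypoint-cilium.sh:/bin/k3d-entrypoint-cilium.sh
    nodeFilters: ["all"]
options:
  k3s:
    extraArgs:
      # no flannel, no network policy, no kube-proxy: Cilium takes over
      - arg: --flannel-backend=none
        nodeFilters: ["server:*"]
      - arg: --disable-network-policy
        nodeFilters: ["server:*"]
      - arg: --disable-kube-proxy
        nodeFilters: ["server:*"]
      - arg: --disable=traefik
        nodeFilters: ["server:*"]
      - arg: --disable=servicelb
        nodeFilters: ["server:*"]
      # IPv6-only pod and service CIDRs (example values)
      - arg: --cluster-cidr=1001:cafe:42::/56
        nodeFilters: ["server:*"]
      - arg: --service-cidr=1001:cafe:43:9::/112
        nodeFilters: ["server:*"]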

Deploy k3d cluster

k3d cluster create -c k3d-ipv6-config.yaml

When the cluster is deployed, a couple of pods will be stuck in Pending state. This is expected, since we still need to deploy a CNI.

docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
cb7da708c3bb rancher/k3s:v1.30.1-k3s1 "/bin/k3d-entrypoint…" 5 minutes ago Up 4 minutes k3d-cilium-cluster-agent-1
86699ebe1fbb rancher/k3s:v1.30.1-k3s1 "/bin/k3d-entrypoint…" 5 minutes ago Up 4 minutes k3d-cilium-cluster-agent-0
b5856ce06ef3 rancher/k3s:v1.30.1-k3s1 "/bin/k3d-entrypoint…" 5 minutes ago Up 5 minutes 0.0.0.0:42663->6443/tcp k3d-cilium-cluster-server-0

kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-576bfc4dc7-j6v46 0/1 Pending 0 5m17s
kube-system local-path-provisioner-75bb9ff978-xqcdj 0/1 Pending 0 5m17s
kube-system metrics-server-557ff575fb-lllqt 0/1 Pending 0 5m17s

For Cilium's Gateway API support, a few Gateway API CRDs need to be installed first.

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.1.0/config/crd/experimental/gateway.networking.k8s.io_gatewayclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.1.0/config/crd/experimental/gateway.networking.k8s.io_gateways.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.1.0/config/crd/experimental/gateway.networking.k8s.io_httproutes.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.1.0/config/crd/experimental/gateway.networking.k8s.io_tlsroutes.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.1.0/config/crd/experimental/gateway.networking.k8s.io_referencegrants.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.1.0/config/crd/experimental/gateway.networking.k8s.io_grpcroutes.yaml

Cilium will be installed as a Helm chart. The values for cilium:
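A sketch of the relevant parts of cilium-values.yaml; the API server host/port, the IPAM mode and the native-routing CIDR below are examples inferred from the rest of this setup, so adjust them to your cluster:

# cilium-values.yaml (sketch)
kubeProxyReplacement: true
k8sServiceHost: k3d-cilium-cluster-server-0   # k3s API server container on the docker network
k8sServicePort: 6443

routingMode: native           # native routing, i.e. tunneling disabled
autoDirectNodeRoutes: true    # nodes share the same docker bridge
ipv6NativeRoutingCIDR: "1001:cafe:42::/56"    # example pod CIDR, matching the k3d config

ipv4:
  enabled: false
ipv6:
  enabled: true
enableIPv6BIGTCP: true

ipam:
  mode: kubernetes

bgpControlPlane:
  enabled: true

gatewayAPI:
  enabled: true

hubble:
  relay:
    enabled: true
  ui:
    enabled: true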

Notable values:

tunnel: disabled : Tunneling does not work properly with IPv6, so it's best to disable it (i.e. use native routing) until Cilium ships a proper fix.

bgpControlPlane.enabled: true : We are deploying the latest rc version to test the BGP Control Plane v2 resources (BGPClusterConfig and friends), which replace the v1 BGPPeeringPolicy.

enableIPv6BIGTCP: true : Something I want to test for performance. BIG TCP increases the number of transactions per second and, unlike jumbo frames, does not require every network device in the path to support larger frames.

In the latest version, Envoy is deployed as a separate DaemonSet alongside the Cilium pods. Since the scope of this story is a pure IPv6 cluster, I am not exploring whether to keep it or get rid of it.

Installing the Helm chart now:

helm upgrade --install cilium cilium/cilium --version 1.16.0-rc.0 \
--namespace=kube-system -f cilium-values.yaml

Wait a while for the Cilium operator to be installed; the Cilium pods will then be deployed to each node, and eventually all pods should be in Running state.

kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system local-path-provisioner-75bb9ff978-xqcdj 1/1 Running 0 35m
kube-system coredns-576bfc4dc7-j6v46 1/1 Running 0 35m
kube-system metrics-server-557ff575fb-lllqt 1/1 Running 0 35m
kube-system cilium-operator-5bdff49868-nhl2m 1/1 Running 0 8m39s
kube-system cilium-bfjp2 1/1 Running 0 8m39s
kube-system cilium-frv2r 1/1 Running 0 8m39s
kube-system cilium-znx8l 1/1 Running 0 8m39s
kube-system cilium-envoy-wdmgs 1/1 Running 0 8m39s
kube-system cilium-envoy-jk5tq 1/1 Running 0 8m39s
kube-system cilium-envoy-xtqlt 1/1 Running 0 8m39s
kube-system hubble-ui-59bb4cb67b-bz72q 2/2 Running 0 8m39s
kube-system hubble-relay-75fb6597d7-7gzln 1/1 Running 0 8m39s

The Cilium status can also be checked with the cilium-cli tool:

cilium status
/¯¯\
/¯¯\__/¯¯\ Cilium: OK
\__/¯¯\__/ Operator: OK
/¯¯\__/¯¯\ Envoy DaemonSet: OK
\__/¯¯\__/ Hubble Relay: OK
\__/ ClusterMesh: disabled

DaemonSet cilium Desired: 3, Ready: 3/3, Available: 3/3
Deployment hubble-ui Desired: 1, Ready: 1/1, Available: 1/1
DaemonSet cilium-envoy Desired: 3, Ready: 3/3, Available: 3/3
Deployment cilium-operator Desired: 1, Ready: 1/1, Available: 1/1
Deployment hubble-relay Desired: 1, Ready: 1/1, Available: 1/1
Containers: cilium-envoy Running: 3
hubble-relay Running: 1
cilium-operator Running: 1
cilium Running: 3
hubble-ui Running: 1
Cluster Pods: 5/5 managed by Cilium
Helm chart version:
Image versions cilium quay.io/cilium/cilium:v1.16.0-rc.0@sha256:bc88ac635a871293d5d2837196e53adba1ea55f79cd3f5cba802dd488312fd2a: 3
hubble-ui quay.io/cilium/hubble-ui-backend:v0.13.1@sha256:0e0eed917653441fded4e7cdb096b7be6a3bddded5a2dd10812a27b1fc6ed95b: 1
hubble-ui quay.io/cilium/hubble-ui:v0.13.1@sha256:e2e9313eb7caf64b0061d9da0efbdad59c6c461f6ca1752768942bfeda0796c6: 1
cilium-envoy quay.io/cilium/cilium-envoy:v1.29.5-8fccf45a8ab9da13824e0f14122d5db35673f3bb@sha256:f2c0b275aebe14c7369c8396c4461c787b12b823aba0c613ebbed7a3f92f288e: 3
hubble-relay quay.io/cilium/hubble-relay:v1.16.0-rc.0@sha256:22b7f87db6a7a00d10e4ad8c316324368693b0e7f158055b7f81f39fb27928e2: 1
cilium-operator quay.io/cilium/operator-generic:v1.16.0-rc.0@sha256:78b9951cd6d92e7c954b9d7d2791cf52c83895441147deec3906c03363fd1169: 1

Now we need to define a pool of LoadBalancer IPs, used when services of type LoadBalancer are created or when the Gateway API needs an address.
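A sketch of ippool.yaml; the 2004::/64 block is an example range, chosen to match the Gateway address you will see later in this post:

apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: ipv6-lb-pool
spec:
  blocks:
    - cidr: "2004::/64"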

kubectl apply -f ippool.yaml

With IPv4, Cilium automatically figures out the node IP to use as the router ID for the BGP configuration. With IPv6 this is not automated, so we need to annotate the nodes and set the router ID manually.

kubectl annotate node k3d-cilium-cluster-agent-1 --overwrite cilium.io/bgp-virtual-router.64512="router-id=172.50.0.1"
kubectl annotate node k3d-cilium-cluster-agent-0 --overwrite cilium.io/bgp-virtual-router.64512="router-id=172.50.0.1"
kubectl annotate node k3d-cilium-cluster-server-0 --overwrite cilium.io/bgp-virtual-router.64512="router-id=172.50.0.1"

Now, to use BGP Control Plane v2, instead of a single BGP peering policy we need to define a couple of Kubernetes objects. You can find more information here on how it works.

So we need a couple of configurations.

The first is the BGP cluster config. Here we define the node selector the configuration applies to, and the bgpInstances. The peerAddress is the IPv6 gateway of the Docker network, and the cluster config references a peer config, described below.

With the v2 configuration we can apply separate rules or different BGP configurations within a single cluster, which was not possible with v1.

The peer config in turn references a BGP advertisement config, selected via a label selector.

The advertisement config advertises routes for all services of type LoadBalancer. I have effectively disabled the selector, because I want to deploy a Gateway API and need its routes advertised. I could also just add a label to the Gateway's service, but I am playing around :D
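A sketch of bgp-config.yaml with the three v2 resources; the resource names and the node selector are examples, while the ASNs and the peer address follow this setup:

apiVersion: cilium.io/v2alpha1
kind: CiliumBGPClusterConfig
metadata:
  name: cilium-bgp
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/os: linux        # matches all nodes; restrict it if needed
  bgpInstances:
    - name: instance-64512
      localASN: 64512
      peers:
        - name: frr-peer
          peerASN: 64513
          peerAddress: "2001:3200:3200::1"   # IPv6 gateway of the docker network
          peerConfigRef:
            name: cilium-peer-config
---
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeerConfig
metadata:
  name: cilium-peer-config
spec:
  gracefulRestart:
    enabled: true
    restartTimeSeconds: 15
  families:
    - afi: ipv6
      safi: unicast
      advertisements:
        matchLabels:
          advertise: bgp
---
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPAdvertisement
metadata:
  name: bgp-advertisements
  labels:
    advertise: bgp
spec:
  advertisements:
    - advertisementType: Service
      service:
        addresses:
          - LoadBalancerIP
      # selector that matches everything, i.e. the selector is effectively disabled
      selector:
        matchExpressions:
          - { key: somekey, operator: NotIn, values: ["never-used-value"] }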

Let's deploy the config:

kubectl apply -f bgp-config.yaml

To test this, we create a new Service pointing to the Hubble UI and then create a Gateway with an HTTPRoute attached to it.

Note that I have added a selector so the Gateway only accepts routes from namespaces carrying the shared-gw: "true" label, so the namespace needs to be labelled before applying the config.
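A sketch of hubble.yaml; the hubble-ui selector and target port below are based on the stock Helm chart (double-check them against your install), while the names match the output further down:

apiVersion: v1
kind: Service
metadata:
  name: hubble-ipv6
  namespace: kube-system
spec:
  selector:
    k8s-app: hubble-ui        # assumed label of the hubble-ui pods
  ports:
    - name: http
      port: 80
      targetPort: 8081        # assumed hubble-ui frontend port
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gw-ipv6
  namespace: kube-system
spec:
  gatewayClassName: cilium
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: Selector
          selector:
            matchLabels:
              shared-gw: "true"
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: hubble-ipv6-gw
  namespace: kube-system
spec:
  parentRefs:
    - name: shared-gw-ipv6
  hostnames:
    - hubble.example.com
  rules:
    - backendRefs:
        - name: hubble-ipv6
          port: 80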

kubectl label namespace kube-system shared-gw=true
kubectl apply -f hubble.yaml

Let's now check the Gateway, HTTPRoute and Services:

kubectl -n kube-system get gateway,svc,httproute
NAME CLASS ADDRESS PROGRAMMED AGE
gateway.gateway.networking.k8s.io/shared-gw-ipv6 cilium 2004::1 True 13m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 1001:cafe:43:9::a <none> 53/UDP,53/TCP,9153/TCP 96m
service/metrics-server ClusterIP 1001:cafe:43:9::be <none> 443/TCP 96m
service/cilium-envoy ClusterIP None <none> 9964/TCP 69m
service/hubble-peer ClusterIP 1001:cafe:43:9::87 <none> 443/TCP 69m
service/hubble-relay ClusterIP 1001:cafe:43:9::cf <none> 80/TCP 69m
service/hubble-ui ClusterIP 1001:cafe:43:9::80 <none> 80/TCP 69m
service/hubble-ipv6 ClusterIP 1001:cafe:43:9::a1 <none> 80/TCP 13m
service/cilium-gateway-shared-gw-ipv6 LoadBalancer 1001:cafe:43:9::cc 2004::1 80:30604/TCP 13m

NAME HOSTNAMES AGE
httproute.gateway.networking.k8s.io/hubble-ipv6-gw ["hubble.example.com"] 13m

As you can see, the Gateway got a LoadBalancer IP from the IP pool we defined earlier, and the same IP shows up as the EXTERNAL-IP of the generated Service. The HTTPRoute also has its hostname attached.

Let's install FRR to peer with the cluster and receive the advertised routes.

sudo apt install frr -y

Enable BGP in FRR by editing /etc/frr/daemons and changing the value of bgpd from no to yes.
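One way to do that non-interactively (plain sed, nothing FRR-specific):

sudo sed -i 's/^bgpd=no/bgpd=yes/' /etc/frr/daemons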

Edit the file /etc/frr/frr.conf and add the following config

  • We set the local ASN to 64513, because that is the number we used as the peer ASN on the Cilium side
  • We set the router ID to the gateway IP of the Docker network created for the cluster
  • We define each node in the cluster as a neighbor in the configuration to receive announcements (see the sketch after this list)
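A minimal sketch of such a config; the neighbor addresses are the node IPs on the Docker network (check yours with docker network inspect cilium), and redistribute connected is just one option for what FRR announces back to the nodes:

frr defaults traditional
log syslog informational
!
router bgp 64513
 bgp router-id 172.50.0.1
 no bgp ebgp-requires-policy
 ! the three k3d nodes as neighbors, grouped in a peer group
 neighbor k3d peer-group
 neighbor k3d remote-as 64512
 neighbor 2001:3200:3200::2 peer-group k3d
 neighbor 2001:3200:3200::3 peer-group k3d
 neighbor 2001:3200:3200::4 peer-group k3d
 !
 address-family ipv6 unicast
  redistribute connected
  neighbor k3d activate
 exit-address-family
!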

Now start frr

sudo systemctl start frr

After a couple of seconds, you can check the peering status on both the Cilium and the FRR side:

cilium bgp peers
Node Local AS Peer AS Peer Address Session State Uptime Family Received Advertised
k3d-cilium-cluster-agent-0 64512 64513 2001:3200:3200::1 established 15m38s ipv6/unicast 1 2
k3d-cilium-cluster-agent-1 64512 64513 2001:3200:3200::1 established 15m28s ipv6/unicast 1 2
k3d-cilium-cluster-server-0 64512 64513 2001:3200:3200::1 established 15m38s ipv6/unicast 1 2
sudo vtysh -c "show bgp summary"

IPv4 Unicast Summary:
BGP router identifier 172.50.0.1, local AS number 64513 VRF default vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 3, using 60 KiB of memory
Peer groups 1, using 64 bytes of memory

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
2001:3200:3200::2 4 64512 35 35 0 0 0 00:15:56 NoNeg NoNeg N/A
2001:3200:3200::3 4 64512 35 35 0 0 0 00:15:56 NoNeg NoNeg N/A
2001:3200:3200::4 4 64512 35 35 0 0 0 00:15:46 NoNeg NoNeg N/A

Total number of neighbors 3

IPv6 Unicast Summary:
BGP router identifier 172.50.0.1, local AS number 64513 VRF default vrf-id 0
BGP table version 1
RIB entries 1, using 96 bytes of memory
Peers 3, using 60 KiB of memory
Peer groups 1, using 64 bytes of memory

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
2001:3200:3200::2 4 64512 35 35 1 0 0 00:15:56 1 1 N/A
2001:3200:3200::3 4 64512 35 35 1 0 0 00:15:56 1 1 N/A
2001:3200:3200::4 4 64512 35 35 1 0 0 00:15:46 1 1 N/A

Total number of neighbors 3

We can see that routes are being advertised and received. If there is something wrong with the connectivity, the advertised and received counts reported by cilium-cli will be 0 and the Up/Down column will show a "never" state.

Let's check the host routing table to see whether FRR created the proper routes:

ip -6 r s
::1 dev lo proto kernel metric 256 pref medium
2001:db8::/64 dev docker0 metric 1024 linkdown pref medium
2001:3200::/64 dev docker0 proto kernel metric 256 linkdown pref medium
2001:3200::/64 dev docker0 metric 1024 linkdown pref medium
2001:3200:3200::/64 dev br-455d8ca1d613 proto kernel metric 256 pref medium
2001:3200:3200::/64 dev docker0 metric 1024 linkdown pref medium
2001:3200:3200::/56 dev docker0 metric 1024 linkdown pref medium
2004::1 nhid 32 proto bgp metric 20 pref medium
nexthop via 2001:3200:3200::4 dev br-455d8ca1d613 weight 1
nexthop via 2001:3200:3200::2 dev br-455d8ca1d613 weight 1
nexthop via 2001:3200:3200::3 dev br-455d8ca1d613 weight 1

Now let's do a curl to see if we can reach the page:

curl --connect-to 'hubble.example.com:80:[2004::1]:80' http://hubble.example.com
<!doctype html><html><head><meta charset="utf-8"/><title>Hubble UI</title><base href="/"/><meta name="color-scheme" content="only light"/><meta http-equiv="X-UA-Compatible" content="IE=edge"/><meta name="viewport" content="width=device-width,user-scalable=0,initial-scale=1,minimum-scale=1,maximum-scale=1"/><link rel="icon" type="image/png" sizes="32x32" href="favicon-32x32.png"/><link rel="icon" type="image/png" sizes="16x16" href="favicon-16x16.png"/><link rel="shortcut icon" href="favicon.ico"/><script defer="defer" src="bundle.main.eae50800ddcd18c25e9e.js"></script><link href="bundle.main.1d051ccbd0f5cd57832e.css" rel="stylesheet"></head><body><div id="app" class="test"></div></body></html>

HAProxy Setup

Let's make it resolvable via DNS. One way is to point the domain's AAAA record at the server's IP address and use HAProxy to route the requests to the advertised LoadBalancer IP behind it. Done properly, with support from your ISP, you could even use a publicly routable IPv6 range and run this on a proper cluster instead of Docker.

When running in the cloud, we can use external-dns to update the records. In local setups, HAProxy is king :D

So let's test with HAProxy:

sudo apt install haproxy -y

Now replace the /etc/haproxy/haproxy.cfg file with the following config.
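A minimal TCP-mode sketch that forwards port 80 on the host to the Gateway's LoadBalancer IP (2004::1 in this setup); adapt the binds, timeouts and mode to your needs:

global
    log /dev/log local0
    maxconn 2048

defaults
    mode tcp
    log global
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend http_in
    # listen on both IPv4 and IPv6 on port 80
    bind :::80 v4v6
    default_backend cilium_gateway

backend cilium_gateway
    # the advertised Gateway LoadBalancer IP
    server shared-gw-ipv6 [2004::1]:80 check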

We could then point the DNS record at the public IPv6 address of the server or machine we are running on. To keep it simple, however, I will just add an entry to my hosts file, so I can confirm in the browser that it works.

Now start HAProxy

sudo systemctl start haproxy

On my local machine, I add the entry to my hosts file:

echo "xxxx:xxxx:xxxx:xxxx:0000:0000:0000:0001 hubble.example.com" | sudo tee -a /etc/hosts>/dev/null

Open a browser and go to the domain "hubble.example.com".

Please note that the domain name I used is just a test one; replace it with your own domain name. The domain name from the registrar and the hostname on the HTTPRoute should match.

I also tested with and without BIG TCP enabled, and performance was somewhat better in my informal tests; I cannot benchmark it properly without a more serious setup. To learn more about BIG TCP, you can check it here; it's an amazing article. In short, BIG TCP lets the CNI send batches larger than the default 64 KB without any changes to the network devices, whereas jumbo frames, the other way to get higher throughput, require physical network devices that support them.

Conclusion

Cilium keeps proving itself as an excellent CNI, efficient and with impressive performance and features. It is worth exploring a pure IPv6 stack and having fun learning about the limitations it brings and how to overcome them.

I hope you enjoyed this blog.
