When you use bridging for Xen networking and your guest machines (domUs in Xen parlance) are fully managed by third parties, some sort of isolation is especially needed. A rogue admin can change the IP and/or MAC address(es) assigned to their domU and potentially cause an IP address conflict.
Xen provides a script called vif-bridge that takes care of adding the domU’s virtual interfaces to dom0’s bridge, bringing them up and adding iptables rules that allow datagrams coming in through the domU’s virtual interfaces when their source is one of the assigned IP address(es).
Those iptables rules might not be enough. They don’t enforce usage of the assigned MAC addresses and could interfere with a currently deployed firewall. Another point, in my opinion, is that these address policies belong to the link layer (bridge decision) rather than the network layer (see PacketFlow), so I prefer to have them enforced with ebtables.
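As a rough illustration of what link-layer enforcement could look like, the sketch below prints (it does not execute) ebtables commands that tie a virtual interface to one MAC address and a list of IP addresses. The function name `vif_rules`, the rule layout and the example addresses are assumptions made for illustration; this is not the adapted vif-bridge script itself.

```shell
#!/bin/sh
# Print ebtables commands pinning a vif to one MAC and a list of IPv4 addresses.
# Hypothetical helper: name and rule layout are illustrative only.
vif_rules() {
    local vif="$1" mac="$2" ip
    shift 2
    for ip in "$@"; do
        # Frames coming from the domU must carry the assigned MAC and source IP.
        echo "ebtables -A FORWARD -i $vif -p ip4 -s $mac --ip-src $ip -j ACCEPT"
        echo "ebtables -A FORWARD -i $vif -p arp -s $mac --arp-ip-src $ip -j ACCEPT"
        # Frames going to the domU must target the assigned MAC and IP.
        echo "ebtables -A FORWARD -o $vif -p ip4 -d $mac --ip-dst $ip -j ACCEPT"
        echo "ebtables -A FORWARD -o $vif -p arp --arp-ip-dst $ip -j ACCEPT"
    done
    # Anything else entering or leaving through this vif is dropped.
    echo "ebtables -A FORWARD -i $vif -j DROP"
    echo "ebtables -A FORWARD -o $vif -j DROP"
}
```

Calling `vif_rules veth0 00:16:3e:00:00:01 10.0.0.5` lists the commands so they can be reviewed before anything touches the kernel tables. A real deployment would likely need extra rules (for example for other broadcast traffic), so treat this only as a starting point.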
After deploying the adapted vif-bridge, domU creation began to fail randomly. Some debug code added at the beginning of the script produced output like this:
+ ebtables -F veth2250_IN
ebtables v2.0.9-2:communication.c:388:--BUG--: Couldn't update kernel counters
++ sigerr
+ ebtables -N veth639a_IN
+ ebtables -P veth639a_IN DROP
Chain 'veth639a_IN' doesn't exist.
++ sigerr
+ ebtables -A veth639_OUT -p arp --arp-ip-dst 10.99.143.100 -j ACCEPT
+ ebtables -A veth639_OUT -p arp --arp-ip-dst 10.99.144.100 -j ACCEPT
The kernel doesn't support a certain ebtables extension, consider recompiling your kernel or insmod the extension.
++ sigerr
As you can see, those ebtables errors are triggered by correct, trivial calls. To make it worse, chain, interface and rule names varied from one error to the next. Searching for “Couldn’t update kernel counters” or “communication.c:388:--BUG--:” didn’t help at all.
While debugging, I learned that Xen runs one instance of vif-bridge for each defined network interface, and that they all run in parallel. All my domUs have two virtual network interfaces defined.
At that point I had no clue about the problem’s cause. I decided to upgrade ebtables to rule out those “make sure you’re running the latest version” support replies (squeeze’s version is 220.127.116.11, upstream is 2.0.10). With the new version I began to see this new error in the logs:
+ ebtables -A FORWARD -o veth2450 -p ip4 -d 00:16:3d:1c:26:4a --ip-dst 10.49.216.50 -j ACCEPT
Unable to update the kernel. Two possible causes:
1. Multiple ebtables programs were executing simultaneously. The ebtables userspace tool doesn't by default support multiple ebtables programs running concurrently. The ebtables option --concurrent or a tool like flock can be used to support concurrent scripts that update the ebtables kernel tables.
2. The kernel doesn't support a certain ebtables extension, consider recompiling your kernel or insmod the extension.
After reading this I immediately understood what was happening. That error description couldn’t be clearer, and I thank the upstream author for it. I had never considered a concurrency problem in ebtables, not even after seeing random, illogical errors generated by trivial rules.
--concurrent is only available in 2.0.10, so I took the flock way; the fixed script is here.
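For readers who cannot see the linked script, the flock approach might look roughly like the sketch below. It is a minimal illustration and not the fixed script: the wrapper name `ebtables_locked`, the variables `EBTABLES_LOCK` and `EBTABLES_BIN`, and the lock file path are all assumptions.

```shell
#!/bin/sh
# Hypothetical lock file shared by every vif-bridge instance; any path writable
# by the script would do.
EBTABLES_LOCK="${EBTABLES_LOCK:-/var/lock/ebtables-vif.lock}"
# Binary to run; overridable so the wrapper can be tried without root.
EBTABLES_BIN="${EBTABLES_BIN:-ebtables}"

# Run a single ebtables command while holding an exclusive lock, so parallel
# script instances update the kernel tables one at a time.
ebtables_locked() {
    flock "$EBTABLES_LOCK" "$EBTABLES_BIN" "$@"
}
```

Inside the script, every direct call such as `ebtables -A FORWARD ...` would become `ebtables_locked -A FORWARD ...`. Because flock(1) does the locking outside of ebtables, the same wrapper should work with versions that lack --concurrent.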
Later I found the problem description in ebtables’ basic examples page:
Updating the ebtables kernel tables is a two-phase process. First, the userspace program sends the new table to the kernel and receives the packet counters for the rules in the old table. In a second phase, the userspace program uses these counter values to determine the initial counter values of the new table, which is already active in the kernel. These values are sent to the kernel which adds these values to the kernel’s counter values. Due to this two-phase process, it is possible to confuse the ebtables userspace tool when more than one instance is run concurrently. Note that even in a one-phase process it would be possible to confuse the tool.
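The read-then-write pattern described in that quote can be imitated with a plain file, which makes the effect of flock easier to see without touching ebtables at all. The sketch below is only an analogy: `bump` and `run_locked` are hypothetical helpers, and a counter file stands in for the kernel table.

```shell
#!/bin/sh
# Read-modify-write on a shared file, standing in for
# "fetch the current table, edit it, send it back".
bump() {
    local file="$1" n
    n=$(cat "$file")
    echo $((n + 1)) > "$file"
}

# Start several parallel workers that each call bump once, serialized by an
# exclusive flock on file descriptor 9. Prints the final counter value.
run_locked() {
    local file="$1" workers="$2" i=0
    echo 0 > "$file"
    while [ "$i" -lt "$workers" ]; do
        # The lock is released when the subshell exits and fd 9 is closed.
        ( flock 9 && bump "$file" ) 9>"$file.lock" &
        i=$((i + 1))
    done
    wait
    cat "$file"
}
```

With the lock in place every increment is applied, so the final value equals the number of workers. Without the `flock 9` step, two workers can read the same value and one update can be lost, which is the same kind of interleaving that confuses concurrent ebtables runs.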
It might be very difficult to reproduce the errors shown above unless your domUs have more than one network interface and your scripts add more than a few ebtables rules.
Summarizing. If you:
- are calling ebtables from your Xen scripts,
- have an ebtables prior to 2.0.10 (like the one in Debian squeeze or Ubuntu precise),
- are facing seemingly random errors,
- are not being helped by logs or by searching for the error messages,
chances are high that your scripts are running ebtables concurrently. Just