Bridged vlan-interface doesn't receive traffic

Description

I have a FreeNAS 11.2-RELEASE-U1 installation with igb0 as the network interface.
On FreeNAS I have an Ubuntu VM that's bridged to igb0. Networking works as intended: the VM receives an IP address via DHCP.

Now I want to change the bridge from bridging igb0 to bridging vlan253 on the same interface.
I create interface vlan253 with igb0 as the parent interface and change the NIC on the VM to bridge this interface. Now received traffic doesn't make it back onto the bridge.
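
For reference, here is a rough sketch of what that setup looks like as plain FreeBSD ifconfig commands. FreeNAS does this through its own UI and middleware, so the exact commands it runs are an assumption on my part; the helper name `vlan_bridge_cmds` is made up for illustration, and the interface names are the ones from the output below. The function only prints the commands, it does not run them.

```shell
# Dry-run sketch: print (not run) the ifconfig commands for a VLAN-backed bridge.
# Assumption: this approximates what the FreeNAS UI sets up; it is not taken
# from the FreeNAS source.
vlan_bridge_cmds() {
    parent=$1   # physical NIC, e.g. igb0
    tag=$2      # 802.1Q VLAN ID, e.g. 253
    bridge=$3   # bridge interface, e.g. bridge1
    tap=$4      # tap interface of the VM, e.g. tap3
    echo "ifconfig vlan${tag} create vlan ${tag} vlandev ${parent} up"
    echo "ifconfig ${bridge} create"
    echo "ifconfig ${bridge} addm vlan${tag} addm ${tap} up"
}

vlan_bridge_cmds igb0 253 bridge1 tap3
```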

igb0 and the vlan253 interface that is now bridged:

igb0: flags=8943 metric 0 mtu 1500
options=2400b9
ether 00:25:90:32:ca:4c
hwaddr 00:25:90:32:ca:4c
inet 172.17.8.50 netmask 0xffffff00 broadcast 172.17.8.255
nd6 options=9
media: Ethernet autoselect (1000baseT )
status: active

vlan253: flags=8943 metric 0 mtu 1500
options=200001
ether 00:25:90:32:ca:4c
nd6 options=9
media: Ethernet autoselect (1000baseT )
status: active
vlan: 253 vlanpcp: 0 parent interface: igb0
groups: vlan

Bridge with tap-interface for the VM:

bridge1: flags=8843 metric 0 mtu 1500
ether 02:37:99:93:61:01
nd6 options=1
groups: bridge
id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
member: tap3 flags=143
ifmaxaddr 0 port 10 priority 128 path cost 2000000
member: vlan253 flags=143
ifmaxaddr 0 port 9 priority 128 path cost 20000

tap3: flags=8943 metric 0 mtu 1500
options=80000
ether 00:bd:15:9a:19:03
hwaddr 00:bd:15:9a:19:03
nd6 options=1
media: Ethernet autoselect
status: active
groups: tap
Opened by PID 51805

To my understanding, the traffic goes from the VM as follows:
VM -> TAP3 -> bridge1 -> vlan253 -> igb0 with vlan-tag -> DHCP-server
And should go the reverse path back to the VM.
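
Each hop on that path can be captured separately. The sketch below prints one tcpdump command per interface; the helper name `hop_capture_cmds` and the DHCP port filter are my own choices for illustration, not necessarily the exact commands behind the captures that follow. The parent interface gets a `vlan 253` filter because the frames are still tagged there.

```shell
# Print one tcpdump command per hop on the path VM -> tap3 -> bridge1 -> vlan253 -> igb0.
# -e prints the Ethernet header (MAC addresses and any 802.1Q tag),
# -n disables name resolution.
hop_capture_cmds() {
    dhcp='port 67 or port 68'
    for ifc in tap3 bridge1 vlan253; do
        echo "tcpdump -e -n -i ${ifc} '${dhcp}'"
    done
    # On the parent NIC the frames carry the VLAN tag, so match the tag first.
    echo "tcpdump -e -n -i igb0 'vlan 253 and (${dhcp})'"
}

hop_capture_cmds
```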

If I tcpdump tap3 I can see the DHCP-request leaving:

14:53:49.302737 00:a0:98:56:19:5b > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 331: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:a0:98:56:19:5b, length 289

tcpdump on bridge1 sees it pass through:

14:53:49.302759 00:a0:98:56:19:5b > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 331: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:a0:98:56:19:5b, length 289

tcpdump on vlan253 sees it pass through:

14:53:49.302751 00:a0:98:56:19:5b > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 331: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:a0:98:56:19:5b, length 289

tcpdump on igb0 sees it pass through with correct vlan-tag, and here the reply arrives with correct vlan-tag.

14:53:49.302755 00:a0:98:56:19:5b > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 335: vlan 253, p 0, ethertype IPv4, 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:a0:98:56:19:5b, length 289
14:53:49.327676 fc:ec:da:04:b3:b2 > 00:a0:98:56:19:5b, ethertype 802.1Q (0x8100), length 346: vlan 253, p 0, ethertype IPv4, 172.17.253.1.67 > 172.17.253.2.68: BOOTP/DHCP, Reply, length 300

So it seems like the DHCP-reply does make it back to the FreeNAS host (igb0), but not up to the vlan253-interface.
The odd thing is that there's other traffic on the vlan that does pass through. A multicast packet arrives at igb0:

14:53:43.889485 fc:ec:da:04:b3:b2 > 01:00:5e:00:00:fb, ethertype 802.1Q (0x8100), length 255: vlan 253, p 0, ethertype IPv4, 172.17.253.1.5353 > 224.0.0.251.5353: 0*- [0q] 5/0/0 PTR xxxxxxxxxxxxxxxx-0._spotify-connect._tcp.local., A 172.xx.xx.xx, TXT "CPath=/zc/0" "VERSION=1.0" "Stack=SP", SRV xxxxxxxxxxxxxxxx.local.:46387 0 0, PTR _spotify-connect._tcp.local. (209)

Passes through vlan253:

14:53:43.889498 fc:ec:da:04:b3:b2 > 01:00:5e:00:00:fb, ethertype IPv4 (0x0800), length 251: 172.17.253.1.5353 > 224.0.0.251.5353: 0*- [0q] 5/0/0 PTR xxxxxxxxxxxxxxxx-0._spotify-connect._tcp.local., A 172.xx.xx.xx, TXT "CPath=/zc/0" "VERSION=1.0" "Stack=SP", SRV xxxxxxxxxxxxxxxx-0.local.:46387 0 0, PTR _spotify-connect._tcp.local. (209)

Goes onto the bridge:

14:53:43.889509 fc:ec:da:04:b3:b2 > 01:00:5e:00:00:fb, ethertype IPv4 (0x0800), length 251: 172.17.253.1.5353 > 224.0.0.251.5353: 0*- [0q] 5/0/0 PTR xxxxxxxxxxxxxxxx-0._spotify-connect._tcp.local., A 172.xx.xx.xx, TXT "CPath=/zc/0" "VERSION=1.0" "Stack=SP", SRV xxxxxxxxxxxxxxxx-0.local.:46387 0 0, PTR _spotify-connect._tcp.local. (209)

And reaches tap3:

14:53:43.889556 fc:ec:da:04:b3:b2 > 01:00:5e:00:00:fb, ethertype IPv4 (0x0800), length 251: 172.17.253.1.5353 > 224.0.0.251.5353: 0*- [0q] 5/0/0 PTR xxxxxxxxxxxxxxxx-0._spotify-connect._tcp.local., A 172.xx.xx.xx, TXT "CPath=/zc/0" "VERSION=1.0" "Stack=SP", SRV xxxxxxxxxxxxxxxx-0.local.:46387 0 0, PTR _spotify-connect._tcp.local. (209)

I can also see the traffic just fine in the VM.

If I set a static IP on the VM and ping the gateway (172.17.253.1) I can see the ARP-request stop at the same place as the DHCP-reply.
If I set a static ARP-entry for the gateway in the VM and try to ping the gateway from the VM, I can see the echo request leave, but the reply stops in the same place and doesn't reach vlan253.
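
One thing the captures have in common, offered as an observation rather than a diagnosis: the frames that get through are addressed to a broadcast or multicast MAC, while the DHCP-reply that gets lost is addressed to the VM's own unicast MAC. A small sketch to classify the destination addresses seen above (the function name `classify_mac` is hypothetical; the rule it applies is the standard one, where the lowest bit of the first octet marks a group address):

```shell
# Classify an Ethernet destination address as broadcast, multicast or unicast.
# The least significant bit of the first octet is the group (I/G) bit.
classify_mac() {
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    if [ "$mac" = "ff:ff:ff:ff:ff:ff" ]; then
        echo broadcast
        return
    fi
    first=${mac%%:*}                      # first octet, e.g. "01"
    if [ $(( 0x$first & 1 )) -eq 1 ]; then
        echo multicast
    else
        echo unicast
    fi
}

# Destination MACs taken from the tcpdump lines in this report.
for dst in ff:ff:ff:ff:ff:ff 01:00:5e:00:00:fb 00:a0:98:56:19:5b; do
    echo "$dst $(classify_mac "$dst")"
done
```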

I found this thread on the FreeBSD forums, which hints that buggy NIC-drivers might be the cause:
https://forums.freebsd.org/threads/no-ingress-vlan-traffic.68781/
The onboard NICs are listed as follows by lspci:

01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)

I'll check during this weekend if using another NIC changes anything.

Problem/Justification

None

Impact

None

Activity

Lewis Barclay 
August 30, 2019 at 2:08 PM

Hi, any updates on this? It still seems to be an issue for me. I tried setting "-lro up" in the options for the parent interface and then recreating my VLANs, but it didn't make any difference.

Ryan Moeller 
June 12, 2019 at 3:24 PM

I think this may be an unresolved problem in FreeBSD, but it's hard to say. If the issue persists, please consider reporting the problem at https://bugs.freebsd.org

Ryan Moeller 
April 26, 2019 at 11:55 PM

I suspect the issue may in fact be that LRO on the physical is not being disabled by the bridge because of the vlan in the middle. If you would like to try something, changing the physical interface options to `-lro up` will disable LRO before the vlan is created, which should work around the problem.
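
A minimal sketch of how that could be checked from a shell on the host, assuming the `options=` line of `ifconfig igb0` lists capability names such as `LRO` (the listing in the description shows only the hex value). The helper name `has_lro` is made up for illustration, and the host commands are left as comments because they change interface state.

```shell
# Succeeds if the ifconfig output on stdin lists LRO among the enabled options.
# Assumption: the options line names capabilities, e.g. options=...<...,LRO,...>.
has_lro() {
    grep -E '^[[:space:]]*options=' | grep -qw 'LRO'
}

# On the FreeNAS host (not run here):
#   ifconfig igb0 | has_lro && echo "LRO is enabled on igb0"
#   ifconfig igb0 -lro      # turn LRO off on the parent for the current boot
#   ifconfig igb0 | has_lro || echo "LRO is now off on igb0"
```

To keep the setting across reboots it would still have to go into the interface options in the FreeNAS UI, as described above.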

Thu Lle 
April 18, 2019 at 12:50 PM

Hi and thanks for taking a look!

Sorry for not following up on this. The IPMI interface died, so I was unable to handle an ISO-install remotely. I've now virtualized FreeNAS in KVM and will see if I can recreate the issue and whether the ISO-build helps.

Ryan Moeller 
February 21, 2019 at 7:23 PM

Thu Lle: Can you test with this build of FreeNAS? There are some changes in the network stack that I can take a closer look at for merging to 11.2 if your issue is resolved.
https://download.freenas.org/11.3/MASTER/201902211003/x64/FreeNAS-11.3-MASTER-201902211003-edf46a1.iso

Need additional information

Created January 25, 2019 at 11:51 AM
Updated July 1, 2022 at 4:26 PM
Resolved June 12, 2019 at 3:24 PM