#15256 closed defect (fixed)
virtualbox nat doesn't obey mss parameter
Reported by: | aojea | Owned by: | |
---|---|---|---|
Component: | network/NAT | Version: | VirtualBox 5.0.16 |
Keywords: | nat, mss, mtu | Cc: | |
Guest type: | all | Host type: | all |
Description
Versions 4.x and 5.x of VirtualBox, when using the NAT interface, do not obey the MSS of the VM and change the MSS in the TCP handshake.
To reproduce, you only need to modify the MTU of the interface in the VM and do a curl or start any other TCP connection:
root@vagrant-ubuntu-trusty-64:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1000 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:19:c3:bf brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global eth0
       valid_lft forever preferred_lft forever
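For example, roughly the following reproduces it (a sketch only: the interface and MTU are the ones shown above, the destination is the one from the captures below, and the exact tcpdump options and host-side capture interface are placeholders):

# inside the guest: lower the MTU and start any TCP connection
ip link set dev eth0 mtu 1000
curl -s http://216.58.201.131/ -o /dev/null
# inside the guest, in parallel: watch the MSS option in the outgoing SYN
tcpdump -n -e -v -i eth0 'tcp[tcpflags] & tcp-syn != 0'
# on the host: capture the same handshake and compare the MSS option
tcpdump -n -e -v -i any 'tcp port 80 and tcp[tcpflags] & tcp-syn != 0'

With the 1000-byte MTU the guest advertises an MSS of 960 (1000 minus 40 bytes of IP and TCP headers), whereas a standard 1500-byte MTU corresponds to the 1460 seen in the captures.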
As you can see in the capture taken inside the VM, the SYN has an MSS of 960 while the answer has 1460:
21:25:08.128132 08:00:27:19:c3:bf > 52:54:00:12:35:02, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 36522, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.2.15.48438 > 216.58.201.131.80: Flags [S], cksum 0xadfb (incorrect -> 0xba07), seq 1793847297, win 19200, options [mss 960,sackOK,TS val 32427 ecr 0,nop,wscale 7], length 0
21:25:08.138613 52:54:00:12:35:02 > 08:00:27:19:c3:bf, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 64, id 796, offset 0, flags [none], proto TCP (6), length 44)
    216.58.201.131.80 > 10.0.2.15.48438: Flags [S.], cksum 0xb496 (correct), seq 54336001, ack 1793847298, win 65535, options [mss 1460], length 0
21:25:08.138644 08:00:27:19:c3:bf > 52:54:00:12:35:02, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 64, id 36523, offset 0, flags [DF], proto TCP (6), length 40)
The capture from the host reveals that the outgoing packet has an MSS of 1460 in both connections, so most probably the VirtualBox NAT is modifying the MSS. (The captures were taken at different times, but the behaviour is always the same.)
17:48:42.795919 Out fc:aa:14:93:94:b0 ethertype IPv4 (0x0800), length 76: (tos 0x0, ttl 64, id 51370, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.30.31.39163 > 216.58.201.131.80: Flags [S], cksum 0x80b4 (incorrect -> 0x77f4), seq 3898897121, win 29200, options [mss 1460,sackOK,TS val 781761099 ecr 0,nop,wscale 7], length 0
17:48:42.806111 In 84:18:88:78:9a:42 ethertype IPv4 (0x0800), length 76: (tos 0x0, ttl 54, id 62615, offset 0, flags [none], proto TCP (6), length 60)
    216.58.201.131.80 > 192.168.30.31.39163: Flags [S.], cksum 0xa1e9 (correct), seq 4238402083, ack 3898897122, win 42540, options [mss 1430,sackOK,TS val 172144884 ecr 781761099,nop,wscale 7], length 0
This is not a problem with a single VM, but with more complex topologies (we are using this setup to emulate an OpenStack cloud) you run into problems because the packets are bigger than the MTU and are dropped in the VMs.
Change History (9)
comment:2 changed 4 years ago
I created a new, related ticket: #20009. I attached the vb.log file and some PCAP files that show the behavior.
follow-up: 4 | comment:3 changed 4 years ago
Those captures show different MSS options, which is obvious and expected, but is it really a problem?
I however think that the behavior of test 3 is incorrect because it increases the packet size received by the guest, which may cause problems on the guest itself or in a virtualized network.
Do you have evidence of this? As I said in the previous comment (and note that the request for captures was specifically to show this particular scenario):
Slirp [...] shouldn't send large packets to the guest that advertises small MSS. If that doesn't happen, it's a bug, but in that case I'd like to see the packet captures.
To reiterate:
The guest says: "my MSS is N, please don't send me packets larger than that". This affects the host->guest direction (and only that direction) of the host<->guest connection. The host establishes a host<->peer proxy connection to the intended destination. The host's stack chooses an appropriate MSS for that connection. That value bears absolutely no relation to the host<->guest connection.
As I tried to explain in that previous comment, the "because" in "because it increases the packet size received by the guest" is wrong. There's no causal connection there.
follow-up: 5 | comment:4 changed 4 years ago
Replying to vushakov:
Do you have evidence of this?
Sorry, sloppy wording on my part. Do you have any evidence that the host actually sends the guest packets that exceed the MSS advertised by the guest?
You conclude that that might happen, but I hope that the quotes in my previous comments show that that conclusion doesn't really follow.
comment:5 changed 4 years ago
My initial message was wrong because I did not know that VirtualBox treats guest-host communications and host-peer communications separately. I now understand that it modifies network traffic (including the MSS) and thus adapts itself to the MTU of the host. The inconsistent MSS values that I observe are therefore expected, because the MTUs differ between the guest-host and host-peer communications.
I however think that there is another problem in the traces I uploaded. It looks like the host sends packets to the guest that do not respect its MSS (see the explanation below). The 6th packet in the host_eth0_an.dump trace is sent from the peer to the host and has a size of 1514 bytes. The host then sends a packet of 1474 bytes to the guest (cf. the 6th packet in guest_eth0_an.dump). I think that this packet size is not consistent with an MSS of 1360: a 1474-byte Ethernet frame carries roughly 1420 bytes of TCP payload (1474 minus 14 bytes of Ethernet, 20 of IP and 20 of TCP headers), which is more than 1360. Would you agree with that analysis? If yes, the problem seems to be different from both my ticket and this one. In that case, should I open a new ticket?
Thank you very much for your time.
comment:6 changed 4 years ago
Ah, ok, I see it. Slirp is a very old fork of a very old BSD TCP stack, and it was forked before this was fixed. Stevens' admonition was likely about the BSD stack to begin with. From a quick look, NetBSD, for example, fixed this only in 1997.
Still, it's strange, because the old code really should have "negotiated" the smaller value advertised by the guest, from the looks of it. I will need to debug this properly.
Thanks for providing the evidence. There's no need to file a new bug.
comment:8 changed 4 years ago
This should be fixed in the test builds and will be in the next 6.1.x release. As far as I can tell the bug - ignoring the MSS advertised by the guest when sending packets to the guest - was introduced in the original slirp code base when it adapted the BSD TCP/IP stack for its new mode of operation. The impact of that bug is likely to be extremely limited and shouldn't have affected any practical scenarios.
The part of comment:1 concerned with the host<->peer connection is something completely separate (just for the record).
comment:1
A bit of nitpicking first:
This sentence (taken in isolation for the sake of argument) seems to imply that 1460 is incorrect. However, Stevens begins his discussion of this option with: [...] and RFC 879 says: [...]
While MTU is usually the primary factor, nothing in principle precludes an implementation from using MSS to e.g. advertise a smaller-than-MTU internal buffer it is capable of handling (e.g. the 1K page-sized packet buffers used by a 16-bit implementation).
The next thing that I think needs to be mentioned explicitly is that "NAT" is an unfortunate misnomer (for historical reasons). It's not a real NAT; it's better described as an automagic socks-like proxy. It uses host sockets. Now, there is a TCP_MAXSEG socket option that can be set for outgoing connections to the value advertised by the guest in its SYN packet. However, there is, as far as I know, no way to retrieve the MSS option advertised by the peer, so it cannot be propagated to the guest. In any case the point is moot, as the guest is not actually talking to the TCP stack of its peer, but to the TCP stack of the "NAT". The host should take care of never sending larger packets to the peer if the peer advertises a small MSS. Slirp too - it shouldn't send large packets to the guest that advertises small MSS. If that doesn't happen, it's a bug, but in that case I'd like to see the packet captures.
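As an aside, since the "NAT" uses ordinary host sockets, one quick way to see that the host<->peer proxy connection negotiates its own MSS independently of the guest is to inspect that socket on the host while the guest connection is open (a sketch only, assuming a Linux host with iproute2; the address is the peer from the captures above):

# on the host, while the guest's connection is established
ss -tni dst 216.58.201.131
# the mss: value reported here belongs to the host<->peer connection only;
# the MSS the guest advertised applies to the guest<->"NAT" side.

The value shown there is whatever the host stack negotiated with the peer, regardless of the MSS the guest advertised in its SYN.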