Saturday, April 26, 2008

tcp window scaling, Linux, and broken routers

So, what the hell is TCP window scaling and why do I care?

This had me puzzled for a few days: solarguard.solarcity.com worked fine when I was working from my laptop from home, but didn't when I was at work. That's the Web site the monitoring data from my solar panels is sent to.

At first I thought they didn't support path MTU discovery properly (e.g. by blocking most ICMP packets, including "fragmentation required"). However, lowering the MTU on my workstation didn't help (but it briefly messed up NFS...)

A co-worker pointed out that the Web site doesn't support Linux, but works fine with Mac and Windows. Bollocks, I thought, silly browser requirements. However, sure enough, even a simple "GET / HTTP/1.0" sent via telnet from Linux doesn't work, but works fine from my Mac laptop. I was stumped.

Until I did these two tcpdumps of a telnet session to solarguard.solarcity.com.

21:09:37.486875 IP bbeck-mac.wlan.lostentry.org.50292 > sol6.solarcity.com.http: S 3232032404:3232032404(0) win 65535
21:09:37.502028 IP sol6.solarcity.com.http > bbeck-mac.wlan.lostentry.org.50292: S 3190318898:3190318898(0) ack 3232032405 win 16384
21:09:37.502098 IP bbeck-mac.wlan.lostentry.org.50292 > sol6.solarcity.com.http: . ack 1 win 65535
21:09:43.279950 IP bbeck-mac.wlan.lostentry.org.50292 > sol6.solarcity.com.http: P 1:17(16) ack 1 win 65535
21:09:43.476237 IP sol6.solarcity.com.http > bbeck-mac.wlan.lostentry.org.50292: . ack 17 win 65519
21:09:43.616030 IP bbeck-mac.wlan.lostentry.org.50292 > sol6.solarcity.com.http: P 17:19(2) ack 1 win 65535
21:09:43.632860 IP sol6.solarcity.com.http > bbeck-mac.wlan.lostentry.org.50292: FP 1:559(558) ack 19 win 65517
21:09:43.632967 IP bbeck-mac.wlan.lostentry.org.50292 > sol6.solarcity.com.http: . ack 560 win 65535
21:09:43.633327 IP bbeck-mac.wlan.lostentry.org.50292 > sol6.solarcity.com.http: F 19:19(0) ack 560 win 65535
21:09:43.650105 IP sol6.solarcity.com.http > bbeck-mac.wlan.lostentry.org.50292: . ack 20 win 65517

and

21:14:19.460778 IP chef.lostentry.org.56146 > sol6.solarcity.com.http: S 3539793334:3539793334(0) win 5840
21:14:19.474075 IP sol6.solarcity.com.http > chef.lostentry.org.56146: S 3721548036:3721548036(0) ack 3539793335 win 16384
21:14:19.474105 IP chef.lostentry.org.56146 > sol6.solarcity.com.http: . ack 1 win 365
21:14:24.092355 IP chef.lostentry.org.56146 > sol6.solarcity.com.http: P 1:17(16) ack 1 win 365
21:14:24.238705 IP sol6.solarcity.com.http > chef.lostentry.org.56146: . ack 17 win 65519
21:14:24.844354 IP chef.lostentry.org.56146 > sol6.solarcity.com.http: P 17:19(2) ack 1 win 365
21:14:30.238534 IP chef.lostentry.org.56146 > sol6.solarcity.com.http: P 17:19(2) ack 1 win 365
21:14:30.263687 IP sol6.solarcity.com.http > chef.lostentry.org.56146: . ack 19 win 65517
21:14:30.263723 IP chef.lostentry.org.56146 > sol6.solarcity.com.http: P 19:21(2) ack 1 win 365
21:14:30.476289 IP sol6.solarcity.com.http > chef.lostentry.org.56146: . ack 21 win 65515
21:15:59.668923 IP chef.lostentry.org.56146 > sol6.solarcity.com.http: F 21:21(0) ack 1 win 365
21:15:59.680890 IP sol6.solarcity.com.http > chef.lostentry.org.56146: R 3721548037:3721548037(0) win 0

What is up with the TCP window size in the Linux dump once the connection is established? 365, vs. 65535 on the Mac. And what does wscale do?

This made me suspicious, and a quick visit to our favorite search engine brought up this most excellent blog entry, as well as a somewhat old article on LWN which, however, explains the basic technicalities quite well.

In a nutshell: TCP window scaling works around a limitation in TCP that restricts the maximum window size to 64 KB, by negotiating a multiplier for the window value during session setup. Apparently, there are some broken routers out there that set the wscale option to 0, effectively disabling window scaling transparently without telling anybody. The result is TCP connections that appear to hang when waiting for data, while control traffic (like syn/ack, rst, retransmit, etc.) still works!
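To make the multiplier concrete, here is a minimal sketch of the arithmetic from RFC 1323: the 16-bit window field is shifted left by the negotiated scale. The function name is my own, and the shift count of 7 is an assumption (the dumps above don't show the TCP options); only the advertised value 365 comes from the Linux dump.

```python
def effective_window(advertised: int, wscale: int) -> int:
    """Return the window in bytes for a 16-bit window field and a shift count.

    RFC 1323 limits the shift count to 14.
    """
    if not 0 <= advertised <= 0xFFFF:
        raise ValueError("window field is 16 bits")
    if not 0 <= wscale <= 14:
        raise ValueError("shift count must be between 0 and 14")
    return advertised << wscale


# 365 is the window chef advertised; 7 is an assumed shift count.
what_linux_means = effective_window(365, 7)   # 365 * 128 bytes
what_peer_sees = effective_window(365, 0)     # scale rewritten to 0 in transit

print(what_linux_means, what_peer_sees)
```

If the two ends disagree about the shift count like this, the peer believes it may only send a few hundred bytes at a time, which is smaller than the 558-byte response in the Mac dump.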

Sure enough, if I disable TCP window scaling on chef, the SolarGuard website sends the response just fine even in a telnet session, where it didn't work before. Setting the tcp_rmem and tcp_wmem values to the pre-2.6.17 values makes this work properly even with window scaling enabled.

I left window-scaling enabled, but manually set

sysctl -w net.ipv4.tcp_wmem="4096 16384 131072"
sysctl -w net.ipv4.tcp_rmem="4096 87380 174760"

and the Website works fine now.
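My understanding of why the smaller buffer limits help, as a rough sketch: the kernel derives the shift count it advertises from the largest receive buffer it might use, halving that size until it fits into 16 bits. This is a simplified model, not the kernel's actual code. It ignores net.core.rmem_max and anything else the kernel takes into account, the function names are mine, and the 4 MB figure is just an assumed example of a larger post-2.6.17 limit.

```python
def parse_tcp_mem(value: str):
    """Split a tcp_rmem/tcp_wmem string into (min, default, max) bytes."""
    low, default, high = (int(field) for field in value.split())
    return low, default, high


def wscale_for(max_buffer: int) -> int:
    """Smallest shift count (capped at 14) that fits max_buffer into 16 bits."""
    shift = 0
    while max_buffer > 65535 and shift < 14:
        max_buffer >>= 1
        shift += 1
    return shift


# The tcp_rmem value I set above.
_, _, old_max = parse_tcp_mem("4096 87380 174760")
print(wscale_for(old_max))

# Assumed example of a 4 MB maximum receive buffer.
print(wscale_for(4 * 1024 * 1024))
```

With a small maximum buffer the shift count stays small, so even if a router zeroes it on the way, the two ends disagree about the window by a much smaller factor.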

As an aside, kudos to SolarCity staff. I called Support, they forwarded my problem to someone in Engineering, and I had a sensible answer in my inbox in less than 20 minutes after my initial problem report, followed up with a confirmation that they verified connectivity from multiple places and networks without problems. They even confirmed that my box is working properly.

Maybe they happen to have control over that old router that is not compliant with RFC 1323 from the year 1992, and can either fix it, or take it out back and shoot it...

Update:

In Windows the registry key HKLM/tcpip/parameters/Tcp1323Opts is a bit field that controls whether TCP window scaling is enabled (1) or not (0). Setting this to 0 disables TCP window scaling for both incoming and outgoing connections.

Windows Vista has TCP window scaling enabled by default (so it will have the same problems that I had with Linux). It can be turned off using the network tuning wizard, or the above registry key.
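For reference, a small sketch of how I read that bit field: as far as I know, bit 0 is window scaling and bit 1 is the RFC 1323 timestamp option, so the valid values are 0 to 3. The function and constant names are hypothetical, and the code only decodes a number; it does not touch the registry.

```python
WINDOW_SCALING_BIT = 0x1  # bit 0: TCP window scaling
TIMESTAMPS_BIT = 0x2      # bit 1: RFC 1323 timestamps


def decode_tcp1323opts(value: int) -> dict:
    """Decode a Tcp1323Opts value (0..3) into its two flags."""
    if not 0 <= value <= 3:
        raise ValueError("expected a value between 0 and 3")
    return {
        "window_scaling": bool(value & WINDOW_SCALING_BIT),
        "timestamps": bool(value & TIMESTAMPS_BIT),
    }


for setting in range(4):
    print(setting, decode_tcp1323opts(setting))
```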

3 comments:

Anonymous said...

Can you check again. We have made some changes to our environment that may have resolved your issue.

Bernhard said...

I get different behaviour now. That could use more investigation when I'm awake.

sol6 sets wscale=0 in the options during initial negotiation. However, Linux on some systems still seems to use window scaling nonetheless.

It works on a Debian 4.0 based system with window scaling enabled, and only the rmem/wmem values set to pre-2.6.17 values. However, on an Ubuntu Dapper based system it works only when I explicitly disable tcp window scaling. So this is either buggy on the Linux side, or there is something else going on that I'm missing.

Anonymous said...

Setting it to 0 is wrong if scaling is not supported. The option field should be omitted in that case.