People occasionally ask why site X doesn’t work through their Squid proxy but it does when going direct. Almost every one of those problems can be explained by three phenomena:
- ECN (Explicit Congestion Notification)
- PMTUD (Path MTU Discovery)
- TCP Window Scaling
Explicit Congestion Notification is a method for a router or gateway to signal the end-points of a TCP connection that the path is reaching congestion and they should back off a little. ECN uses two bits from the IP ToS byte to identify whether the endpoints are ECN capable and to signal congestion (plus two formerly reserved TCP header flags, ECE and CWR, to negotiate ECN and echo the signal back to the sender). Unfortunately many older firewalls developed before ECN became mainstream do a variety of silly things when faced with ECN bits, the worst behaviour being simply dropping the packet on the floor. The pre-ECN specification noted those bits were “Not specified” but various firewall vendors took that to mean “must both be zero” and treat non-zero bits there as “invalid traffic.”
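To make the bit-level picture concrete, here is a minimal sketch that decodes the ECN field from an IP ToS byte using the codepoints from RFC 3168. The function `strict_legacy_firewall_accepts` is a hypothetical model of the "must both be zero" behaviour described above, not the logic of any real product.

```python
# ECN codepoints live in the two least-significant bits of the IP ToS /
# Traffic Class byte (RFC 3168).
ECN_CODEPOINTS = {
    0b00: "Not-ECT",  # sender is not ECN-capable
    0b01: "ECT(1)",   # ECN-capable transport
    0b10: "ECT(0)",   # ECN-capable transport
    0b11: "CE",       # congestion experienced, set by a router
}


def ecn_codepoint(tos_byte: int) -> str:
    """Return the name of the ECN codepoint carried in an IP ToS byte."""
    return ECN_CODEPOINTS[tos_byte & 0b11]


def strict_legacy_firewall_accepts(tos_byte: int) -> bool:
    """Hypothetical pre-ECN firewall: passes a packet only if both bits are zero."""
    return tos_byte & 0b11 == 0


for tos in (0x00, 0x02, 0x03):
    verdict = "pass" if strict_legacy_firewall_accepts(tos) else "drop"
    print(f"ToS=0x{tos:02x} ecn={ecn_codepoint(tos)} legacy firewall: {verdict}")
```

Only the first packet survives the modelled firewall; anything sent by an ECN-capable host is treated as invalid.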
Disabling ECN is platform-specific and is an operating system parameter; Squid doesn’t have the ability to disable ECN for specific connections. Search for the instructions for your operating system to find out how to disable ECN.
Path MTU discovery is a muddy issue. PMTUD is a method for determining when a router/gateway between two IP endpoints has an MTU smaller than that of the endpoints. For example, a home PC and a server may both have an MTU of 1500 while the ISP’s broadband link has an MTU of 1480 (because the ISP is tunneling broadband traffic around its network). PMTUD specifies that hosts should set the “Don’t Fragment” bit on traffic and routers/gateways should then return an ICMP “Fragmentation Required” message when faced with a packet too large to forward. The sending host receives the ICMP packet, drops its MTU for that IP destination, and resends the packet.
This requires a number of things to work:
- Firstly, that the router/gateway actually sends the ICMP packet. This doesn’t always happen, for various reasons; one of the more popular is a slightly misconfigured network resulting in MTU mismatches at Layer 2. If a packet is dropped by a Layer 2 MTU mismatch, no ICMP (which is Layer 3/4) will be generated.
- Secondly, that ICMP is not filtered. Unfortunately a number of firewalls and servers have been configured to treat ALL ICMP as hostile and drop the traffic. ICMP is a part of the IP infrastructure and is necessary for correct behaviour, but this doesn’t seem to have penetrated the heads of various firewall administrators of various large websites.
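The exchange and its failure mode can be sketched as a toy simulation. This is not how any real stack is written: `discover_path_mtu` and its parameters are names made up for illustration, and it assumes every ICMP "Fragmentation Required" message reports the MTU of the link that rejected the packet.

```python
from typing import List, Optional


def discover_path_mtu(link_mtus: List[int], host_mtu: int,
                      icmp_delivered: bool = True,
                      max_attempts: int = 10) -> Optional[int]:
    """Toy PMTUD: return the packet size that gets through, or None for a black hole."""
    size = host_mtu
    for _ in range(max_attempts):
        # First link on the path that cannot carry a DF-marked packet of this size.
        bottleneck = next((mtu for mtu in link_mtus if mtu < size), None)
        if bottleneck is None:
            return size  # the packet fits every link on the path
        if not icmp_delivered:
            return None  # packet silently dropped; the sender never learns why
        size = bottleneck  # sender lowers its MTU for this destination and resends
    return None


path = [1500, 1480, 1500]  # PC link, ISP tunnel, server link
print(discover_path_mtu(path, host_mtu=1500))                        # 1480
print(discover_path_mtu(path, host_mtu=1500, icmp_delivered=False))  # None
```

The second call is the familiar symptom: small requests work, but any full-sized packet vanishes and the connection hangs.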
Resolving issues relating to PMTUD can be a bit difficult. The correct thing to do is to ensure you allow ICMP to/from your proxy servers, clients and origin servers (which holds true for both reverse and forward proxies). If you have to filter ICMP then filter out the potentially hazardous stuff (echo request/reply, port/host unreachable) but leave the important infrastructure stuff (like “fragmentation required”). If you’re still stuck, consider dropping the default TCP MSS on your server.
Dropping the TCP MSS is definitely not the same as changing the MTU of the interface. The MTU of your interface(s) should be the same as that of all other hosts on that network. Most operating systems have the ability to add routes which override the MSS (Maximum Segment Size) for all connections using that route.
Under Linux you can use the “mss” route flag to do this, e.g.:
“route add default gw X.X.X.X mss 1200”
Note that the MSS is not the same as the IP payload. The MSS is the TCP payload and options. For example, if you have a 1500 byte IP packet, that will be 20 bytes IP header, 20 bytes TCP header, (say) 20 bytes TCP options and the rest (1440 bytes) for payload. The MSS is the payload (1440 bytes) plus the options (20 bytes) = 1460 bytes.
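The arithmetic is easy to get wrong, so here it is as a small sketch. It assumes IPv4 and TCP headers of 20 bytes each with no IP options; the function names are illustrative only.

```python
IP_HEADER = 20   # bytes, IPv4 header without options
TCP_HEADER = 20  # bytes, TCP header without options


def mss_for_mtu(mtu: int) -> int:
    """MSS = MTU minus the fixed IP and TCP headers (TCP options count towards the MSS)."""
    return mtu - IP_HEADER - TCP_HEADER


def payload_per_segment(mtu: int, tcp_options: int = 0) -> int:
    """Application data that fits in one segment once TCP options are subtracted."""
    return mss_for_mtu(mtu) - tcp_options


print(mss_for_mtu(1500))              # 1460
print(payload_per_segment(1500, 20))  # 1440
print(mss_for_mtu(1480))              # 1440, the MSS matching the tunnelled link
```

Going the other way, an MSS of 1200 as in the route example produces IP packets of at most 1200 + 40 = 1240 bytes, comfortably under most tunnel MTUs.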
You could just turn off PMTUD and hope that the routers in the path will fragment the packets for you. I don’t do this so I’m not sure how, or how effective this will be.
Finally, TCP Window Scaling is a TCP option which allows a TCP window larger than the default maximum (the window size field is a 16 bit number, so 65535 bytes). No, this doesn’t mean the packet is 65535 bytes in size; it means the sender will transmit up to 65535 bytes of data in MSS-sized segments before waiting for an ACK.
The TCP WSS option tells the TCP stack how many bits to shift the 16 bit window size field to the left to calculate the true window size. A WSS of 0 shifts the window size by 0 bits, giving the same as normal. A WSS of 1 shifts the window size to the left by 1 bit, giving you a real window size of up to 131070 bytes in 2 byte increments. A WSS of 2 shifts the window size by 2 bits, giving you a window size of up to 262140 bytes in 4 byte increments, and so on.
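As a quick check of the numbers, here is a minimal sketch of how a receiver turns the 16 bit window field into a real window size. The upper limit of 14 on the shift comes from the window scale specification (RFC 7323); the function name is made up for illustration.

```python
def scaled_window(window_field: int, shift: int) -> int:
    """Real window size in bytes for a 16 bit window field and a window scale shift."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("the window field is a 16 bit number")
    if not 0 <= shift <= 14:
        raise ValueError("the window scale shift is limited to 0..14")
    return window_field << shift


for shift in (0, 1, 2, 7):
    # Largest window and the granularity (increment size) for each shift.
    print(shift, scaled_window(0xFFFF, shift), 1 << shift)
# 0 65535 1
# 1 131070 2
# 2 262140 4
# 7 8388480 128
```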
This is done so higher TCP throughputs are possible over higher latency links, where higher latency means more than a handful of milliseconds (say 30ms), i.e. most of the world.
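The reason latency matters is that a sender can have at most one window of unacknowledged data in flight per round trip, so throughput is bounded by window size divided by round-trip time. A rough sketch, ignoring loss, slow start and header overhead (the function names are illustrative):

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput in bits/s: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds


def window_needed_bytes(bits_per_second: float, rtt_seconds: float) -> float:
    """Window size needed to keep a link of the given speed full."""
    return bits_per_second * rtt_seconds / 8


unscaled = max_throughput_bps(65535, 0.030)
print(f"{unscaled / 1e6:.1f} Mbit/s")    # about 17.5 Mbit/s with an unscaled window
print(window_needed_bytes(1e9, 0.030))   # bytes needed to fill 1 Gbit/s at 30ms
```

So without window scaling a single connection over a 30ms path tops out at around 17.5 Mbit/s, no matter how fast the link is.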
Unfortunately, firewalls strike again. Some firewalls don’t understand the TCP WSS option and will do the most brain damaged thing possible: they’ll simply zero the whole WSS option. This has a horrible side-effect: both sides have sent TCP WSS options, and each sees the other side sending a TCP WSS option (of 0!), so they assume window scaling is in use and the option they sent is perfectly fine to use. Unfortunately each side then scales the window sizes it advertises by the non-zero option it sent, while the other side interprets those window sizes with an option of zero, and bad things happen. This mostly shows up as slow/stalling TCP connections.
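A small sketch shows how bad the mismatch is. The 1 MiB window and the shift of 7 are example values chosen for illustration, and `advertise` and `interpret` are hypothetical helper names.

```python
def advertise(real_window: int, own_shift: int) -> int:
    """16 bit window field written by a host that believes scaling is active."""
    return min(real_window >> own_shift, 0xFFFF)


def interpret(window_field: int, peer_shift_seen: int) -> int:
    """Window the peer believes it may fill, given the shift it saw in the handshake."""
    return window_field << peer_shift_seen


real_window = 1 << 20  # the receiver really has 1 MiB of buffer
sent_shift = 7         # the shift the receiver put in its SYN

field = advertise(real_window, sent_shift)
print(field)                         # 8192
print(interpret(field, sent_shift))  # 1048576, what a clean path gives
print(interpret(field, 0))           # 8192, what the peer sees after the firewall
```

The sender ends up believing the receiver can only take 8 KB at a time, 128 times less than what is really available, which is why these connections crawl or stall.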
For now, the best thing to do is simply disable TCP WSS. Again, it’s operating system dependent and you’ll want to search for how to do it on your platform. You can use route flags in some operating systems (such as Linux) to set the maximum TCP window size for destinations if you wish to fine-tune things.
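As a starting point on Linux specifically, the sketch below reads the current values of the kernel settings that correspond to the three features discussed here. It assumes the settings are exposed under /proc/sys with the names listed; other operating systems use different mechanisms, and the `read_sysctl` helper is written for this example only.

```python
from pathlib import Path
from typing import Optional

# Linux sysctl names (an assumption: other platforms name these differently).
SETTINGS = (
    "net.ipv4.tcp_ecn",             # 0 turns ECN off
    "net.ipv4.ip_no_pmtu_disc",     # 1 turns Path MTU Discovery off
    "net.ipv4.tcp_window_scaling",  # 0 turns window scaling off
)


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the current value of a sysctl, or None if it cannot be read."""
    path = Path(root, *name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None  # not Linux, no such setting, or no permission


for name in SETTINGS:
    print(f"{name} = {read_sysctl(name)}")
```

Changing a value needs root, typically with something like `sysctl -w net.ipv4.tcp_window_scaling=0`; check the documentation for your kernel version before relying on it.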