Archive for the ‘Squid’ Category

Chunked Decoding

April 29, 2008

We have been getting a growing number of bug reports from people using Squid 3.0, described as ‘squid producing a blank page’ when bypassing squid apparently works.

Sounds familiar to some, yes? I’m bringing it up now because, while it is an old problem, it’s not the TCP issues Adrian wrote about earlier. Those can have exactly the same visible effects for end-users, so check for them as well if you find this is not your problem.

This ‘new’ issue is caused by certain widely-used web servers, which shall remain nameless and unadvertised by me, that always respond with HTTP/1.1 chunked-encoding of pages.

Servers are explicitly forbidden from sending that particular encoding type to software announcing itself as HTTP/1.0 (such as squid). But the broken server is doing it anyway!

Ironically: The authors use this server on their own help and support website. So those who are having this problem both see it as a squid problem, and can’t find or see any solution they may have posted anyway.

How to tell if this is your problem?

Use squidclient to make a web request that bypasses the squid proxy. It should send out the HTTP/1.0 request and get a page back. If the headers of the response include “Transfer-Encoding: chunked” there is your problem.
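If squidclient is not to hand, the same check can be sketched in a few lines of Python. This is only an illustration of the test described above, not a Squid tool: the function names are made up for this example, and fetch_http10() simply opens a socket straight to the origin server and sends a bare HTTP/1.0 request.

```python
import socket


def parse_head(raw: bytes):
    """Split a raw HTTP reply into (status_line, headers).

    Header names are lowercased; repeated headers keep the last value.
    """
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def is_chunked(raw: bytes) -> bool:
    """True if the reply declares Transfer-Encoding: chunked."""
    _, headers = parse_head(raw)
    return "chunked" in headers.get("transfer-encoding", "").lower()


def fetch_http10(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send a bare HTTP/1.0 GET directly to the server (no proxy involved)."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        pieces = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            pieces.append(data)
    return b"".join(pieces)
```

Calling is_chunked(fetch_http10("www.example.com")) against a suspect site (the hostname here is a placeholder) returns True when the server answers an HTTP/1.0 request with a chunked reply.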

This is currently only an issue in Squid 2.5 or earlier and in 3.0, which is still closely modelled on 2.5.

The solutions are varied depending on your capabilities.

Simplest for some will be to just bypass squid for those domains.

[ UPDATE: (thanks Michael Graham)

Apparently several people are having success with simply dropping the Accept-Encoding header to certain of these broken servers. Adding this to their squid.conf :

# Fix broken sites by removing Accept-Encoding header
acl broken dstdomain …
request_header_access Accept-Encoding deny broken

NP: don’t forget to remove it again when you upgrade out of 3.0

]

Next best is to use peer-routing to divert requests for those domains to a Squid 2.6 (or, if you are feeling experimental, a 3.1 build).

If it’s a serious issue and you are accelerating for one of these broken web servers, then you will need to stick with Squid 2.6 until 3.1 is available for production use.

Why does it work for 2.6 and 3.1 but not 3.0?

Well, things are a bit messy; I’ll have to write it up one day. Suffice to say that 3.1 has a lot more HTTP/1.1 support, which is what the chunked encoding/decoding was intended for. But 2.6 needed it a bit earlier, so a version of the decoding (only!) was done last year to fit 2.6 needs, solving this same issue for high-performance users.
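For readers who have not met the format, here is a rough sketch of what "decoding only" involves. It is written in Python purely for illustration and is not the Squid 2.6 or 3.1 code; decode_chunked() is a made-up name, and the sketch assumes the whole body is already in memory and ignores trailer headers.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked message body into the plain payload."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # The size line is hexadecimal, optionally followed by ";extension".
        size = int(body[pos:eol].split(b";", 1)[0].strip(), 16)
        pos = eol + 2
        if size == 0:
            break  # last chunk; trailer headers after it are ignored here
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)
```

A proxy that speaks only HTTP/1.0 to its clients has to do this unwrapping itself before passing the page on, which is the piece 3.0 was missing.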

The 3.0 code is just different enough that it would need a whole new back-port project to get it going well. The time and work that would take is being used instead to get 3.1 out faster, which should be within a month of this writing, so procrastinating could solve the problem for you.

[UPDATE: Thanks to the Gentoo Project for their work back-porting, this will be available from 3.0.STABLE16-RC1.]

Squid-2.6 + TPROXY + Debian

April 7, 2008

Jason Healy posted some useful information to the squid-users list a week or so ago.

Quoting:

I’ve been a happy user of Squid for the past 10 years or so, and I’d like to take a second to thank everyone who has worked so hard to make such a great piece of software!  I’d like to give back to the Squid community, but unfortunately I’m not much of a C hacker.  However, I’m hoping I can still help.

I’ve just spent a few days getting my school’s Squid install up to date (we were running 2.5 on Debian Woody).  I switched to using tproxy this time around (we used to do policy routing on our core, but it was spiking the CPU too much).  Thanks to the mailing list, some articles on the web, and a little messing around I was able to get the whole system up and running.  I’ve documented the steps here:

http://web.suffieldacademy.org/ils/netadmin/docs/software/squid/

The document is written for someone with a decent grasp of Linux, and is specifically geared to Debian Etch.  There are some tweaks that are specific to our install (compile-time flags, mostly), but otherwise it’s pretty generic.  Hopefully, this will help someone else out who’s trying to build a similar system, so I’m posting so it will hit the archives.

Squid Updates – April 2008

April 6, 2008

University studies have begun for me and so my available time has been limited. But to summarise:

  • Squid-3.0 has been released, for people who are interested in playing with it
  • Kinkie has updated the Wiki theme in a big way – http://wiki.squid-cache.org/
  • Squid-3 development has migrated to bzr
  • Alex is looking to merge in the first set of eCAP related changes into Squid-3.HEAD
  • Squid-2.7 is on track to be released – there’s one outstanding bug and it’s unfortunately difficult to fix. http://www.squid-cache.org/bugs/show_bug.cgi?id=2160 is the bug to watch.
  • Funded Squid-2 development will continue for the time being; mostly from projects I’m working on. We’ll see how things progress there. The Squid-2 Roadmap is slowly changing, evolving and being completed.

Squid-2 performance work: graph #1

January 23, 2008

 

What’s going on with Squid-2 and Squid-3?

January 10, 2008

A few people have asked me what the deal is with Squid-2 and Squid-3.

“Why are you developing on Squid-2 when Squid-3 is now out?”

“Should I upgrade to Squid-3 now that it’s released?”

I’m focusing on Squid-2 for a few reasons, namely:

  • It’s what people running high-traffic sites are currently running, and Squid-3 doesn’t work at all for them;
  • I was fed up waiting for Squid-3 to be released and for it to become mature enough for users to migrate to before I started my performance work. I gave up about 12 months ago and began planning out the work that’s currently going on;
  • I’m personally much more familiar with the Squid-2 codebase than the Squid-3 codebase.

So what exactly am I doing to Squid-2? Well, I’m doing all the things to Squid-2 which I personally believe we should’ve done in the C++ Squid-3 branch before all the “new stuff” was added. You can find it all at http://devel.squid-cache.org/changesets/squid/s27_adri.html . A summary of what I’m doing in this first round:

  • I’m taking a very sharp scalpel to the codebase and removing all of the extra data copies and buffering which is going on;
  • I’m reworking the buffer management so arbitrary sized data buffers can be used, rather than fixed 4k buffers for network/disk traffic;
  • I’m reworking the Strings interface to use reference counting and reference underlying buffers, saving on memcpy() and malloc() calls, cutting down on the amount of transient memory used to handle requests and dropping the CPU and memory bus utilisation quite dramatically;
  • I’m reworking the dataflow between server->store and store->client to use the above reference counted buffers, so data isn’t memcpy()’ed between layers, again dropping CPU and memory bus utilisation;
  • And I’m going to break out as much of the code into external libraries with well-understood dependencies, as preparation for documentation, unit testing and further profiling.
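The idea behind the buffer and string changes in the list above can be illustrated outside of C. The sketch below uses Python's memoryview as a stand-in for a reference-counted buffer: the slices refer to the same underlying storage instead of copying it. It is an analogy only, not the Squid-2 code, and split_header_and_body() is a name invented for this example.

```python
def split_header_and_body(buf: bytearray):
    """Return (header_view, body_view) that share storage with buf.

    No bytes are copied: both views reference the original buffer,
    which stays alive for as long as any view still refers to it.
    """
    view = memoryview(buf)
    end = buf.find(b"\r\n\r\n")
    if end < 0:
        # No header terminator yet: everything is header, body is empty.
        return view, view[len(buf):]
    return view[:end], view[end + 4:]
```

Each layer (server side, store, client side) can hold its own view of the same reply data, so passing it along costs a reference rather than a memcpy().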

My aim is to fix whatever bugs show up in Squid-2.7 and then in Squid-2.HEAD (which has some of the above included already.) I’ll then start bringing across my changes as they’ve been tested and been found stable. My aim is to have the bulk of the above done within the next month or so and get it into Squid-2.HEAD and concentrate on making it stable before I continue tidying up the dataflow and restructuring the ugly bits of code.

What does this mean for Squid-3? The Squid-3 guys are doing some great work with things such as ICAP and IPv6 and I hope that they’ll gain more experience with their codebase over the next 12 months or so. I’m certainly not bringing ICAP support into Squid-2 until I’ve reworked the dataflow and tidied up the code enough for ICAP to sit comfortably in the data pipeline, rather than have it bolted onto the side and hooking into strange places where it shouldn’t. (I may bring in IPv6 into Squid-2 soon though!)

Hopefully my work and their work will culminate with the development of the next Squid major version over the next 12 to 24 months. There’s a long way to go though and my main aim here is to get faster, better and shinier code out to the majority of Squid users now so they can benefit from the development, rather than repeating the 4-odd year gap between Squid-2.5 and Squid-2.6. Users hated that.

So what does it mean for you?

  • If you want to try out Squid-3, or you want supported ICAP services, then try it out.
  • Squid-2.X will continue being developed over the next 12 months as time permits, so don’t feel like you have to move to Squid-3.
  • If you feel adventurous, try out Squid-2.7. Initial reports are that it’s stable and slightly less CPU intensive.
  • Squid-2.7 is the first version to include changes to allow Youtube and Microsoft Updates caching. It doesn’t do it out of the box, but the support is there, and I’ll be publishing test rules soon to let people start caching this stuff.
  • If you feel really adventurous then try out Squid-2.HEAD and report back if you have any issues. It should be even less CPU intensive, but only under certain workloads.

Please upgrade to Squid-2.6.STABLE18

January 10, 2008

Squid-2.6.STABLE18 fixes a silly bug (thanks to yours truly fixing another bug!) which may cause your Squid to crash under certain circumstances.

Squid-2.6.STABLE18-RC1 (release candidate 1) tarballs are available from the Squid website – http://www.squid-cache.org/Versions/v2/2.6/ – the release should be in a day or two.

Squid-2.7 Branched; performance work has begun!

December 22, 2007

Henrik has branched Squid-2.7 – it hasn’t been formally announced yet but it should be any day now.

I’ve begun rolling in infrastructure changes with an eye towards improved performance in Squid. Squid-2 is my testbed at the moment – I’m leaving Squid-3 alone for now to let the codebase mature and the C++ guys to, well, do their C++ “thing”.

The first round of patches to Squid-2.HEAD remove one of the major CPU and memory bottlenecks – memcpy()’ing of data as it passes from the store (so from anywhere, really) back to the client. This may or may not improve performance with your workload but it’s the beginning of sensible dataflow inside Squid. (I estimate this brings Squid up to the late 90’s in terms of network application coding.)

My next trick will be reference counted buffers and strings, to avoid more memcpy()ies, memory allocation/frees, and general L2 cache busting. More on that later.

Squid-3.0.STABLE1 released

December 22, 2007

It’s been a long wait, but Duane has released Squid-3.0.STABLE1. Features include integrated ICAP support. You can find more information at the release website.

IPv6 going mainstream in squid

December 17, 2007

Well folks, things are getting underway again just in time for the new year.

Starting with the Dec 16th daily snapshot, squid3-HEAD includes the long-awaited squid3-ipv6 branch of squid.

http://www.squid-cache.org/Versions/v3/HEAD/

To build the feature just add --enable-ipv6 to your configure options. There are other IPv6 settings for some setups, but most will not need them. Expect it to accept your existing 3.0 squid.conf while allowing you to tweak it slightly for IPv6 purposes if you have a v6/NG connection or desire to do so.

The new releases coupled with an IPv6 link as simple as a single-host tunnel add the ability to:

* source traffic from either IPv4 or IPv6 as needed or provided

* proxy web traffic between IPv4 and IPv6 seamlessly

* gateway an IPv4 or IPv6 -native network to the full transitioning web

* accelerate a website on both IPv4 and IPv6 Internets even if the web server itself is stuck without access to one protocol.

* measure network availability over both IPv4 and IPv6 for peers and source selection

Some expected configuration problems and their solutions can be found in the Squid wiki FAQ

http://wiki.squid-cache.org/SquidFaq/ConfiguringSquid

How cachable is google (part 2) – Youtube content

November 17, 2007

Youtube is (one of) the bane of small-upstream network administrators. The flash files are megabytes in size, and a popular video can be downloaded by half the people in the office or student residential college in one afternoon.

It is, at the present time, very difficult to cache. Let’s see why.

There are actually two different methods employed to serve the actual flash media files that I’ve seen. The first method involves fetching from youtube.com servers; the second involves fetching from IP addresses in Google IP space.

The first method is very simple: the URL form is:

http://XXX-YYY.XXX.youtube.com/get_video?video_id=VIDEO_ID

XXX is the POP name; YYY is, I’m guessing, either a server or a cluster name.

This is pretty standard stuff, and If-Modified-Since requests seem to be handled badly too! The query string “?” in the URL makes it uncachable to Squid by default, even though it’s a flash video that is probably not going to change very often.

The second method involves a bit more work. First the video is requested from a google server. This server then issues an HTTP 302 reply pointing the client at a changing IP address. The redirected request looks somewhat like this:

http://74.125.15.83/get_video?video_id=HrLFb47QHi0&origin=dal-v37.dal.youtube.com

Again, the “?” query string. Again, the origin, but it’s encoded in the URL. Finally, not only are If-Modified-Since requests not handled correctly, the replies include ETags and requests with an If-None-Match revalidation still return the whole object! Aiee!

So how to cache it?

Firstly, you have to try and cache replies for URLs with a “?” query string. It would be nice if they handled If-Modified-Since and If-None-Match requests correctly when the object hasn’t been modified – revalidation is cheap and it’s basically free bandwidth. They could set the revalidation to be, say, after even 30 minutes – they’re already handling all the full requests for all the content, so the request rate would stay the same but the bandwidth requirements should drop.
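To make "handled correctly" concrete, here is a small sketch of the decision a well-behaved origin server makes for a conditional request. It is my own illustration in Python, not anything Youtube or Squid runs; revalidate() is a hypothetical helper, and weak ETag comparison (the W/ prefix) is left out for brevity.

```python
from email.utils import parsedate_to_datetime


def revalidate(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, else 200."""
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        tags = [t.strip() for t in inm.split(",")]
        return 304 if ("*" in tags or etag in tags) else 200
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(ims):
                return 304
        except (TypeError, ValueError):
            pass  # unparseable date: fall through to a full reply
    return 200
```

A 304 reply carries headers only, so a cache holding a multi-megabyte video can confirm it is still fresh for the cost of a few hundred bytes.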

The URLs also have to be rewritten, much like they do to cache google maps content. The “canonical” form URL will then reference a “video” regardless of which server the client is asking.

Now, how do you do this in Squid? I’ve got some beta code to do this and it’s in the Squid-2 development tree. Take a look here for some background information. It works around the multiple-URL-referencing-same-file problem but it won’t unfortunately work around their broken HTTP/1.1 validation code. If they fixed that then Youtube may become something which network administrators stop asking to filter.

(ObNote: the second method uses lighttpd as the serving software; and it replies with an HTTP/1.1 reply regardless of whether the request was HTTP/1.0 or HTTP/1.1. Grr!)
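As a sketch of what such a rewrite looks like, the Python function below maps both URL forms shown earlier onto one key built from the video_id. It is a standalone illustration, not the beta code in the Squid-2 tree: canonical_video_url() and the video.youtube.invalid hostname are invented for this example, and it matches on the /get_video path alone because the second form uses bare IP addresses.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical canonical host; any fixed name works as long as it is stable.
CANONICAL_PREFIX = "http://video.youtube.invalid/get_video?video_id="


def canonical_video_url(url: str):
    """Map a get_video URL to its canonical form, or None if it is not one."""
    parts = urlsplit(url)
    if parts.path != "/get_video":
        return None
    ids = parse_qs(parts.query).get("video_id")
    if not ids:
        return None
    # Drop the server name and every other query parameter (e.g. origin).
    return CANONICAL_PREFIX + ids[0]
```

With this, a request to a youtube.com server and a request to a Google IP address for the same video_id produce the same canonical URL, so the cache can store one copy under that key.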