Archive for August, 2007

Blocking Ads in Squid

August 29, 2007

One of the more bandwidth-intensive “features” of the Web is the proliferation of ad images and Flash media, which waste bandwidth and increase loading times.

Squid has been able to filter ads and other unwanted media for a number of years. Various articles have been written covering exactly how it’s done, so I won’t bother covering the how-to here.

The original method involved the “redirector”. A redirector was simply an external program which would read in URLs on STDIN and spit out “alternate” URLs on STDOUT. This could be used for a number of things – the initial use being to rewrite URLs when using Squid as a web server accelerator – but people quickly realised they could rewrite “ad” URLs to filter them out.
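As a rough illustration of the idea, here is a minimal redirector sketch in Python. It assumes the classic helper protocol, where Squid writes one request per line with the URL as the first field and an empty reply line means “leave the URL alone”. The host names and the replacement URL are made up for the example.

```python
import io
from urllib.parse import urlsplit

# Hypothetical values, for illustration only.
AD_HOSTS = {"ads.example.com", "banners.example.net"}
BLANK_URL = "http://127.0.0.1/blank.gif"


def rewrite(line):
    """Return a replacement URL for one request line, or "" for no change."""
    fields = line.split()
    if not fields:
        return ""
    host = urlsplit(fields[0]).hostname or ""
    return BLANK_URL if host in AD_HOSTS else ""


def serve(inp, out):
    """Answer every request line read from inp with one reply line on out."""
    for line in inp:
        out.write(rewrite(line) + "\n")
        out.flush()  # reply straight away; the proxy waits on each answer


# Small demo using in-memory streams instead of the real STDIN/STDOUT.
demo_in = io.StringIO(
    "http://ads.example.com/banner.gif 10.0.0.1/- - GET\n"
    "http://www.example.org/index.html 10.0.0.1/- - GET\n"
)
demo_out = io.StringIO()
serve(demo_in, demo_out)
print(repr(demo_out.getvalue()))
```

Run as a real helper, the last few lines would be replaced by a call to `serve(sys.stdin, sys.stdout)`.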

Another method is to build a text file of identified ad content URLs and hostnames and deny the matching traffic. This is simple but can scale poorly if you try filtering every request against thousands of regular expressions.
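To see why the regular expression approach scales poorly, compare it with a plain set lookup in the Python sketch below. This is not how Squid implements its ACLs; it only illustrates the difference in the amount of work done. The block list entries are hypothetical.

```python
import re

# Hypothetical block list, one host or domain per line.
BLOCKLIST_TEXT = """\
# comment lines and blank lines are ignored
ads.example.com
tracker.example.net
"""


def load_domains(text):
    """Parse the block list text into a set of lower-case domain names."""
    domains = set()
    for raw in text.splitlines():
        entry = raw.split("#", 1)[0].strip().lower()
        if entry:
            domains.add(entry)
    return domains


def blocked_by_set(host, domains):
    """One hash lookup per domain suffix: cost depends on the host name only."""
    labels = host.lower().split(".")
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))


def blocked_by_regex(url, patterns):
    """One regex search per list entry: cost grows with the size of the list."""
    return any(p.search(url) for p in patterns)


domains = load_domains(BLOCKLIST_TEXT)
patterns = [
    re.compile(r"^https?://([^/]*\.)?" + re.escape(d) + r"(/|$)")
    for d in sorted(domains)
]

print(blocked_by_set("img.ads.example.com", domains))
print(blocked_by_regex("http://img.ads.example.com/x.gif", patterns))
```

With two entries the difference is invisible. With thousands of entries the regex version runs thousands of searches per request, while the set version still does only a handful of lookups.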

Finally, another method involves the more recent “external ACL” helper. It is an external program which can be passed a variety of information about a request (URL, client IP, authenticated username, arbitrary HTTP headers and ident, to name a few; it’s very customizable!) and spits back a yes or a no (the literal replies are OK and ERR), with an optional message. Content can then be filtered by simply denying access to it, but the helper currently doesn’t let you return modified content. One of the most popular uses of the external ACL helper is actually to implement ACL groups from sources like LDAP/Windows Active Directory.

How you do it is up to you. Here are a few links explaining what’s involved.
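A minimal external ACL helper might look like the Python sketch below. It assumes the helper has been configured to receive just the destination host name on each line, and that an OK reply means “the ACL matches”, so the matching requests can then be denied in the access rules. The host names and the message text are invented for the example.

```python
import io

# Hypothetical ad hosts, for illustration only.
AD_HOSTS = {"ads.example.com", "adserver.example.net"}


def answer(line):
    """Return the helper reply for one request line."""
    fields = line.split()
    if fields and fields[0].lower() in AD_HOSTS:
        # OK: the ACL matches; the optional message can be shown to the user.
        return "OK message=blocked-ad-host"
    return "ERR"  # ERR: the ACL does not match


def serve(inp, out):
    """Answer every request line read from inp with one reply line on out."""
    for line in inp:
        out.write(answer(line) + "\n")
        out.flush()  # reply straight away; the proxy waits on each answer


# Small demo using in-memory streams instead of the real STDIN/STDOUT.
demo_out = io.StringIO()
serve(io.StringIO("ads.example.com\nwww.example.org\n"), demo_out)
print(demo_out.getvalue(), end="")
```

Unlike the redirector, this helper never hands back a different URL. It only answers the yes/no question and leaves the decision about what to do with the request to the proxy configuration.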

Web Cache Whitepapers/Articles

August 24, 2007

Why bother with Squid purely as a proxy server? Isn’t most of the content on the Internet today dynamic?

Perhaps; perhaps not. A few years ago “media caching” required licensed software to handle WMA and RealMedia streams; today the heavy bandwidth users are Flash videos from popular sites such as YouTube. The HTML may not be cacheable but all those thumbnail images, all those previews and all those large Flash video files are very cacheable. The problem isn’t that the Internet is “dynamic”; the problem is that website designers view caching as “evil” – they’re suddenly not 100% in control of their content – and try as hard as possible to dodge caching.

Squid has a few knobs which can be set to cache this so-called “dynamic” content. By default Squid has to treat everything which may be dynamic as uncacheable – the telltale “?” in the URL identifying the output as being from a script – when in fact the content often isn’t all that dynamic. More on that will be covered in a future article.

ISPs who run Squid with a well-tuned configuration have shown web traffic savings of around 30%. That’s 30% of their traffic by volume, not just 30% of requests. And that’s without any attempt at caching the “dynamic” content which can actually be cached – YouTube and Windows Updates are two big offenders here.

So Squid isn’t that useless at all!

A couple of articles which give an overview of caching follow. They’re dated – the technology isn’t new after all – but just as applicable today.

WebDAV tester wanted.

August 4, 2007

One of the IPv6 Squid testers has reported strange errors with a WebDAV-enabled squid3-ipv6 build. Unfortunately he had no time available to track these down, and I don’t have WebDAV capability set up for use or testing.

I am seeking someone who does have the time and setup to test WebDAV in Squid under an IPv6 setup. I am willing to act as a free consultant on the IPv6 side of the setup in exchange for this testing if needed.