One of the more bandwidth-intensive “features” of the Web is the proliferation of ad images and flash media which has a nasty habit of wasting bandwidth and increasing loading times.
Squid has been able to filter ads and other unwanted media for a number of years. Various articles have been written to cover how exactly its done and so I won’t bother covering the how-to here.
The original method involved the “redirector”. A redirector was simply an external program which would read in URLs on STDIN and spit out “alternate” URLs on STDOUT. This could be used for a number of things – the initial use being to rewrite URLs when using Squid as a web server accelerator – but people quickly realised they could rewrite “ad” URLs to filter them out.
Another method is to simply build a text file with identified ad content URLs and hostnames and simply deny the traffic. This is simple but can scale poorly if you try filtering thousands of URLs against regular expression matches.
Finally, another method involves using the more recent “external ACL” helper. It is an external program which can be passed a variety of information about a request (URL, client IP, authenticated username, arbitrary HTTP headers, ident to name a few, but its very customizable!) and spit back a YES or a NO, with an optional message. Content can then be filtered by simply denying access to it, but it currently doesn’t let you return modified content. One of the most popular uses of the external ACL helper is actually to implement ACL groups from sources like LDAP/Windows Active Directory.
How you do it is up to you. Here’s a few links explaining whats involved.