Archive for the ‘Uncategorized’ Category

Language Negotiation and the world-wide-Squid

September 30, 2009

As of version 3.1, Squid supports automatic language negotiation. There seems to be a little bit of confusion over what this means and what should be configured.

Obviously we would like people to enable and use the automatics, for some very good reasons which you should understand by the end of this post. I hope you will agree by then too.

Most software you and the rest of the world are familiar with comes in two forms: English, or translated into your own language. You might have your computer set to a non-English language, and all the software that can do so changes its text so you can read it more easily.

All of this is very you-centric and only affects whatever machine you are using. The WWW is a very different beast altogether. It has to deal with everyone, all at the same time.

The best example is search engine results. You may have noticed when you do a search that some results have little tags: ‘cached’, ‘similar pages’, ‘more’, … and sometimes one called ‘translate’. This is nice, because it means the search engine has noticed that the page is in a language you may not know, and it’s offering a link that will translate the page into one you can read.

Ever wondered ‘how does it know’? And, more importantly, what does all this have to do with Squid?

Let’s start with the second one: what does this have to do with Squid? Well, Squid (the one I run, the one you probably run, and many others around the world) generates error pages. You are sure to have seen “404 Not Found” at some point, and probably “Access Denied” and “Connection Failed” as well.

Until now Squid has been set up and managed by someone for a specific purpose. That person sets the language of those pages to something they can read, so they can see what the problems are. And here is where the confusion seems to start.

One admin who set up the new Squid promptly changed the error_directory language to German (de). Quite rightly, he thought: I’m German, my customers are German, who needs any other languages installed? It will only confuse me to see errors in other languages. And the server is set to German, so it won’t show any others anyway.

At this point I’m guessing you might agree with some or all of that reasoning. For your language in the same situation, you would probably do the same, yes?

Let’s take a look at that search engine question. We found a website. It is written, strangely, in Persian. We do not have a clue what it’s about. We click on the ‘translate’ link and we can read the page.

But wait, …

… we only saw one single ‘translate’ link and surely the engine knows many languages. We should see a whole bunch, one for every language the page might be translated into.

This is where we get closer to Squid again. The HTTP protocol has a header (Accept-Language) where the browser says what languages its current user would like things displayed in. The search engine is reading that header and only showing the translate link for the most preferred language it can cope with.

This is precisely what Squid now does for the error pages it creates. The language displayed depends on the visitor doing the reading when the automatics are allowed to run.  The server Squid runs on has nothing to do with the language.
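As a rough illustration of that negotiation, here is a minimal Python sketch. It is not Squid’s own code: the function names, the fallback to English and the set of installed languages are all assumptions made for the example. It reads an Accept-Language header, orders the languages by their q-values, and picks the first one for which an error page translation is available.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, best first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without a q-value counts as fully preferred
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        # Sort by descending quality, then by position in the header.
        prefs.append((-quality, position, tag.strip().lower()))
    prefs.sort()
    # q=0 means "not acceptable", so those tags are dropped.
    return [tag for neg_quality, _, tag in prefs if neg_quality < 0]


def choose_language(header, available, fallback="en"):
    """Pick the visitor's most preferred language that we have pages for."""
    available = {lang.lower() for lang in available}
    for tag in parse_accept_language(header):
        if tag == "*":
            continue  # wildcard: nothing specific to match, keep looking
        if tag in available:
            return tag
        base = tag.split("-")[0]  # e.g. "de-DE" falls back to "de"
        if base in available:
            return base
    return fallback  # hypothetical default when nothing matches


installed = {"de", "en", "zh"}  # made-up set of installed translations
print(choose_language("de-DE,de;q=0.9,en;q=0.5", installed))
print(choose_language("ko,zh;q=0.8", installed))
```

The German admin’s browser and a Chinese-reading visitor’s browser send different headers, so the same error is rendered differently for each of them without any per-user configuration on the proxy.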

Our German admin, if you recall, set the error_directory to German so he could read it.

Too bad for us if you or I, as non-German readers, had a problem getting to one of his customers’ websites. Or if we were visiting one of his customers and using their Internet access from our laptop.

What he should have done was leave error_directory unset. When he visits the proxy to test a problem it shows German, because his browser says to. The user who reported the problem might be reading the same message in Chinese or Korean.

Squid provides error pages for two reasons: to explain what’s gone wrong, and to explain to someone what to do about the problem. In this world of many international people your visitors and users could be coming from any kind of background with any kind of language needs. To help reduce the number of half-understood complaints in strange languages that we all receive, the Squid team has made Squid explain things in a language the visitor can read, so you don’t have to. All you have to do is turn it on.

http://wiki.squid-cache.org/Translations#What_has_been_done.3F

Squid now speaks over 130 national languages and dialects, 100 more than at this time last year. Some are more complete than others, and they are improving all the time.

Kia Ora koe.

Continuous Integration

August 18, 2009

For the last few years there has been a slow but steady improvement in the testing and QA that Squid is subject to. This last week has seen the construction and rollout of a full-scale build farm to replace some of our simple internal testing. Robert Collins covers the growth process in his blog.

Here is the initial release notice:

Hi, a few of us devs have been working on getting a build-test environment up and running. We’re still doing fine tuning on it but the basic facility is working.

We’d love it if users of squid, both individuals and corporates, would consider contributing a test machine to the buildfarm.

The build farm is at http://build.squid-cache.org/ with docs about it at http://wiki.squid-cache.org/BuildFarm.

What we’d like is to have enough machines that are available to run test builds, that we can avoid having last-minute scrambles to fix things at releases.

If you have some spare bandwidth and CPU cycles you can easily volunteer.

We don’t need test slaves to be on all the time – if they aren’t on they won’t run tests, but they will when they come on. We’d prefer machines that are always on over some-times on.

We only do test builds on volunteer machines after a ‘master’ job has passed on the main server. This avoids using resources up when something is clearly busted in the main source code.

Each version of squid we test takes about 150MB on disk when idle, and when a test is going on up to twice that (because of the build test scripts).

We currently test:

  • 2.HEAD
  • 3.0
  • 3.1
  • 3.HEAD

I suspect we’ll add 2.7 to that list. So I guess we’ll use about 750MB of disk if a given slave is testing all those versions.

Hudson, our build test software, can balance out the machines though – if we have two identical platforms they will each get some of the builds to test.

So, if your favorite operating system is not currently represented in the build farm, please let us know – drop a mail here or to noc @ squid-cache.org – we’ll be delighted to hear from you, and it will help ensure that squid is building well on your OS!

-Rob

That just about covers everything. Hardware and build software requirements are listed in the build farm page.
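The disk figures in the notice can be checked with a little arithmetic. The Python sketch below uses the 150MB idle figure and the version list from the notice. The function name and the idea of counting how many builds run at once on a slave are assumptions added for illustration.

```python
IDLE_MB_PER_VERSION = 150  # figure quoted in the notice

# The four versions currently tested, plus 2.7 which may be added.
versions = ["2.HEAD", "2.7", "3.0", "3.1", "3.HEAD"]


def disk_estimate_mb(n_versions, concurrent_builds=1):
    """Return (idle, peak) disk use in MB for one build slave.

    A version under test takes up to twice its idle size, so each
    running build adds up to one extra idle-sized chunk on top.
    """
    idle = n_versions * IDLE_MB_PER_VERSION
    running = min(concurrent_builds, n_versions)
    peak = idle + running * IDLE_MB_PER_VERSION
    return idle, peak


print(disk_estimate_mb(len(versions)))      # one build at a time
print(disk_estimate_mb(len(versions), 5))   # every version building at once
```

With five versions the idle total matches the roughly 750MB mentioned in the notice; a volunteer machine would want some headroom on top of that for builds in progress.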

How cachable is google (part 2) – Youtube content

November 17, 2007

Youtube is (one of) the bane of small-upstream network administrators. The flash files are megabytes in size, and a popular video can be downloaded by half the people in the office or student residential college in one afternoon.

It is, at the present time, very difficult to cache. Let’s see why.

There are actually two different methods employed to serve the actual flash media files that I’ve seen. The first method involves fetching from youtube.com servers; the second involves fetching from IP addresses in Google IP space.

The first method is very simple: the URL form is:

http://XXX-YYY.XXX.youtube.com/get_video?video_id=VIDEO_ID

XXX is the pop name; YYY is, I’m guessing, either a server or a cluster name.

This is pretty standard stuff, and If-Modified-Since requests seem to be handled badly too! The query string “?” in the URL makes it uncachable to Squid by default, even though it’s a flash video. It’s probably not going to change very often.

The second method involves a bit more work. First the video is requested from a Google server. This server then issues an HTTP 302 reply pointing the client at a changing IP address. The redirected request looks somewhat like this:

http://74.125.15.83/get_video?video_id=HrLFb47QHi0&origin=dal-v37.dal.youtube.com

Again, the “?” query string. Again, the origin, but this time it’s encoded in the URL. Finally, not only are If-Modified-Since requests not handled correctly, the replies include ETags, and requests with an If-None-Match revalidation still return the whole object! Aiee!

So how to cache it?

Firstly, you have to try and cache replies for URLs containing a “?”. It would be nice if they handled If-Modified-Since and If-None-Match requests correctly when the object hasn’t been modified: revalidation is cheap and it’s basically free bandwidth. They could set the revalidation to be, say, after even 30 minutes. They’re already handling all the full requests for all the content, so the request rate would stay the same but the bandwidth requirements should drop.
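To make “handling revalidation correctly” concrete, here is a minimal Python sketch of what an origin server could do with a conditional request. It is an illustration only: the function name and the in-memory request dictionary are assumptions, and it says nothing about how the real servers are built. The point is simply that an unchanged object gets a bodyless 304 rather than the whole video again.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def _opaque(tag):
    """Strip the weak-validator prefix so tags compare weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def respond(request_headers, etag, last_modified, body):
    """Return (status, headers, body) for a possibly conditional GET."""
    headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in candidates or _opaque(etag) in candidates:
            return 304, headers, b""
        return 200, headers, body
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the condition
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers, b""
    return 200, headers, body


modified = datetime(2007, 11, 1, 12, 0, 0, tzinfo=timezone.utc)
video = b"x" * 1000  # stand-in for a multi-megabyte flash file
status, _, payload = respond({"If-None-Match": '"abc"'}, '"abc"', modified, video)
print(status, len(payload))
```

A cache that revalidates every 30 minutes would then cost the origin one small 304 per check instead of a full transfer.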

The URLs also have to be rewritten, much like they are to cache Google Maps content. The “canonical” form URL will then reference a “video” regardless of which server the client is asking.
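The rewriting idea can be sketched in a few lines of Python. This is not the Squid-2 code: the function name and the canonical host (a deliberately fake `.invalid` name) are assumptions for illustration. Both URL forms shown earlier are mapped to one key built from the video_id, so the cache would see a single object however the client reached it.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical internal name; it only has to be the same for every server.
CANONICAL_PREFIX = "http://youtube-video.invalid/get_video?video_id="


def canonical_video_key(url):
    """Map any known get_video URL form to one canonical cache key."""
    parts = urlsplit(url)
    if parts.path != "/get_video":
        return url  # not a video fetch: leave untouched
    query = parse_qs(parts.query)
    host = parts.hostname or ""
    origin = query.get("origin", [""])[0]
    # First form: a youtube.com server name. Second form: a bare IP
    # address with the youtube.com origin carried in the query string.
    if not (host.endswith(".youtube.com") or origin.endswith(".youtube.com")):
        return url
    video_ids = query.get("video_id")
    if not video_ids:
        return url
    return CANONICAL_PREFIX + video_ids[0]


first = "http://dal-v37.dal.youtube.com/get_video?video_id=HrLFb47QHi0"
second = ("http://74.125.15.83/get_video?video_id=HrLFb47QHi0"
          "&origin=dal-v37.dal.youtube.com")
print(canonical_video_key(first) == canonical_video_key(second))
```

The client still fetches from whichever server it was sent to; only the key used to store and look up the object changes.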

Now, how do you do this in Squid? I’ve got some beta code to do this and it’s in the Squid-2 development tree. Take a look here for some background information. It works around the multiple-URL-referencing-same-file problem but unfortunately it won’t work around their broken HTTP/1.1 validation code. If they fixed that then Youtube might become something which network administrators stop asking to filter.

(ObNote: the second method uses lighttpd as the serving software; and it replies with an HTTP/1.1 reply regardless of whether the request was HTTP/1.0 or HTTP/1.1. Grr!)

Web Cache Whitepapers/Articles

August 24, 2007

Why bother with Squid as a purely proxy server? Isn’t most of the content on the Internet today dynamic?

Perhaps; perhaps not. A few years ago “media caching” required licenced software to handle WMA and RealMedia streams; today the heavy bandwidth users are flash videos from popular sites such as YouTube. The HTML may not be cachable but all those thumbnail images, all those previews and all those large flash video files are very cachable. The problem isn’t that the Internet is “dynamic”; the problem is that website designers view caching as “evil” – they’re suddenly not 100% in control of their content – and try as hard as possible to dodge caching.

Squid has a few knobs which can be set to cache this so-called “dynamic” content. By default Squid has to treat everything which may be dynamic as uncacheable, with the telltale “?” in the URL identifying the output as being from a script, when in fact the content isn’t all that dynamic. More on that will be covered in a future article.
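The default behaviour amounts to a simple pattern test on the URL path. The Python sketch below is modelled on the `cgi-bin` / `?` pattern commonly seen in default squid.conf files of this era; the function name is made up, and this is an illustration of the heuristic rather than Squid’s code.

```python
import re
from urllib.parse import urlsplit

# Anything with "cgi-bin" or a "?" in the path is assumed to be script output.
QUERY_PATTERN = re.compile(r"cgi-bin|\?")


def treated_as_dynamic(url):
    """Return True if the URL would be classed as dynamic by the heuristic."""
    parts = urlsplit(url)
    urlpath = parts.path + ("?" + parts.query if parts.query else "")
    return QUERY_PATTERN.search(urlpath) is not None


for url in (
    "http://example.com/get_video?video_id=1",
    "http://example.com/images/logo.png",
    "http://example.com/cgi-bin/report",
):
    print(url, treated_as_dynamic(url))
```

Note that the test looks only at the shape of the URL. A multi-megabyte video that never changes is refused just as firmly as a page that really is generated per request.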

ISPs who run Squid with a well-tuned configuration have shown web traffic savings of around 30%. That’s 30% of their traffic, not just 30% of requests being hits. And that’s not with any attempt at caching the “dynamic” content which can actually be cached: Youtube and Windows Updates are two big offenders here.
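The difference between saving traffic and counting hits is easy to show with a small Python sketch. The records below are made-up numbers chosen for the example, not measurements from any ISP, and the function name is hypothetical.

```python
def hit_ratios(records):
    """Return (request_hit_ratio, byte_hit_ratio).

    records is an iterable of (was_hit, size_bytes) pairs, one per request.
    """
    records = list(records)
    total_requests = len(records)
    total_bytes = sum(size for _, size in records)
    hit_requests = sum(1 for hit, _ in records if hit)
    hit_bytes = sum(size for hit, size in records if hit)
    request_ratio = hit_requests / total_requests if total_requests else 0.0
    byte_ratio = hit_bytes / total_bytes if total_bytes else 0.0
    return request_ratio, byte_ratio


# Three small cached objects and one large uncached download (made-up data).
sample = [(True, 1_000), (True, 2_000), (True, 3_000), (False, 14_000)]
request_ratio, byte_ratio = hit_ratios(sample)
print(f"request hit ratio: {request_ratio:.0%}, byte hit ratio: {byte_ratio:.0%}")
```

A cache can answer most requests and still save little bandwidth if the big objects are the ones it misses, which is why the large video files and update downloads matter so much.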

So Squid isn’t that useless at all!

A couple of articles which give an overview of caching follow. They’re dated (the technology isn’t new, after all) but just as applicable today.

Further Info on IPv6 – Where the official site actually is…

July 3, 2007

Since people seem to be redirected here in preference to the official pages on the Squid IPv6 branch, I think it’s about time I made some quick references back there, so all of you trying to use this wonderful branch can find the actual code and know how to use it.

The IPv6 work in Squid is all currently documented at http://devel.squid-cache.org/squid3-ipv6/ and related pages. My contact details, or those of any developer who is kept on to maintain it, should be referenced from there.

How-To’s, configuration, patches, etc, etc, ‘all the guff’ as they say, will be available there shortly as well.