Wednesday, November 26, 2008

Large sites without persistent connections

As something of a follow-up to the last article, over the last couple of days a few particularly interesting sites came through testing that are just failing horribly:

Verizon's DNS error hijack - Verizon recently decided to start monetizing fat-fingered urls by taking over DNS failures and redirecting them to a search page. I'm using their default DNS servers (for now) for the online pagetest, so some tests of invalid urls have returned the Verizon search page, which ends up being a great example of a page that should be WAY faster than it is. Here are the full results of a test run I kicked off on purpose: http://performance.webpagetest.org:8080/result/PX8/

Here is the waterfall:


They fail pretty horribly by not using persistent connections, but the page is also a perfect candidate for image sprites, and the html and css aren't gzipped either. The whole thing really should have been done in 2 or 3 requests and could be completely loaded in under a second instead of a little over 3. None of those images have an expires header either, so even repeat views take just as long.
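If you want to spot-check your own pages for the same caching problem, a quick way is to request a few of the image urls and look at the caching headers that come back. Here's a minimal sketch (mine, not part of pagetest) - the example.com urls are placeholders for the requests from your own waterfall:

    # Rough sketch: check whether a few responses have cache headers set.
    # The urls below are placeholders; substitute requests from your own waterfall.
    import urllib.request

    urls = [
        "http://www.example.com/logo.gif",    # placeholder image url
        "http://www.example.com/header.png",  # placeholder image url
    ]

    for url in urls:
        with urllib.request.urlopen(url) as resp:
            print(url)
            print("  Expires:      ", resp.headers.get("Expires", "(none - repeat views re-fetch this)"))
            print("  Cache-Control:", resp.headers.get("Cache-Control", "(none)"))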

Yahoo Japan - This one is mostly interesting because Yahoo is notoriously good at site optimization, but somehow they seem to have missed the Japanese portal. They do pretty well on everything except for the persistent connections and gzip. Here are the full results: http://performance.webpagetest.org:8080/result/PXA/

and the waterfall:


That js is particularly bad: it's 85KB but could be reduced to 22KB with gzip. The biggest crime, though, is the lack of persistent connections. They could cut the load time almost in half just by enabling them.
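If you're curious how much gzip would save on one of your own files, you can estimate it by fetching the file and compressing it locally. A rough sketch (the url is a placeholder):

    # Rough estimate of gzip savings for a single text resource.
    # The url is a placeholder for whatever js/css/html file you want to check.
    import gzip
    import urllib.request

    url = "http://www.example.com/common.js"
    with urllib.request.urlopen(url) as resp:
        body = resp.read()

    compressed = gzip.compress(body, compresslevel=9)
    print("Uncompressed: %d KB" % (len(body) // 1024))
    print("Gzipped:      %d KB" % (len(compressed) // 1024))
    print("Savings:      %.0f%%" % (100.0 * (len(body) - len(compressed)) / len(body)))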

Tuesday, November 25, 2008

Easy ways to speed up your site

Pagetest has been online for 8 months now, with close to 26,000 pages tested. I generally look through the test log daily to see what the results look like, and it's been pretty frustrating: a significant number of the sites being tested could easily be twice as fast without changing the content or code, but I have no way to reach out to the owners and tell them. The test results are probably a bit overwhelming for most people and they don't know where to start. So, with that.....


If you do nothing else for performance make sure you at least do these:


Persistent Connections - There is absolutely no reason for a site not to be using persistent connections, yet I see sites without them come through testing every day. Assuming you are not using a CDN and most of the content for your page is served from your own server, you can get close to a 50% improvement in performance just by enabling them. Here is a sample from a test that was run recently (site name chopped out to protect the innocent):


I cropped it down but the full waterfall continued, opening new connections for 102 requests. The orange section of each request is the time spent opening the connection, and all but 2 of those would be eliminated just by enabling persistent connections. In this case the "start render" time (the time when the user first sees something on the page) would go from 3.8 seconds down to roughly 2 seconds, and the time to fully load the page would go from 16 seconds to closer to 9 seconds. This sample was even from an Apache server so there's really no reason for not having them enabled.
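On Apache this is just the KeepAlive directive in the server config. If you want to see the effect from the client side, one way is to compare re-using a single connection for several requests against opening a new connection for each one. A rough sketch (the host and paths are placeholders, and it assumes the server actually honors keep-alive):

    # Compare opening a new connection per request vs. re-using one connection.
    # Host and paths are placeholders; point this at your own server.
    import http.client
    import time

    host = "www.example.com"
    paths = ["/", "/style.css", "/common.js"]

    # A new connection for every request (what a non-persistent server forces on you).
    start = time.time()
    for path in paths:
        conn = http.client.HTTPConnection(host, timeout=10)
        conn.request("GET", path)
        conn.getresponse().read()
        conn.close()
    print("Separate connections:      %.2fs" % (time.time() - start))

    # One persistent connection re-used for all of the requests.
    # Note: if the server replies with "Connection: close" this will error out.
    start = time.time()
    conn = http.client.HTTPConnection(host, timeout=10)
    for path in paths:
        conn.request("GET", path)
        conn.getresponse().read()  # the body must be fully read before the socket can be re-used
    conn.close()
    print("One keep-alive connection: %.2fs" % (time.time() - start))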

GZIP (aka Free Money) - Just like with persistent connections, enabling GZIP for text responses (html, css, js) is literally just a server configuration change and there is very little reason not to do it. Early versions of IE didn't react well to compressed JS, but it's easy enough to exclude those browsers and that's not a good enough reason to penalize everyone else.

GZIP not only helps the end user get the pages faster but it also saves you significant bytes and if you pay for your network bandwidth, not having it enabled is throwing away free money.

It's not just the small sites that have this problem either. My favorite example is www.cnn.com, which serves 523KB of uncompressed text. They could save 358KB of that just by enabling gzip compression, and since most of the text for a page is in the base page or the js and css referenced in the head, it all has to get downloaded before the user sees anything:

The blue bars in the waterfall are where the browser is downloading uncompressed text for CNN.
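A quick way to check whether a given response is actually coming back compressed is to request it with Accept-Encoding: gzip and look at the Content-Encoding header. A small sketch (the url is a placeholder):

    # Check whether the server gzips a text response when the client asks for it.
    # The url is a placeholder; substitute the html/css/js requests from your waterfall.
    import urllib.request

    url = "http://www.example.com/"
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        encoding = resp.headers.get("Content-Encoding", "")
        body = resp.read()

    if "gzip" in encoding:
        print("gzip is enabled (%d KB on the wire)" % (len(body) // 1024))
    else:
        print("NOT gzipped (%d KB of uncompressed text)" % (len(body) // 1024))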

Combining CSS and JS files - In case it's not obvious from the two waterfalls shown so far, it's not uncommon for a site to reference several different js and css files. There are several solutions available that will let you keep the files separate but download them all in a single request. David Artz has a pretty good write-up on mod_concat and some of the other options here. This does require modifying the html to reference the combined URL instead of each file individually, but that is a very worthwhile effort given the improvement it makes to how your site loads.

The JS and CSS files usually get downloaded before anything is shown to the user, and JS in particular has a nasty habit of only being downloaded one file at a time, so anything you can do to reduce the number of each that you are loading will have a huge impact for the user.
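To make the idea concrete, here is a toy version of the concatenation approach - it is not mod_concat, just a sketch of the concept: a tiny handler that serves several files as one response based on a comma-separated list in the url.

    # Toy illustration of the "combine css/js into one request" idea -- NOT mod_concat,
    # just a sketch of the concept. Serves /combo?files=a.js,b.js by concatenating
    # the named files from the current directory.
    import os
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs

    class ComboHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            query = parse_qs(urlparse(self.path).query)
            names = query.get("files", [""])[0].split(",")
            body = b""
            for name in names:
                # Only allow bare file names so nobody can request ../../something.
                if name and os.path.basename(name) == name and os.path.isfile(name):
                    with open(name, "rb") as f:
                        body += f.read() + b"\n"
            self.send_response(200)
            self.send_header("Content-Type", "application/javascript")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), ComboHandler).serve_forever()

In practice you would use mod_concat or a build step that does the combining (and minifying) for you, but the request pattern is the same: one url, one download, several files.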

Saturday, September 13, 2008

Spammers and Script Kiddies

Sigh, webpagetest.org apparently got some visibility somewhere in the spam and script kiddie/vulnerability scanner communities.

A week or so ago I started getting some "link spam" where a group of people had automated bots kicking off "tests" of their link farms. All of the hyperlinks out from webpagetest are "nofollow" links though, so I'm still not sure what they hoped to gain from it. ModSecurity to the rescue: I have all of the current spam attempts locked down and have cleared the previous runs out of the history.

Then today it looks like I started getting some activity from automated compromise scans. The access logs were starting to fill up with all sorts of bizarre requests that weren't legit, some coming from the same source IP, some from botnets. They weren't successful, since webpagetest is completely custom code, but it is on a shared host with a Joomla board that I run for my neighborhood, which is probably what made them try to compromise it (again, good old ModSecurity and the latest patches keep things safe).

It does make me question the benefit of using public libraries though. All it takes is a vulnerability in one version of a library and everyone who used it can get compromised pretty quickly (and it gets added to the script kiddie scanners pretty quickly too). At least with custom code, someone would have to be explicitly targeting your site to compromise it, which is a lot less likely unless you're running a high-profile site (in which case you also have a team responsible for keeping things secure).

I honestly don't see how sites run by amateurs survive (well, they probably don't, which is why there are so many compromised hosts out there being used for staging attacks). I was debating adding forum support directly to the webpagetest host but at this point it's probably not worth the effort and risk since forums are usually swiss cheese when it comes to security.

Anyway, if you get a 403 "Access Denied" message when you're trying to do something, just shoot me a note. It's probably because I tightened the screws down a little too far and caught you by accident.

Friday, July 11, 2008

Pagetest Optimization Tutorial

Dave Artz put together a great tutorial on using pagetest (the web version) for optimizing sites. It's fairly lengthy but well worth the time to make sure you get the most out of the tool:

http://www.artzstudio.com/2008/07/optimizing-web-performance-with-aol-pagetest/

Wednesday, July 9, 2008

Help make the web a faster place

Back on June 24th Eric Goldsmith presented pagetest at the Velocity conference. It was well received and he learned some critical presentation lessons (the most important being not to put the url for something you're going to demo on the first slide when the audience is full of geeks with laptops). See if you can identify the traffic spike from the conference presentation:


Normally something like 3000 page views wouldn't be a problem, but there were a few hundred tests initiated within the 30-minute presentation window and each test takes close to a minute to complete (though several can be run in parallel). Hopefully people were patient because the queue did clear within an hour or so, but there was quite a backlog for a while. I'll be adding a little ajax to the page to at least give people an indication of how many tests are ahead of them in the queue and to set some expectations for how long the results will take.

As you can see, the traffic has maintained a boost after the conference and seems to be spreading more by word of mouth and blogs. It also looks like there are a fair number of die-hard users actively tweaking their pages. One interesting thing to browse is the test history link from the main page where you can see what tests were run for various pages. Some look to be pretty well optimized and are testing various options and some are hideous (even some really high-profile sites).

I'm also fairly impressed by the variety of locations that testing is coming from (particularly given that it's all in English):



Finally, on the short-list for upcoming features is better png optimization checking. We're going to be embedding optiPng into pagetest and providing byte-size improvements like we do with jpeg and gzip compression.

Tuesday, February 12, 2008

Online version of pagetest now available

We're in the process of getting the appropriate approvals and permissions to stand it up at AOL but in the meantime I stood up some equipment at the "Meenan data center" (aka my basement) to do hosted testing.

You can get to it at http://www.webpagetest.org/

It has a fair amount of capacity (can run 6 tests concurrently) so as long as nobody posts it on slashdot or digg it should be in good shape and be able to handle any testing needs. For those of you not on the east coast of the US it also gives you an easy way to do remote testing to see how your page loads from here.

The online version is actually a lot more convenient than the installed version because it takes care of clearing the cache for you, runs multiple tests, and has all of the results saved out in a way that's easy to pass around (either by url or by shipping the actual images around). It also makes available everything the desktop version has (waterfalls, optimization checklists, optimization reports) and includes a screen shot of what the browser looked like at the end so you can make sure the page you measured was actually the one you meant to measure.

Here is a gratuitous (albeit small) sample of the waterfall:


Wednesday, January 30, 2008

AOL Open sources web performance tool

For my inaugural post I wanted to point out a tool that AOL has not only released to the public but fully open sourced. I'm particularly excited because I wrote it but I think it also has the potential to be really useful for a large portion of the community (and it has some really cool things going on under the covers that I will talk about in future posts).

The tool is Pagetest and it is similar in a lot of ways to portions of Firebug and YSlow, but for Internet Explorer (there are currently no other free tools for IE that do the same things, which is why we built it and released it in the first place). This was pretty critical for us: Firefox has a fanatical following with developers (for good reason, given all of the developer-related plugins), but its market share among our actual user base is quite low, so we needed to give developers the real end-user view of their pages.

What it does:
  • Provides a visual waterfall display of the various components of the page loading, including DNS lookup, Socket Establishment, Request time and the actual content download (similar to the Firebug "net" display but with more detail)
  • Analyzes the requests and provides a visual optimization checklist as well as a report of what should be looked at to improve performance (similar to YSlow but based on the actual requests and with slightly different guidelines)
  • Allows for viewing all of the request and response headers for each request
  • Can be used in a "desktop mode" where you test a page manually or can be automated with all of the reports and details saved out in a format suitable for databasing (more on this to come as the automation tools were also open sourced with pagetest, just not in binary form yet)
  • Has a scripting engine and language for automated testing of complex products (like webmail)
The desktop version is helpful enough as it is but when you hook up the automation capabilities with a web front-end you get an even more powerful tool that lets people on any platform and any browser submit an url to be tested and it will return the results - expect to see a public version of this launched shortly.

On the coding side, some of the things that it includes that I'll try to blog about in the near future:
  • A class (CApiHook) that makes IAT hooking of dll exports a trivial task (used to hook winsock, wininet and a few specific routines)
  • Winsock hooking which gives you all of the capabilities of a Layered Service Provider (LSP) but without the headaches of it being system-wide so you only have to worry about implementing the necessary hooks for the process you are targeting
  • WinInet hooking
  • Crash survival - something we've used in several of our end-user products but this takes it to a whole new level
There is a whole lot more to this release than what you see on the surface, so pay attention and I'll try to point most of it out. Hopefully this is also the start of us being more public-facing with some of our internal efforts around web performance.