Thursday, October 28, 2010

Performance Measurement consistency and Video Capture impact

One thing I've always been concerned about is people taking a single measurement point and using that as representative of the performance of their site.  Even for a well-performing platform (where the back-end is consistent) there are a lot of variations and sometimes significant outliers (I usually recommend taking at least 3 measurements and throwing out any that look obviously wrong).
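A minimal sketch of that advice in Python: take at least 3 runs, drop any that sit far from the median, and report the median of what is left. The function name and the 50% cutoff are illustrative assumptions, not anything WebPagetest itself does, and the sample numbers are made up.

```python
from statistics import median


def representative_load_time(load_times_ms, tolerance=0.5):
    """Return the median load time after discarding obvious outliers.

    `tolerance` is a hypothetical cutoff: a run is dropped when it differs
    from the median of all runs by more than this fraction of that median.
    """
    if len(load_times_ms) < 3:
        raise ValueError("take at least 3 measurements")

    mid = median(load_times_ms)
    kept = [t for t in load_times_ms if abs(t - mid) <= tolerance * mid]

    # If the runs are so scattered that nothing survives the cutoff,
    # fall back to the plain median rather than failing.
    return median(kept) if kept else mid


# Made-up load times in milliseconds; the third run is an obvious outlier.
runs = [2100, 2250, 9800]
print(representative_load_time(runs))
```

With only 3 runs the median already ignores a single bad measurement; the explicit cutoff matters more as the number of runs grows.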

Recently I've been looking at options for moving the Dulles testers for WebPagetest out of my basement and into an actual data center (picture of the "Meenan Data Center" will be posted at some point).  There have also been some projects recently where we needed to run a whole lot of tests and it would take several days on the current infrastructure (even in Dulles where I have multiple machines backing the IE7 and IE8 configurations).  I've been looking at co-lo pricing but to do anything of reasonable scale gets pretty expensive, even with virtualization when you factor in the Microsoft tax.  It turns out that Amazon's us-east EC2 region is right next to me in Northern Virginia so it is looking like a very attractive option.

I have also been asked quite a few times about the impact of capturing video on the actual test results (and I've generally steered people away from comparing video-capture results against non-video results).  I jumped through a lot of hoops in Pagetest to minimize the impact and I measured it as "effectively 0" on my dev systems but never did any large-scale testing against the public test systems to see how they perform.

Joshua Bixby also recently blogged about some testing that Tony Perkins had been doing with EC2 and how the Micro instances were not suitable for performance testing (but the Small instances were looking good).

So, all this means it's time for some testing:

First, some basics on the configurations at play:

IE7 - The Dulles IE7 tests are run on physical machines (3GHz P4s running XP with 512MB of RAM).  There are 4 physical machines running the tests.
IE8 - The Dulles IE8 tests are run in a VM running under VMware ESXi 4.0 (XP with 512MB of RAM).  There are 8 VMs available for testing.
EC2 - The EC2 instances (both IE7 and IE8) are running in the us-east region on "small" instances using instance storage (Windows Server 2003 with 1.7GB of RAM).  I had 2 instances of each available for testing, so any consistency would not be because of running on the same machine.

I ran 100 tests each against a set of sites, both with and without video capture.  These are all sites with sufficient infrastructure to provide consistent serving performance and all leverage CDNs.

I'm pretty visual so my favorite way to consume data for this kind of test is to look at a cumulative percentile graph with the load time across the X axis and the percentile along the Y axis.  The more vertical a line is, the more consistent the results and if lines are basically right on top of each other they are performing identically.
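For anyone who wants to build the same kind of graph from their own results, here is a rough Python sketch of how the points can be computed. The function names, the 10th/90th percentile choice, and the sample load times are assumptions for illustration only; the actual graphs in this post were not produced with this code.

```python
def cumulative_percentiles(load_times_ms):
    """Return (load_time, percentile) points, sorted by load time.

    Plot load_time on the X axis and percentile on the Y axis.
    """
    ordered = sorted(load_times_ms)
    n = len(ordered)
    return [(t, 100.0 * (i + 1) / n) for i, t in enumerate(ordered)]


def spread(load_times_ms, low=10, high=90):
    """Width of the curve between two percentiles (nearest-rank method).

    A smaller value means a more vertical line, i.e. more consistent results.
    """
    ordered = sorted(load_times_ms)
    n = len(ordered)

    def at(percentile):
        # Nearest rank: the smallest value with at least `percentile`
        # percent of the samples at or below it (ceiling division).
        rank = max(1, -(-percentile * n // 100))
        return ordered[int(rank) - 1]

    return at(high) - at(low)


# Made-up load times in milliseconds for one test configuration.
samples = [3000, 1000, 2000, 4000]
for load_time, pct in cumulative_percentiles(samples):
    print(load_time, pct)
print("10th-90th percentile spread:", spread(samples))
```

Two configurations that perform identically will produce nearly the same list of points, which is what "lines right on top of each other" looks like in the data.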

And the results (pretty picture time):

So, what does it mean?

  • It looks like the current Dulles IE7 machines are seeing an impact to the measurements when capturing video (at least in some cases).  
  • Both virtualized environments do NOT appear to be impacted by capturing video.
  • EC2 results are generally slower than the current Dulles results (network peering is my best guess because they are using identical traffic shaping)
  • The EC2 results are quite consistent and look promising


  1. Is webpagetest impacted by the issues EC2 has with QueryPerformanceCounter() ?

  2. I haven't seen it in any of the testing I have been doing (1200 tests today alone with several hundred calls to QueryPerformanceCounter() each). It could be because I'm using small instances (1 core) and 2003 server but thanks for the link. Worst-case it looks like there is a boot.ini work-around that I can use if it shows up since I'm on 2003.

  3. Played with EC2 instances in Singapore, didn't collect any numbers as such, but they felt similar in performance and consistency to a normal VPS or a 1 GB WinXP VirtualBox VM on my i7 desktop....

    The upside with EC2 is that the spot instances are usually very cheap... and while running bulk tests, you can have 100s of instances at your command for dirt cheap. (at the moment spot price for small windows box is $0.051 per hour )

    Any plans on releasing an official AMI for the test heads?

  4. How about testing mobile performance?

  5. @sajal - yep, that's exactly what I plan to do. I have it all configurable from the user data for exactly that reason. It is documented here:

    I just built the images over the last two days so they may not be final but they are looking good so far.

  6. @Peter - I'm looking at what the next generation of the infrastructure should look like and mobile is the #1 requirement (Android specifically).

