When Staging Really is for Staging – Deployment Patterns for Windows Azure
So a post that came through on RSS this morning from Ryan Dunn got me motivated to jot down some notes on a topic I've been thinking about for a while: how best to deploy into Windows Azure. I'll caveat this post with the fact that at the moment the Windows Azure deployment story is fairly simplistic - I don't think we'll be deploying via the web portal come RTM.
So in Windows Azure you basically have two types of accounts, Storage and Compute. Within a Compute account you effectively have two sets of instances that are grouped behind two URLs. These are referred to as 'Staging' and 'Production'. So you might have two URLs that look something like this:

http://<guid>.cloudapp.net (Staging)
http://foobar.cloudapp.net (Production)
For all intents and purposes there is no difference between the instances that sit behind these URLs. The Azure load balancers are simply pointing requests at two different groups of servers. When you push the 'big blue button' to do a production release you are really just re-configuring the rules on the load balancer to swap the URLs around. This leads me to my point from the post title. In Windows Azure the idea of Staging really is much more akin to the military idea of a staging area: it's the place where you marshal the troops ready to charge over the trenches.
The 'Staging' deployment in Windows Azure basically gives us an area in which to marshal our instances before we make them live on the production URL. It means we can achieve a zero-downtime deployment model, because the machines are already warm when we start routing traffic to them. It also means that you might not be able to (and probably shouldn't) treat the Staging deployment the way you would treat a staging server in a traditional on-premises deployment. With an on-premises staging server you'll usually take your application, deploy it into staging, test it (under limited load) and then redeploy it into production - this is a very different model.
Suppose, for example, that I have 1 instance in my Staging deployment and 2 instances in my Production deployment. If I swap these then I'll end up with 1 instance in Production and 2 in Staging - basically halving the capacity of my production system. This would be bad.
You should use your Staging deployment as a true staging area. You should configure it to run the same number of instances as Production. You should give it the same configuration settings (where possible, given code/data change vectors). That way, when you flick the switch and route traffic at those warm servers, you'll be ready to rock at full capacity.
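To make that concrete, here's the sort of thing I mean in the service configuration file. This is only an illustrative sketch - the service name, role name and setting name are placeholders - but the point is that the Instances count in the package you push to Staging should match what Production is currently running:

<!-- ServiceConfiguration.cscfg for the package being pushed to Staging.
     Names are placeholders; the important bit is the Instances count. -->
<ServiceConfiguration serviceName="FooBar"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="FooBarWebRole">
    <!-- Match whatever the current Production deployment is running -->
    <Instances count="30" />
    <ConfigurationSettings>
      <!-- Same settings as Production wherever code/data changes allow -->
      <Setting name="StorageAccountName" value="foobarProdStorage" />
    </ConfigurationSettings>
  </Role>
</ServiceConfiguration>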
What does this mean for your Windows Azure development lifecycle? Here's how I'd configure my accounts for a real-world scenario aiming to achieve zero-downtime upgrades. I'm going to *gloss over* the considerations of upgrading your data layer and leave that for another blog post.
I’d configure myself with 3 storage accounts and 2 compute accounts for my application.
Storage: foobarDevStorage, foobarStageStorage, foobarProdStorage
Compute: foobarStage, foobarProd
I'd dev my project on my local box using the local development fabric and local storage. In some situations I'd use the cloud for my dev storage instead.
I'd treat my Stage account like a traditional staging server. In other words I'd configure it with a couple of instances and deploy my package up there for pre-release testing. In reality I'd probably never use the Stage.Production deployment in my Stage account, instead contenting myself with the http://<guid>.cloudapp.net URL. When I'm ready to do a production release I'd take the same package I used in Stage and deploy it into the Prod account, using a production config file. My Prod.Staging deployment in the foobarProd account would be configured to run the full number of instances - i.e. if my production site runs on 30 instances, my Prod.Staging deployment would need to be running 30 instances. I'd then do a quick^^ *smoke test* against the Prod.Staging deployment before finally swapping the Prod.Production and Prod.Staging deployments, which routes traffic at my new deployment. Finally I'd suspend the (now old) Prod.Staging deployment, because otherwise I'd be paying to have those instances running.
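For what it's worth, the kind of quick smoke test I have in mind is nothing fancy - something along these lines, where the GUID URL and the paths being checked are obviously placeholders for your own Prod.Staging deployment:

// Minimal smoke test against the Prod.Staging deployment before the swap.
// The base URL and the paths are placeholders - substitute your own.
using System;
using System.Net;

class SmokeTest
{
    static void Main()
    {
        string stagingBaseUrl = "http://00000000-0000-0000-0000-000000000000.cloudapp.net";
        string[] pathsToCheck = { "/", "/Default.aspx" };

        foreach (string path in pathsToCheck)
        {
            // GetResponse throws on a non-2xx status, so a broken deployment fails loudly.
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(stagingBaseUrl + path);
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine("{0} -> {1}", path, response.StatusCode);
            }
        }
    }
}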
That's my current thinking anyway. All subject to change as Windows Azure changes over the coming months.
Now that’s the easy bit! Hot swapping the compute instances like this is easy because we are (or at least should be) building them using a stateless server pattern.
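By 'stateless server pattern' I just mean that no individual instance holds anything you'd miss if traffic suddenly went elsewhere. A contrived sketch of the idea (the IStateStore abstraction here is purely hypothetical, standing in for table/blob/SQL storage or whatever you use):

// Contrived illustration of the stateless pattern: nothing that matters
// lives in instance memory or on the instance's local disk, so any
// instance can serve any request and deployments can be swapped freely.
public interface IStateStore   // hypothetical - e.g. backed by Azure table or blob storage
{
    void Save(string key, string value);
    string Load(string key);
}

public class ShoppingCartService
{
    private readonly IStateStore store;

    public ShoppingCartService(IStateStore store)
    {
        this.store = store;
    }

    public void AddItem(string cartId, string item)
    {
        // State goes straight to shared storage - not to a static field,
        // an in-memory session object, or a file on the local disk.
        string existing = store.Load("cart:" + cartId) ?? string.Empty;
        store.Save("cart:" + cartId, existing + ";" + item);
    }
}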
The harder part is achieving a hot-swap upgrade when you need to change your data tier. Because the data tier is stateful we need to consider the consistency of the data. This is a problem in on-premises models too, and one that does not have an easy solution. Maybe the Azure team will come up with some 'magic' that helps here? I've got some thoughts on how it might be done, but none of them are particularly elegant or developed yet. Keep your eye out for a post on it sometime in the future.
If this stuff appears useful to people (including the footnote below) I might bang up a sample and a webcast. Let me know in the comments.
P.S. Ryan: Looking forward to seeing that tool!
^^ In reality I'd probably want each instance to have serviced at least one request, so that everything has been JIT'ed and spun up. This means, for a 30-instance deployment, that I'd really have to slam the deployment with some load and probably have some way (setting a flag in the Application.OnStart event?) of ensuring that I had hit 30 unique servers. The easiest way to generate the load would probably be with a Windows Azure worker role.
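Something like the following is what I'm imagining - a very rough sketch, where the warm-up endpoint, the per-instance id handed out at startup and the target count of 30 are all just placeholders for however you'd actually wire this up:

using System;
using System.Collections.Generic;
using System.Net;
using System.Web;

// In the web role's Global.asax: tag this instance with an id at startup
// so a warm-up request can prove the instance has been hit (and JIT'ed).
public class Global : HttpApplication
{
    public static readonly string InstanceId = Guid.NewGuid().ToString();

    protected void Application_Start(object sender, EventArgs e)
    {
        // Anything expensive you want JIT'ed/warmed up can also be touched here.
    }
}

// A trivial warm-up handler (mapped to, say, /warmup.ashx) that returns the id.
public class WarmUpHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/plain";
        context.Response.Write(Global.InstanceId);
    }
}

// The load generator (a worker role, or just a console app for now): keep
// hitting the staging URL until 30 distinct instances have responded.
public static class WarmUpLoadGenerator
{
    public static void Run(string warmUpUrl, int expectedInstances)
    {
        var seenInstances = new HashSet<string>();
        using (var client = new WebClient())
        {
            while (seenInstances.Count < expectedInstances)
            {
                seenInstances.Add(client.DownloadString(warmUpUrl));
            }
        }
    }
}

In practice you'd probably want to fire those warm-up requests in parallel (which is where the worker role helps), since a single sequential loop may take a while to get spread across every instance by the load balancer.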