October 23, 2011

DevOps in the Cloud Explained


This post was written by @martinjlogan and is based on a presentation given by George Reese (@GeorgeReese) at Camp DevOps 2011.

Time to throw some more buzzwords at you. Nothing makes people’s eyes roll back more quickly than saying DevOps and Cloud in the same sentence. This post is all about Cloud and DevOps.

The theory of DevOps is predicated on the idea that all elements of a technology infrastructure can be controlled through code. Without cloud that can’t be entirely true. Someone has to rack the servers in the data center and so on. In pure cloud operations we get to this sort of nirvana where everything relating to technology is controllable purely through code.

A key consequence of the statements above is that with DevOps plus Cloud everything becomes repeatable. At some level that is the point of DevOps. You take the repeatable elements of Dev and apply them to Ops. Starting a server becomes a repeatable, testable process.

Scalability is another thing you get with DevOps + Cloud. DevOps + Cloud allows you to increase the server-to-admin ratio. No longer is provisioning a server kept as a long set of steps stored in a black binder somewhere. It is kept as actual software. This reduces errors tremendously.

DevOps + Cloud is self healing. What I mean by that is that when you get to a scenario where your entire infrastructure is governed by code, that code can become aware of anything that is going wrong within its domain. You can detect VM failures and automagically bring up a replacement VM, for example. You know that the replacement is going to work the way you designed it. No more 3:00 AM pager alerts for many of your VM failures, because this is all handled automatically. Some organizations even go so far as to have “chaos monkeys”: people, or tools such as Netflix’s Chaos Monkey, whose job is to wreak havoc just to prove that the system is self healing.
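As a rough illustration of the self-healing idea, here is a minimal sketch of one pass of a reconciliation loop. The `FakeCloud` class is a hypothetical in-memory stand-in for a real provider API, and the names `launch`, `terminate`, and `heal` are assumptions made for this example, not any vendor's actual calls.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud provider's API."""
    vms: dict = field(default_factory=dict)  # vm_id -> "running" | "failed"
    _ids: count = field(default_factory=lambda: count(1))

    def launch(self, image: str) -> str:
        vm_id = f"{image}-{next(self._ids)}"
        self.vms[vm_id] = "running"
        return vm_id

    def terminate(self, vm_id: str) -> None:
        del self.vms[vm_id]


def heal(cloud: FakeCloud, image: str, desired: int) -> list[str]:
    """One reconciliation pass: drop failed VMs, then top up to `desired`."""
    for vm_id, state in list(cloud.vms.items()):
        if state == "failed":
            cloud.terminate(vm_id)
    running = [v for v, s in cloud.vms.items() if s == "running"]
    # range() of a non-positive number is empty, so nothing is launched
    # when enough VMs are already running.
    return [cloud.launch(image) for _ in range(desired - len(running))]


cloud = FakeCloud()
heal(cloud, "web", desired=3)      # initial provisioning
cloud.vms["web-2"] = "failed"      # simulate a VM failure
replacements = heal(cloud, "web", desired=3)
```

Run on a schedule, a loop like this replaces a failed VM with one built from the same image, so nobody has to be paged for it. A chaos monkey would be whatever flips VMs to "failed" at random to prove the loop works.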

Continuous integration and deployment. This means that your environments are never static. There is no fixed production environment with change management processes that govern how you get code into that environment. The walls between dev and ops fall down. When you fully go DevOps and Cloud, you run CI and deliver code continuously, not just at the application level but at the infrastructure level.

CloudDevOpsNoSQLCoolness

DevOps needs these buzzwords, because DevOps only succeeds insofar as you are able to manage things through code. Anywhere you drop back to human behavior you are getting away from the value proposition that DevOps represents. What cloud and NoSQL bring into the mix is the ability, or ease, to treat all elements of your infrastructure as programmable components that can be managed through DevOps processes.

I will start off with NoSQL. I am not saying that you can’t automate RDBMS systems; it is just that it is quite a bit easier to automate NoSQL systems because of their properties. It will make more sense in a bit. Let’s talk CAP theorem. The CAP theorem is an overarching theorem about how you can manage consistency, availability and partition tolerance.

Consistency – how consistent is your data for all readers in a system. Can one reader see a different value than another at times – for how long?

Availability – how tolerant is the entire system of a failure of a single node?

Partition tolerance – does the system continue to operate despite message loss between partitions?

Now all that was an oversimplification, but let’s just go with it for now. The CAP theorem says that you can’t have more than 2 of the aforementioned properties at once. Relational DBs are essentially focused on consistency. It is very hard [impossible] to have a relational system with nodes in Chicago and London that are both perfectly available and perfectly consistent. These distributed setups present some of the largest failure scenarios we see in traditional systems. These failures are the types of things we would want to handle automatically with a self healing system. Due to the properties of RDBMS systems, as we will see, this can be very difficult.

The deployment of a relational DB is fairly easy. This can be automated well enough. What gets hard is scaling your reads based on autoscaling metrics. Let’s say I have my reads spread across my slaves and I want the number of slaves to go up and down based on demand. The problem here is that each slave brought up needs to pull a ton of data before it can serve reads, in order to meet the consistency requirements of the system. This is not very efficient.
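A back-of-the-envelope sketch makes the problem concrete. The function names and all the numbers below are assumptions chosen for illustration, not measurements from any real system.

```python
import math


def desired_replicas(reads_per_sec: float, reads_per_replica: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Number of read replicas needed to serve the current read load."""
    needed = math.ceil(reads_per_sec / reads_per_replica)
    return max(min_replicas, min(max_replicas, needed))


def bootstrap_seconds(dataset_gb: float, copy_mb_per_sec: float) -> float:
    """Rough time for a new replica to copy the full dataset before serving."""
    return dataset_gb * 1024 / copy_mb_per_sec


# Illustrative numbers only (assumptions, not measurements).
replicas = desired_replicas(reads_per_sec=4500, reads_per_replica=1000)
delay = bootstrap_seconds(dataset_gb=500, copy_mb_per_sec=100)
print(replicas, "replicas;", round(delay / 60), "minutes to bootstrap each new one")
```

Deciding how many replicas you want is trivial. With these assumed numbers the autoscaler asks for 5 replicas, but each new one needs roughly 85 minutes of copying before it can take traffic, by which time the demand spike may well be over.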

NoSQL systems go for something called eventual consistency. Most data that you are dealing with on a day to day basis can be eventually consistent. If, for example, I update my Facebook status, delete it, and then add another, it is OK if some people saw the first update and others only saw the second. Lots of data is this way. If you rely on eventual consistency, things become a lot easier.
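To make eventual consistency a little more tangible, here is a toy sketch with two replicas that use last-write-wins merging. The `Replica` class and its methods are hypothetical, and real NoSQL systems use more careful schemes than a bare integer timestamp.

```python
class Replica:
    """One node holding a single value, tagged with a logical timestamp."""

    def __init__(self, name: str):
        self.name = name
        self.value = None
        self.stamp = 0

    def write(self, value, stamp: int) -> None:
        # Last-write-wins: keep only the newest write seen so far.
        if stamp > self.stamp:
            self.value, self.stamp = value, stamp

    def sync_from(self, other: "Replica") -> None:
        # Anti-entropy: pull the other replica's state and merge it.
        self.write(other.value, other.stamp)


chicago, london = Replica("chicago"), Replica("london")

chicago.write("first status", stamp=1)
london.sync_from(chicago)                # london has seen update 1
chicago.write("second status", stamp=2)  # not yet replicated to london

stale = (chicago.value, london.value)    # readers disagree for a while

london.sync_from(chicago)                # replication catches up
converged = (chicago.value, london.value)
```

For a while a reader in London sees the first status while a reader in Chicago sees the second. Once replication catches up both agree, and neither node had to block the other in the meantime.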

NoSQL systems by design can deal with node failures. On the SQL side, if the master fails your application can’t do any writes until you solve the problem by bringing up a new master or promoting a slave. Many NoSQL systems have a peer-to-peer relationship between their nodes. This lack of differentiation makes the system much more resistant to failure and much easier to reason about. Automating the recovery of these NoSQL systems is far easier than with an RDBMS, as you can imagine. At the end of the day, tools with these properties, being easy to reason about and simple to automate, are exactly the types of tools that we should prefer in our CloudDevOps type environments.

Cloud

Cloud, for the purposes of this discussion, is sort of the pinnacle of SOA in that it makes everything controllable through an API. If it has no API it is not Cloud. If you buy into this then you agree that everything that is Cloud is ultimately programmable.

Virtualization is the foundation of Cloud, but virtualization is not Cloud by itself. It certainly enables many of the things we talk about when we talk Cloud, but it is neither necessary nor sufficient for being a cloud. Google App Engine, for example, is a cloud that does not incorporate virtualization. One of the reasons that virtualization is great is that you can automate the procurement of new boxes.

The Cloud platform has 2 key components to it that turn virtualization into Cloud. One of them is locational transparency. When you go into a vSphere console you are very aware of where the VM sits. With a cloud platform you essentially stop caring about that stuff. You don’t have to care anymore about where things lie which means that topology changes become much easier to handle in the scaling case or failure cases. Programming languages like Erlang have made heavy use of this property for years and have proven that this property is highly effective. Let’s talk now about configuration management.

Configuration management is one of the most fundamental elements allowing DevOps in the cloud. It allows you to have VMs with just enough OS that they can be provisioned automatically through virtualization and then assigned a distinct purpose within the cloud through configuration management. The CM system handles turning the lightly provisioned VM into the type of server that it is intended to be. Orchestration is also a key part of management.
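The core idea of configuration management can be sketched in a few lines: describe the desired state for a role, then converge a bare VM toward it. The role names, package names, and the `converge` function below are all hypothetical and only stand in for what a real CM tool does.

```python
# Hypothetical role definitions: names and packages are illustrative only.
ROLES = {
    "web": {"packages": {"nginx"}, "services": {"nginx"}},
    "db": {"packages": {"postgresql"}, "services": {"postgresql"}},
}


def new_base_vm() -> dict:
    """A lightly provisioned VM: just enough OS, no purpose yet."""
    return {"packages": {"openssh-server"}, "services": {"sshd"}}


def converge(vm: dict, role: str) -> list[str]:
    """Bring `vm` to the state its role describes; return the changes made."""
    changes = []
    for kind in ("packages", "services"):
        # Only act on what is missing, so repeated runs are harmless.
        for item in sorted(ROLES[role][kind] - vm[kind]):
            vm[kind].add(item)
            changes.append(f"{kind}:{item}")
    return changes


vm = new_base_vm()
first_run = converge(vm, "web")   # installs and starts what is missing
second_run = converge(vm, "web")  # nothing left to do: converge is idempotent
```

The second run makes no changes. That idempotence is what makes the process repeatable: the same description applied to any base VM yields the same kind of server.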

Orchestration is the understanding of the application level view of the infrastructure. It is knowing which apps need to communicate and interact in what ways. The orchestration system armed with this knowledge can then provision and manage nodes in a way consistent with that knowledge. This part of the system also does policy enforcement to please all the governance folks. Moving to an even higher level of abstraction and opinionatedness we will talk about Platform Clouds (PaaS) in the next section.
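As a small sketch of that application-level view, the example below models which tiers talk to which, checks a simple policy, and derives a provisioning order. The `APP` model, the `MAX_NODES_PER_TIER` policy, and the `plan` function are assumptions for illustration. `graphlib` is in the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter

# Hypothetical application model: each tier lists the tiers it talks to.
APP = {
    "web": {"depends_on": {"api"}, "nodes": 2},
    "api": {"depends_on": {"db", "queue"}, "nodes": 3},
    "db": {"depends_on": set(), "nodes": 1},
    "queue": {"depends_on": set(), "nodes": 1},
}

MAX_NODES_PER_TIER = 4  # stand-in for a governance policy


def plan(app: dict) -> list[tuple[str, int]]:
    """Return (tier, node_count) pairs in an order that respects dependencies."""
    for tier, spec in app.items():
        if spec["nodes"] > MAX_NODES_PER_TIER:
            raise ValueError(
                f"policy violation: {tier} requests {spec['nodes']} nodes")
    # TopologicalSorter takes a mapping of node -> predecessors and yields
    # predecessors first, so dependencies are provisioned before dependents.
    graph = {tier: spec["depends_on"] for tier, spec in app.items()}
    order = TopologicalSorter(graph).static_order()
    return [(tier, app[tier]["nodes"]) for tier in order]


provisioning_plan = plan(APP)
```

The database and queue tiers come out ahead of the API tier, which comes out ahead of the web tier. A request that breaks the policy is rejected before anything is provisioned.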

Platform Clouds (PaaS)

Platform clouds are a bit different. They are not VM focused but instead focus on providing resources and containers for automated, scalable applications. Some examples of the resources provided by a PaaS system are database platforms, including both RDBMS and NoSQL resource types. The platform allows you to spin up instances of these sorts of resources through an API on demand. Amazon SimpleDB is an example of this. Messaging components, things that manage communication between components, are yet another example of the type of service that these systems provide. The management of these platforms and the resources provisioned within them is handled by the cloud orchestration and management systems. The systems are also highly effective at providing containers for resources that are user created.

The real power here is that you can package up your application, say a WAR file or a Ruby on Rails app, and then just hand it off to the cloud, which has specific containers for encapsulating your code and managing things like security, fault tolerance, and scalability. You don’t have to think about it.

One caveat to be aware of, though, is that you can run into vendor lock-in. Moving off one of these platforms, should you rely heavily on its services and resources, can be very difficult and require a lot of refactoring.

Cloud and DevOps

Cloud with your DevOps offers some fantastic properties. You get to apply all the advancements made in software development around repeatability and testability to your infrastructure. You get to scale up in real time as needed (autoscaling) and, among other things, to harness the power of self healing systems. DevOps is better with Cloud.

Info

Twitter: @GeorgeReese
Email: george.reese at enstratus dot com