Archive for April, 2013

April 29, 2013

The State of DevOps: Accelerating Adoption

By James Turnbull (@kartar), VP of Technology Operations, Puppet Labs Inc.

A sysadmin’s time is too valuable to waste resolving conflicts between operations and development teams, working through problems that stronger collaboration would solve, or performing routine tasks that can – and should – be automated. Working more collaboratively and freed from repetitive tasks, IT can – and will – play a strategic role in any business.

At Puppet Labs, we believe DevOps is the right approach for solving some of the cultural and operational challenges many IT organizations face. But without empirical data, a lot of the evidence for DevOps success has been anecdotal.

To find out whether DevOps-attuned organizations really do get better results, Puppet Labs partnered with IT Revolution Press IT Revolution Press to survey a broad spectrum of IT operations people, software developers and QA engineers.

The data gathered in the 2013 State of DevOps Report proves that DevOps concepts can make companies of any size more agile and more effective. We also found that the longer a team has embraced DevOps, the better the results. That success – along with growing awareness of DevOps – is driving faster adoption of DevOps concepts.

DevOps is everywhere

Our survey tapped just over 4,000 people living in approximately 90 countries. They work for a wide variety of organizations: startups, small to medium-sized companies, and huge corporations.

Most of our survey respondents – about 80 percent – are hands-on: sysadmins, developers or engineers. Break this down further, and we see more than 70 percent of these hands-on folks are actually in IT Ops, with the other 30 percent in development and engineering.

DevOps orgs ship faster, with fewer failures

DevOps ideas enable IT and engineering to move much faster than teams working in more traditional ways. Survey results showed:

  • More frequent and faster deployments. High-performing organizations deploy code 30 times faster than their peers. Rather than deploying every week or month, these organizations deploy multiple times per day. Change lead time is much shorter, too. Rather than requiring lead time of weeks or months, teams that embrace DevOps can go from change order to deploy in just a few minutes. That means deployments can be completed up to 8,000 times faster.
  • Far fewer outages. Change failure drops by 50 percent, and service is restored 12 times faster.

Organizations that have been working with DevOps the longest report the most frequent deployments, with the highest success rates. To cite just a few high-profile examples, Google, Amazon, Twitter and Etsy are all known for deploying frequently, without disrupting service to their customers.

Version control + automated code deployment = higher productivity, lower costs & quicker wins

Survey respondents who reported the highest levels of performance rely on version control and automation:

  • 89 percent use version control systems for infrastructure management
  • 82 percent automate their code deployments

Version control allows you to quickly pinpoint the cause of failures and resolve issues fast. Automating your code deployment eliminates configuration drift as you change environments. You save time and reduce errors by replacing manual workflows with a consistent and repeatable process. Management can rely on that consistency, and you free your technical teams to work on the innovations that give your company its competitive edge.

What are DevOps skills?

More recruiters are including the term DevOps in job descriptions. We found a 75 percent uptick in the 12-month period from January 2012 to January 2013. Mentions of DevOps as a job skill increased 50 percent during the same period.

In order of importance, here are the skills associated with DevOps:

  • Coding & scripting. Demonstrates the increasing importance of traditional developer skills to IT operations.
  • People skills. Acknowledges the importance of communication and collaboration in DevOps environments.
  • Process re-engineering skills. Reflects the holistic view of IT and development as a single system, rather than as two different functions.

Interestingly, experience with specific tools was the lowest priority when seeking people for DevOps teams. This makes sense to us: It’s easier for people to learn new tools than to acquire the other skills.

It makes sense on a business level, too. After all, the tools a business needs will change as technology, markets and the business itself shift and evolve. What doesn’t change, however, is the need for agility, collaboration and creativity in the face of new business challenges.

About the author:

James Turnbull portraitA former IT executive in the banking industry and author of five technology books, James has been involved in IT Operations for 20 years and is an advocate of open source technology. He joined Puppet Labs in March 2010.

April 2, 2013

Data Driven Observations on AWS Usage from CloudCheckrs User Survey

This is a guest post by Aaron Klein from CloudCheckr

We were heartened when AWS made Trusted Advisor free for the month of March. This was an implicit acknowledgement of what many have long known: AWS is complex and can be challenging for users to provision and control their AWS infrastructure effectively.

We took the AWS announcement as an opportunity to conduct an internal survey of our customers’ usage. We compared the initial assessments of 400 of our users’ accounts against our 125+ best practice checks for proper configurations and policies. Our best practice checks span 3 key categories: Cost, Availability, and Security.  We limited our survey to users with 10 or more running EC2 instances.  In aggregate, the users were running more than 16,000 EC2 instances.

We were surprised to discover that nearly every customer (99%) experienced at least one serious exception.  Beyond this top level takeaway, our primary conclusion was that controlling cost may grab the headlines, but users also need to button up a number of availability and security issues.

When considering availability, there were serious configuration issues that were common across a high percentage of users. Users repeatedly failed to optimally configure Auto Scaling and ELB. The failure to create sufficient EBS snapshots was an almost universal issue.

Although users passed more of our security checks, the exceptions which did arise were serious. Many of the most commons security issues were found in configurations for S3, where nearly 1 in 5 users allowed unfettered access to their buckets through “Upload /Delete” or “Edit Permissions” set to everyone. As we explained in an earlier whitepaper, anyone using a simple bucket finder tool could locate and access these buckets.

Beyond the numbers, we also interviewed customers to gather qualitative feedback from users on some of the more interesting data points.

If the findings of this survey sparks questions about how well your AWS account is configured, CloudCheckr offers a free account that you can set up in minutes.  Simply enter read only credentials from your AWS account and CloudCheckr will assess your configurations and policies in just a few minutes:

Conclusions by Area

Conclusions based upon Cost Exceptions:

As noted, our sample was comprised of 16,047 instances. The sample group spent a total of $2,254,987 per month on EC2 (and its associated costs) for average monthly cost per customer of $7516. Of course, we noted the mismatch between quantity and cost – spot instances represent 8% of the quantity but only 1.4% of the cost. This is due to the significantly less expensive price of spot instances compared to on demand.

When we looked at the Cost Exceptions, we found that 96% of all users experienced at least 1 exception (with many experiencing multiple exceptions). In total, we found that users who adopted our recommended instance sizing and purchasing type were able to save an average of $3974 per month for an aggregate total of $1,192,212 per month.

This suggested that price optimization remains a large hurdle for AWS users who rely on native AWS tools. Users consistently fail to optimize purchasing and also fail to optimize utilization. These combined issues meant that the average customer pays nearly twice as much as necessary for resources to achieve proper performance for their technology.

To further examine this behavior, we interviewed a number of customers.  We interviewed customers who exclusively purchased on-demand and customers who used multiple purchasing types.

Here were their answers (summarized and consolidated):

  • Spot instances worry users – there is a general concern of: “what if the price spikes and my instance is terminated?” This fear exists despite the fact that spikes occur very rarely, warnings are available, and proper configuration can significantly mitigate this “surprise termination” risk.
  • It is difficult and time consuming to map the cost scenarios for purchasing reserved instances. The customers who did make this transition had cobbled together home grown spreadsheets as a way of supporting this business decision.  The ones who didn’t make this effort made a gut estimate that it wasn’t worth the time.  AWS was cost effective enough and the time and effort for modeling the transition was an opportunity cost taken away from building and managing their technology.
  • The intricacies of matching the configurations between on demand instances and reserved instances while taking into consideration auto scaling and other necessary configurations were daunting. Many felt it was not worth the effort.
  • Amazon’s own process for regularly lowering prices is a deterrent to purchasing RIs. This is especially true for RIs with a 3 year commitment.  In fact, within the customers who did purchase RI, none expressed a desire to purchase RIs with a 3 year commitment. All supported their decision by referencing the regular AWS price drops combined with the fact that they could not accurately predict their business requirements 3 years out.

Conclusions based upon Availability Exceptions:

We compared our users against our Availability best practices and found that nearly 98% suffered from at least 1 exception. We hypothesized that this was due to the overall complexity of AWS and interviewed some of our users for confirmation. Here is what we found from those interviews:

  • Users were generally surprised with the exceptions. They believed that they “had done everything right” but then realized that they underestimated the complexity of AWS.
  • Users were often unsure of exactly why something needed to be remedied. The underlying architecture of AWS continues to evolve and users have a difficult time keeping up to speed with new services and enhancements.
  • AWS dynamism played a large role in the number of exceptions. Users commented that they often fixed exceptions and, after a week of usage, found new exceptions had arisen.
  • Users remained very happy with the overall level of service from AWS. Despite the exceptions which could diminish overall availability, the users still found that AWS offered tremendous functionality advantages.

Conclusion bases upon Security Exceptions:

Finally, we looked at security. Here we found that 44% of our users had at least one serious exception present during the initial scan. The most serious and common exceptions occurred within S3 usage and bucket permissioning. Given the differences in cloud v. data center architecture, this was not entirely surprising. We interviewed our users about this area and here is what we found:

  • The AWS management console offered little functionality for helping with S3 security. It does not provide a use friendly means of monitoring and controlling S3 inventory and usage. In fact, we found that most of our users were surprised when the inventory was reported. They often had 300-500% more buckets, objects and storage than they expected.
  • Price = Importance, S3 is often an afterthought for users. Because it is so inexpensive users do not audit it as closely as EC2 and other more expensive services and rarely create and implement formal policies for S3 usage.  The time and effort required to log into each region one by one to collect S3 information and download data through the Management console was not worth the effort relative to spend.
  • Given the low cost and lack of formal policies, team members throw up high volumes of objects and buckets knowing that they can store huge amounts of data at a minimal cost.  Since users did not audit what they had stored, they could not determine the level of security.

Cloud Computing Forrest

AaronKleinAuthor Info: Aaron is the Co-Founder/COO of CloudCheckr Inc. (CCI). With over 20 years of managerial experience and vision, he directs the company’s operations.

Aaron has held key leadership roles at diverse organizations ranging from small entrepreneurial start-ups to multi-billion dollar enterprises. Aaron graduated from Brandeis University and holds a J.D. from SUNY Buffalo.

Underlying Data Summary

Cost:                                                                                                       Any exception 96%

The total of 16,047 instances was broken in the following categories:

  • On Demand:       78%    (12,517 instances)
  • Reserved:             14%    (2,247 instances)
  • Spot:                        8%      (1,284 instances)

The instance purchasing was broken down as follows:

  • On Demand:        89.7%  ($2,023,623)
  • Reserved:             8.9%     ($199,803)
  • Spot:                        1.4%     ($31,561)

Common Cost Exceptions we found:

  • Idle EC2 Instances                                                                                                      36%
  • Underutilized EC2 Instances                                                                               84%
  • EC2 Reserved Instance Possible Matching Mistake                                             17%
  • Unused Elastic IP                                                                                                        59%


Availability:                                                                                              Any exception 98%

Here, broken out by service, are some highlights of common and serious exceptions that we found:

Service Type:                                                                                      Customers with Exceptions

EC2:                                                                                                           Any exception   95%


  • EBS Volumes That Need Snapshots                                                                91%
  • Over Utilized EC2 Instances                                                                                                   22%

Auto Scaling:                                                                                              Any exception   66%


  • Auto Scaling Groups Not Being Utilized  For All EC2 Instances                       57%
  • All Auto Scaling Groups Not Utilizing Multiple Availability Zones                34%
  • Auto Scaling Launch Configuration Referencing Invalid Security Group                   22%
  • Auto Scaling Launch Configuration Referencing Invalid AMI                           18%
  • Auto Scaling Launch Configuration Referencing Invalid Key Pair                 16%

ELB:                                                                                                        Any exception   42%


  • Elastic Load Balancers Not Utilizing Multiple Availability Zones                 37%
  • Elastic Load Balancers With Fewer Than Two Healthy Instances               21%

Security:                                                                                                    Any exception   46%


These were the most common exceptions that we found:

  • EC2 Security Groups Allowing Access To Broad IP Ranges                                 36%
  • S3 Bucket(s) With ‘Upload/Delete’ Permission Set To Everyone                   16%
  • S3 Bucket(s) With ‘View Permissions’ Permission Set To Everyone            24%
  • S3 Bucket(s) With ‘Edit Permissions’ Permission Set To Everyone               14%