As you no doubt heard this week, Rackspace has announced its intention to focus on managed cloud. Inevitably this brought observations from many about the ability of RAX, and others, to compete effectively against the web scale public cloud giants: Amazon, Microsoft, and Google. One of the commenters was Mike Kavis (twitter link), a long time cloud pundit and someone whose opinion I respect. Mike wrote up a fairly interesting article that he posted on Forbes that I encourage you to read in full. Unfortunately, Mike falls into one of the older cloud tropes that I thought was well and truly dead. Today I seek to clarify and hopefully amplify much of what he said.
First, we need to address the so-called “economies of scale” that large public cloud providers enjoy. Simply put, economies of scale are structural cost advantages that come from sufficient size, greater speed, enhanced productivity, or scale of operation. Unfortunately, many folks, including Mike, fall into the trap of assuming that “economies of scale” == “buying power”. Buying power can be an element of achieving scale, but it is seldom a structural or sustainable advantage, certainly not against other large businesses who can command similar quantities of capital.
No, the real economies of scale that are relevant here are the tremendous investments in R&D that have led to technological innovations that directly impact the cost structures of Amazon Web Services, Google Cloud Platform, and Microsoft Azure. Here are some examples of what I mean:
These are just the first three items that are A) public and B) came to mind without a lot of additional research. There are hundreds of other innovations. Most of these innovations share a central theme: reduction of cost through greater efficiency or the ability to deploy lower cost hardware. For example, Google’s G-Scale Network uses inexpensive Taiwanese ODM switches.
As I have mentioned previously, the rate of innovation and development at these public clouds is where the true economies of scale reside.
Much of this is alluded to when Mike covers the level of investment in infrastructure and R&D from Amazon, Microsoft, and Google. Unfortunately, Mike mixes infrastructure investment (CapEx) with R&D investment (OpEx). We actually don’t know what the levels of R&D investment are at the big three, although we know that each literally has thousands of developers working diligently on new capabilities.
And this is where we go off the rails, because Mike throws IBM’s hat in the ring as a real contender because they are planning to invest $1.2B in new datacenters. This is actually uninteresting and mostly irrelevant when it comes to measuring the probability of success in the public cloud game. New datacenters and hardware won’t provide a true structural cost advantage. That can only come through investment in R&D and a proven track record of innovation in public cloud, and IBM is not clearly succeeding at either. Perhaps they will, and perhaps they are a true public cloud contender, but it’s hard for me to see that given that so much of this is about a cultural and organizational structure that can encourage innovation.
What does it take to change? As many of you know, I pooh-poohed Microsoft’s chances for quite a while because I had no belief in their eventual success at delivering online services, mostly because I felt the organization as a whole struggled with the operating system boat anchor and couldn’t let go. Of course, this was before Satya Nadella took over the helm and declared a focus on cloud and mobile. In essence he empowered and enabled the Online/Live teams to become the new Microsoft. The Live teams have learned “web scale” the hard way, over many, many years and through the spilling of much red ink. See this article from 2011 on MSFT’s Online services operating income. Microsoft has spent 10 years and many billions of dollars to become a credible player against the likes of Amazon and Google.
In that light, how can anyone else even pretend to the throne without similar levels of investment? Buying a hosting company is not going to get you there. This isn’t a game of buying power or outsourcing. It’s an innovation game and that’s it. The number of players who can pull this off is vanishingly small.
You want “economies of scale?” You’re going to pay for it and at this point it’s probably too little too late.
The Cloudscaling team has, once again, submitted an outstanding array of talks and we would all appreciate if you took the time to vote for our presentation submissions. We’ve taken the time to summarize our presentations for you below, as well as provide you with an easy link to cast your vote:
There’s no doubt that Cloudscaling would not be as great as it is without our customers and other Stacker friends. That’s why we humbly ask you to please take the time to vote for these submissions which include user stories from companies who are using OpenStack to achieve agility.
Customer Use Cases:
How Lithium used Cloudscaling OCS to bring IT into the Modern Cloud era – https://www.openstack.org/vote-paris/Presentation/how-lithium-used-cloudscaling-ocs-to-bring-it-into-the-modern-cloud-era
This session will include a case study by Randy Bias, CEO Cloudscaling and Joe Sandoval, Lithium about the key benefits experienced by Lithium in their journey: increased agility, “open” architecture, application modernization, improved DevOps efficiency and a foundation for the future, just to name a few.
No Wait IT Keeps Developers Productive at Ubisoft – https://www.openstack.org/vote-paris/Presentation/no-wait-it-keeps-developers-productive-at-ubisoft
This session will include a case study by Randy Bias, CEO Cloudscaling and Marc Heckmann, Enterprise Cloud Architect, Ubisoft about the key benefits experienced by Ubisoft in their journey: agility with “control”, satisfying the LOB’s need for speed, increased IT efficiency, application modernization, and improved DevOps efficiency.
Service Provider Achieves Ultra-Agile Infrastructure using Cloudscaling OCS – https://www.openstack.org/vote-paris/Presentation/service-provider-achieves-ultra-agile-infrastructure-using-cloudscaling-ocs
This session will include a case study by Randy Bias, CEO Cloudscaling and Matt Kinney, Idig.net about the key benefits experienced by Canadian Web Hosting in their journey: increased agility, services that are cost-competitive with major public clouds like AWS, and a slew of new, dynamic cloud applications that customers love.
Panels on OpenStack, Hybrid Cloud, and Other Business Cases:
The OpenStack Thunderdome – https://www.openstack.org/vote-paris/Presentation/the-openstack-thunderdome
“…five highly opinionated Stackers will lock horns over the best way to bring about global dominance of OpenStack as the default cloud-building platform.”
Top Hybrid Cloud Myths Debunked – https://www.openstack.org/vote-paris/Presentation/top-hybrid-cloud-myths-debunked
Four experts from diverse industries including public and private cloud, systems integration and cloud management will square off and separate reality from assumption around hybrid cloud myths and top trends.
Hybrid Cloud War Stories: Expecting the Unexpected – https://www.openstack.org/vote-paris/Presentation/hybrid-cloud-war-stories-expecting-the-unexpected
Whatever you didn’t expect to go wrong, does. What you hoped for does not materialize. Building a private or public cloud is hard enough, but putting them together is not for the faint of heart. Four experts who have extensive experience in using, delivering, and architecting hybrid clouds will chime in on what to look for when going for the gold.
Compliance Slows Us Down While Cloud Speeds Us Up; Or Does It? – https://www.openstack.org/vote-paris/Presentation/compliance-slows-us-down-while-cloud-speeds-us-up-or-does-it
Compliance and governance give the appearance of slowing down IT, while Cloud gives us the hope of moving faster and faster. Can managing down risk coexist with greater agility and flexibility? We’ll ask that question and more.
Adopting Cloud? Unlearn everything you know about traditional enterprise architecture first. – https://www.openstack.org/vote-paris/Presentation/adopting-cloud-unlearn-everything-you-know-about-traditional-enterprise-architecture-first
Cloud is about a lot more than VMs-on-demand. Traditional enterprise IT approaches are being disrupted by “web scale” techniques. As the cloud changes everything, we need to understand how web scale ultimately dovetails with traditional enterprise requirements. How can we interpret the lessons learned from the big guys for your everyday enterprise?
Next-Gen Organizational Design – Growth hacking with “BusDevOps” – https://www.openstack.org/vote-paris/Presentation/next-gen-org-design-growth-hacking-with-busdevops
The silos between dev and ops are coming down, but is it possible to extend that thinking to the rest of the business? Can we achieve the impossible? Bringing together the best of dev, ops, business development, sales, and product functions, we’ll discuss how this might play out and why it could be important to the next generation of businesses.
Technology Oriented Presentations:
Scale-Out Ceph: Rethinking How Distributed Storage is Deployed (Speakers: Randy Bias, Tushar Kalra) – https://www.openstack.org/vote-paris/Presentation/scale-out-ceph-rethinking-how-distributed-storage-is-deployed
At Cloudscaling, we believe that unification isn’t the answer. Good solid tiered storage architecture is the answer. In this session, see how Ceph can shine as part of a considered storage strategy rather than as ‘the only answer’.
Cloud Operations Dashboard Demo: Cloudscaling OCS User Interface – https://www.openstack.org/vote-paris/Presentation/cloud-operations-dashboard-demo-cloudscaling-ocs-user-interface
Step inside and we’ll give you a tour of Cloudscaling’s Open Cloud System cloud operator GUI, API, and CLI tools. By the operator, for the operator. Power up now!
Tales From the Field: A Day in the Life of Cloud Operations – https://www.openstack.org/vote-paris/Presentation/tales-from-the-field-a-day-in-the-life-of-cloud-operations
Cloudscaling has been supporting 24×7 production clouds since before OpenStack existed. In this session, we will discuss the kinds of problems folks run into in typical OpenStack deployments and pull out a couple of interesting incidents to perform a deep dive on.
Tempest Testing for Hybrid and Public Cloud Interoperability – https://www.openstack.org/vote-paris/Presentation/tempest-testing-for-hybrid-and-public-cloud-interoperability
In this presentation we will take a closer look at DefCore, its origins and intentions, how the initiative can be extended, how RefStack can be used as the basis not only for OpenStack interoperability testing but also public cloud interop testing, and finally give a demonstration of wrapping this all up into an actionable package.
OpenStack Design Guide Panel – https://www.openstack.org/vote-paris/Presentation/panel-with-the-authors-of-the-openstack-design-guide
Bring your real-world questions and be prepared to talk OpenStack architecture with a panel of experts from across multiple disciplines and companies. We’ll be drawing on real architecture and design problems taken from real-world experience working with, and developing solutions built on, OpenStack.
Virtual Private Cloud (VPC) Powered by Cloudscaling OCS & OpenContrail – https://www.openstack.org/vote-paris/Presentation/virtual-private-cloud-vpc-powered-by-cloudscaling-ocs-and-opencontrail
We’ll explore how the Cloudscaling VPC cloud solution leverages SDN technology that has been well-tested in the telecommunications and service provider industries to build overlay networks that scale, both within and across data centers. We’ll also take a closer look at the VPC API updates to the OpenStack EC2 API and how the development work done there is providing real fidelity with leading public cloud providers, enabling true hybrid cloud solutions.
OpenStack Reference Architecture: Scaling to Infinity and Beyond – https://www.openstack.org/vote-paris/Presentation/openstack-reference-architecture-scaling-to-infinity-and-beyond
We’ll explore some of the basic principles of creating a reference architecture and discuss real-world examples that demonstrate why implementing a reference architecture allows scaling from one to thousands of racks with relative ease. We will also touch upon why your reference architecture choices can directly affect interoperability between OpenStack clouds and between OpenStack and major public clouds.
We hope that you can take the time to vote for all of our presentations and we certainly hope to see you in Paris!
Also make sure you check out the list of fantastic presentations at the Mirantis blog.
Today I am extremely pleased to welcome Tarkan Maner, CEO of Nexenta and previously CEO of Wyse Technologies, to Cloudscaling’s board of directors. The full press release can be found here.
I met Tarkan during our search for a new CEO and found him to be a passionate entrepreneur with a strong desire to help Cloudscaling succeed. Tarkan and I share similar characteristics. We are both super high energy personalities. We are both intensely keen on cloud. However, we also come from two different worlds: sales/marketing vs. technology. And that contrast makes all the difference.
One thing I pride myself on is finding new members of the team who fill in existing capability gaps, or who bring completely new capabilities and skills to the table. Here, I think Tarkan’s track record speaks for itself. He has built and led high growth businesses, like Wyse, into the cloud future. I am very much looking forward to learning from Tarkan’s experience and wisdom. His desire to join our board, with so many others courting him, is a massive vote of confidence in me, my co-founder Adam Waters, our business, and the rest of the Cloudscaling team.
Great to have you onboard, Tarkan!
—Randy
In part 1 and part 2 of this series I introduced the core ideas around defining the requirements and then discussed the first four. Today we’ll discuss the final two requirements and tie it all together.
Onwards and upwards!
Enterprise-grade has to mean something. In the past, enterprise-grade referred to a certain quality of a system that made it highly reliable, scalable, and performant. More and more, enterprise-grade is beginning to mean “cloud-grade” or “web scale.” What I mean is that as the move to next generation applications happens and enterprises adopt a new IT model, we will see major changes in the requirements for delivering a high quality system.
The example I love to use is Hadoop. Hadoop comes with a reference architecture that says: use commodity servers, commodity disk drives, and NO RAID. When is the last time you saw an enterprise infrastructure solution with no data protection at the hardware layer? Of course, it doesn’t make sense to run Hadoop on high end blades attached to a Fibre Channel SAN, although I have seen it. Even Microsoft Exchange has begun recommending a move towards JBODs from RAID and depending on the application software layer to route around failure.
Let’s talk about these three requirements for enterprise-grade OpenStack.
Scalability is the property of a system to continue to work as it increases in size and workload demands. Performance is the measurement of the throughput of an individual resource of the system rather than the system itself. Perhaps Werner Vogels, CTO of Amazon, said it best:
A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added.
In many ways, OpenStack itself is a highly scalable system. It was designed around a loosely-coupled, message passing architecture — something tried and true for mid to large scale, while also able to scale down to much smaller deployments. The problem, unfortunately, lies in the design decisions made when configuring and deploying OpenStack. Some of its default configurations and many of the vendor plugins and solutions have not been designed with scale in mind. As discussed in the previous installment, having a reference architecture is critical to delivering hybrid cloud. Make certain you pick an enterprise-grade product with a reference architecture that cares about scale and performance while using well-vetted components and configuration choices.
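Vogels’ definition lends itself to a quick back-of-the-envelope check when evaluating a vendor: benchmark throughput at a few cluster sizes and see how far you drift from linear. Here’s a minimal sketch; the function name and benchmark numbers are mine, purely for illustration:

```python
# Toy check of Vogels' scalability definition: a system is scalable if
# performance grows roughly in proportion to the resources added.

def scaling_efficiency(throughputs):
    """Map {node_count: throughput} to {node_count: efficiency},
    where efficiency = (T_n / T_1) / n and 1.0 means perfectly linear."""
    base = throughputs[1]
    return {n: (t / base) / n for n, t in sorted(throughputs.items())}

# Hypothetical benchmark results (requests/sec) at 1, 4, and 16 nodes.
measured = {1: 1000, 4: 3800, 16: 12000}

for n, eff in scaling_efficiency(measured).items():
    print(f"{n:>3} nodes: {eff:.0%} of linear scaling")
```

If efficiency falls off a cliff between 4 and 16 nodes, you have found the point where the control plane, network, or storage backend stops keeping up.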
A complete examination of the scale and performance issues that might arise with OpenStack is beyond the scope of this series; however, let’s tackle the number one issue that most people run into: network performance and scalability.
OpenStack Compute (Nova) has three built-in default networking models: flat, single_host, and multi_host. All three of these networking models are completely unacceptable for most enterprises. In addition to these default networking models, you have the option of deploying OpenStack Networking (Neutron), which has a highly pluggable architecture that supports a number of different vendor plugins both to manage physical devices and also network virtualization systems (so called Software-defined Networking or SDN).
A very brief explanation of the default networking models’ shortcomings is in order. I will keep it very simple, but am happy to follow up later with more details. The flat and multi_host networking models require a single shared VLAN for all elastic (floating) IP addresses. This requires running spanning tree protocol (STP) across your switch fabric, a notoriously dangerous approach if you want high network uptime. I’ve asked the question at multiple conferences to a full room: “How many of you think STP is evil?” and have had nearly everyone in the room raise their hand.
Perhaps more importantly, both flat and multi_host networking models require you to route your public IP addresses from your network edge down to the hypervisor nodes. This is really not an acceptable approach in any modern enterprise. Here’s a diagram showing multi_host mode:
It’s probably also worth noting that if you want multi_host mode, you need to be able to load code on your hypervisor. That means if you like ESX or Hyper-V you are probably out of luck.
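To make this concrete, here is roughly what selecting these legacy nova-network models looked like in nova.conf. Treat this as an illustrative sketch only: option names and defaults shifted between OpenStack releases, so check the documentation for your version:

```ini
[DEFAULT]
# Pick the legacy networking model; VlanManager and FlatManager
# were the other historical options.
network_manager = nova.network.manager.FlatDHCPManager

# multi_host mode: run a nova-network service on every compute node.
# This is why you must be able to load code on the hypervisor host.
multi_host = True

# The single shared bridge/VLAN that all tenant traffic rides on,
# the root of the STP and floating IP routing problems described above.
flat_network_bridge = br100
public_interface = eth0
```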
By contrast, single_host mode suffers from the singular sin of trying to make a single x86 server the core networking hub through which all traffic between VLANs and the Internet runs. Basically, take your high performance switch fabric and throw it in the trash because your maximum bandwidth is whatever a Linux server can push. Again, this is not an acceptable or even credible approach to networking. To be fair though, OpenStack competitors also took a similar approach to this. Here’s a diagram on single_host mode:
All of these approaches have fundamental scalability and performance issues. Which brings us to OpenStack Neutron.
As of September 2013, Neutron still seemed to have significant issues, as seen in this critical posting from Scott Devoid of Argonne National Laboratory (ANL) to the OpenStack operators mailing list. As of this writing, OpenStack Neutron supports single_host and flat modes, but not multi_host. Apparently, we may see a replacement for multi_host in the Juno timeframe, although this capability has been missing for a while now.
That being said, Neutron has made a lot of progress and to be honest, many of the issues folks have reported are more likely to stem from poorly written and adapted plugins. What this means is that in order to deliver success with OpenStack Neutron you need a version of Neutron plus accompanying plugins that have been designed for scale and performance. Plus, your cloud operating system vendor should have some proven deployments at scale and have really beaten the crap out of the networking using exhaustive testing frameworks.
I could say much more about the performance and scalability of an enterprise-grade, OpenStack-powered product; however, this should give you a starting point for pinning down your vendors to make sure they have addressed these and related issues.
Most important: regardless of your OpenStack vendor they must be able to provide a detailed, multi-rack reference network architecture.
Without a reference network architecture, your ability to scale past a single rack is purely based on hand-waving and assurances from your vendor that may or may not have any validity.
Infrastructure can’t ever be truly elastic, but its properties can enable elastic applications running on top of it. An elastic cloud is one that has been designed such that the individual cost of resources such as VMs, block storage, and objects is as low as possible. This is directly related to Jevons Paradox, which states that as a technology progresses, the increase in efficiency leads to an increase in the rate of consumption of that technology:
Simply put, as the relative cost of components in the system falls, applications can consume more: extra capacity for routing around failures, and extra capacity for scaling up and down with demand. In essence, you can make the pool larger and buy more capacity if the individual components and resources are as cheap as possible.
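A toy model makes the paradox concrete. Assume constant-elasticity demand with elasticity greater than 1; every number below is hypothetical, chosen only to illustrate the shape of the curve:

```python
# Toy model of Jevons Paradox: when the unit cost of a cloud resource
# falls, consumption can rise so much that total spend goes UP.

def consumption(unit_cost, base_cost=1.0, base_usage=100, elasticity=1.5):
    """Constant-elasticity demand: usage scales as (cost ratio)^-elasticity."""
    return base_usage * (unit_cost / base_cost) ** -elasticity

for cost in (1.00, 0.50, 0.25):
    usage = consumption(cost)
    print(f"unit cost ${cost:.2f}: {usage:7.1f} units consumed, "
          f"total spend ${cost * usage:.2f}")
```

Halving the unit cost in this model more than doubles consumption, so total spend rises. That is exactly the dynamic that lets elastic clouds grow the pool while each individual resource gets cheaper.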
Major elastic public clouds like Google, Amazon, and Microsoft are providing these kinds of properties, and it’s what you need to provide inside your four walls to enable hybridization.
Enterprise-grade OpenStack will help lead you into the future by providing scalability and performance while supporting elastic applications. Beware the OpenStack-powered cloud operating system that wants you to use a Fibre Channel SAN and blade servers. Those days have passed, as we can see with Hadoop.
Chances are you are a global organization and are planning to deliver 24x7x365 next generation, cloud native applications on top of your private, public, and hybrid clouds. You want partners who can support you globally, who have international experience, and most of all who are comfortable with supporting 24x7x365 environments.
IT administrators are in the process of transitioning into cloud administrators. This evolution will be a deep and lasting change inside the enterprise. Entirely new sets of skills need to be developed and other skills refreshed and realigned to the new cloud era. When evaluating your enterprise-grade OpenStack partner, you should be looking for one with significant capabilities in training, both on generic OpenStack and on their specific cloud operating system product. Most importantly, when evaluating a partner who can help you upgrade your team’s cloud skills, make certain they aren’t just going to show you how to develop on OpenStack or install OpenStack.
What you really need is operator-centric training that focuses on:
No matter how good your IT team is, you will need a trusted support team to back you up — a team that can support your entire system end-to-end. Make certain you ask your Enterprise-grade OpenStack-powered cloud operating system vendor if their support team has supported high transaction 24×7 environments before. Be certain that they have so-called “full stack” support capabilities. Can they troubleshoot the Linux kernel, your hypervisor of choice, networking architecture and performance issues, and do they understand storage at a deep level? Clouds are integrated systems and compute, storage, and networking all touch each other in fundamental ways. Your vendor needs to know a lot more than how to configure or develop for OpenStack. They need to be cloud experts at all levels of the stack. Demand it.
Delivering a cloud internationally is no small feat, whether large or small. It requires more than just reach. It requires cultural sensitivity and the ability to understand the unique requirements that arise in particular geographies. For example, did you know that while most data centers are more concerned about power than space, in Japan it’s usually just the opposite? Space winds up being one of the single largest premiums. This space requirement is unique to that particular environment.
Your cloud operating system vendor should have a track record of successful international delivery and a partner network that can assist in a particular location.
OpenStack is an amazing, scalable foundation for building a next generation elastic cloud, but it’s not perfect. None of the open source solutions it competes with are perfect either. Instead, each of these tools is really a cloud operating system “kernel” that can be used to deliver a more complete, vetted, Enterprise-grade cloud operating system (cloudOS). You will need an experienced enterprise vendor to deliver your cloudOS of choice and whether it’s OpenStack or another similar project I hope you will keep these requirements in mind.
I hope you enjoyed this whirlwind tour through the 6 Requirements of Enterprise-grade OpenStack. As a reminder, we covered these six requirements:
As you are out there evaluating the right vendor to help with your OpenStack adoption process and the move towards hybrid cloud, make certain you find out how much, if any of these requirements they can meet.
For some related white papers, check out:
 It’s also fair to say that sometimes people are using messaging systems in an inappropriate manner. Sometimes, plain old UDP is still best for fire-and-forget, high throughput systems, like logs.
 Before you cry foul, others such as Eucalyptus also went down this unfortunate path. I believe Eucalyptus 4.0 finally fixes this. It’s a common mistake for people without networking experience to go down this path.
 So far the only one we have tested extensively is OpenContrail, which shows great promise, but we have to get running in some larger deployments before we declare victory.
 We know. Building the KT uCloud in 2010 and 2011 was a huge task for an early stage startup.
In part 1 of this series earlier this week, I introduced The 6 Requirements of Enterprise-grade OpenStack. Today, I’m going to dig into the next two major requirements: Open Architectures and Hybrid Cloud Interoperability.
Let’s get started.
We already covered building a robust control plane and cloud management system. One of the attractions of OpenStack is removing vendor lock-in by moving to an open source platform. This is a key value proposition and deserves a complete dialog about what is realistic and what is not in an Enterprise-grade version of OpenStack.
Are you being promised that OpenStack provides “no lock-in?” No vendor lock-in is a platonic ideal – something that can be imagined as a perfect form, but never achieved. Any system always has some form of lock-in. For example, many of you probably use Red Hat Enterprise Linux (RHEL), a 100% open source Linux operating system, as your default Linux within your business. Yet RHEL is a form of lock-in. RHEL is a specific version of Linux designed for a specific goal. You are locked into its particular reference architecture, packaging systems, installers/bootloaders, etc., even though it is open source.
In fact, with many customers I have seen less fear about lock-in and more concern about “more lock-in.” For example, one customer, who will remain anonymous, was concerned about adopting our block storage component due to lock-in, even though it was 100% open source. When probed, it became clear that the customer wanted to use their existing storage vendors (NetApp and Hitachi Data Systems) and did not want to have to train up their IT teams on a completely new storage offering. Here the lock-in concerns were predominantly about absorbing more lock-in rather than removing it entirely.
What is most important is assessing the risks your business can take. Moving to OpenStack, as with Linux before it, means that you are mitigating certain risks in terms of training your IT teams on the new stack and hedging your bets by being able to get multiple vendors in-house to support your open source software.
In other words, OpenStack can certainly reduce lock-in, but it won’t remove it. So, demand an open architecture, but expect an enterprise product.
I wish it didn’t, but lock-in does happen, as you can see from above. That means that rather than planning for no lock-in, start planning for what lock-in you are comfortable with. An Enterprise-grade version of OpenStack will provide a variety of options via an open architecture so you can hedge your bets. However, a true cloud operating system and enterprise product cannot ever provide an infinite variety of options. Why not? Because then the support model is not sustainable and that vendor goes out of business. Not even the largest vendors can provide all options to all people.
If you want to build your own customized cloud operating system built around OpenStack, go ahead, but that isn’t a product. That’s a customized professional services path. Like those who rolled their own Linux distributions for a while, it leads to a path of chaos and kingdom-building that is bad for your business. Doing it yourself is also resource intensive. You’ll need 20-30 Python developers with a deep knowledge of infrastructure (compute, storage, and network) who can hack full time on your bastardized version of OpenStack. A team that looks something like this:
So, ultimately, you’re going to have to pick a vendor to bet on if you want enterprise-grade OpenStack-powered cloud solutions.
Hybrid is the new cloud. Most customers we talk to acknowledge the reality of needing to provide their developers with the best tool for the job. Needs vary, requirements vary, concerns vary, compliance varies. Every enterprise is a bit unique. Some need to start on public cloud, but then move to private cloud over time. Some need to start on private, but slowly adopt public. Some start on both simultaneously. RightScale’s recent State of the Cloud 2014 report has some great survey data backing this up:
Let’s talk about why your enterprise-grade OpenStack-powered cloud operating system vendor had better have a great hybrid cloud story.
Every enterprise needs a hybrid-first cloud strategy. Meaning, hybrid cloud should be your first and primary requirement. Then, plan around hybrid with a single unified governance model that delivers the best of both worlds for your constituencies. Finally, plan on a process where you will triage your apps/needs and determine which cloud is right for the job. The following diagram highlights this process, but your mileage may vary as criteria differ from business to business:
I have been through quite a number of interoperability efforts, the most painful of which was IPSEC for VPNs. Interoperability between vendors is not free, usually takes a fairly serious effort, and ultimately is worth the pain. Unfortunately, interoperability is deeply misunderstood, particularly as it applies to public/private/hybrid cloud.
The challenge in hybrid cloud is about addressing the issues of application portability. If you want a combination of public and private clouds (hybrid) where an application can be deployed on either cloud, moved between the clouds, or cloudbursted from one cloud to another, then application portability is required. When you pick up and move an app and its cloud native automation framework from one cloud to another, a number of key things need to remain the same:
Here is a slide I used in a recent webinar to help explain these requirements.
Of course, you must also have been thoughtful when designing your application and avoided any lock-in features of a particular cloud system, such as a reliance on DynamoDB on AWS, HA/DRS on VMware, iRules on F5 load balancers, etc.
If you don’t meet these requirements, interoperability is not possible and application portability attempts will fail. The application performance will be dramatically different and one cloud will be favored; there will be missing features that cause the app to not function on one cloud or another; and your automation framework may fail if behavioral compatibility doesn’t exist. For example, perhaps it has timers in it that assume a VM comes up in 30 minutes, but on one of your clouds it takes 1-2 hours (I’ve seen this happen).
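The behavioral compatibility problem above (automation timers tuned for one cloud's boot times) can be guarded against by polling for readiness with a per-cloud budget instead of a hardcoded assumption. Here is a minimal sketch; the function names and timeout values are hypothetical and would need to be measured for your actual clouds:

```python
import time

# Hypothetical per-cloud boot-time budgets, in seconds. Real values must be
# measured empirically on each cloud you target.
BOOT_TIMEOUTS = {"public": 30 * 60, "private": 120 * 60}

def wait_for_vm(is_ready, cloud, poll_interval=15,
                clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or the cloud's budget expires.

    Returns True if the VM came up within the budget, False otherwise.
    clock/sleep are injectable to make the function testable.
    """
    deadline = clock() + BOOT_TIMEOUTS[cloud]
    while clock() < deadline:
        if is_ready():
            return True
        sleep(poll_interval)
    return False
```

The point is that the automation framework carries no implicit assumption about boot time; the budget is explicit, per-cloud configuration.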
All of these issues must be addressed in order to achieve hybrid cloud interoperability.
The Linux kernel needs a reference architecture. In fact, each major distribution of Linux in essence creates its own reference architecture, and now we have distinct flavors of Linux OS. For example, there are the RedHat/Fedora/CentOS flavors and the Debian/Ubuntu flavors. These complete x86 operating systems have fully-baked reference architectures, and anyone moving within one of the groups will find it relatively trivial to move between them. A RedHat admin moving to Debian, however, may initially be lost until they come up to speed on the differences. OpenStack is no different.
OpenStack, and in fact most of its open source brethren, has no reference architecture. OpenStack is really the kernel for a cloud operating system. This is actually its strength and weakness. The same holds for Linux. You can get Cray Linux for a supercomputer and you can get Android for an embedded ARM device. Both are Linux, yet both have radically different architectures, making application portability impossible. OpenStack is similar, in that to date most OpenStack clouds are not interoperable, because each has its own reference architecture. Every cloud with its own reference architecture is doomed to be an OpenSnowFlake.
Enterprise-grade cloud operating systems powered by OpenStack must have commonly held reference architectures. That way you can be assured that every deployment is interoperable with every other deployment. These reference architectures have yet to arise. However, given that there is already a single reference architecture in Amazon Web Services (AWS) and Google Cloud Platform (GCP), (we call it “elastic cloud reference architecture”) and given that these two companies will be major forces in public cloud, it’s hard to see how OpenStack can avoid supporting at least one reference architecture that looks like the AWS/GCP model.
To be clear, however, there may be a number of winning reference architectures. I see emerging flavors in high performance computing (HPC) and possibly other verticals like oil & gas.
Ultimately, you have to place your own bet on where you think OpenStack lands, but existing data says that out of the top 10 public clouds, only a couple are based on OpenStack:
If enterprises desire agility, flexibility, and choice, it seems obvious that OpenStack needs to support an enterprise-grade reference architecture that is focused on building hybrid clouds with the ultimate winners in public cloud. It’s just my opinion, but right now that looks like Amazon, Google, and Microsoft.
Enterprise-grade OpenStack means an enterprise-grade reference architecture that enables hybrid cloud interoperability and application portability.
An open architecture designed for hybrid cloud interoperability is a foregone conclusion at this point. Mostly what folks argue about is how that will be achieved, but for those of us who are pragmatists, it's certain that public cloud will have a wide variety of winners and that the top 10 public clouds are already dominated by non-OpenStack contenders. So plan accordingly.
Most importantly, remember to ask for an open architecture, while expecting an enterprise product.
In the next installment we’ll tackle what it means to deliver a performant, scalable, and elastic infrastructure designed for next gen apps.
UPDATE: Added a clarifying footnote due to some Twitter feedback that seemed unclear on what a “cloud operating system” was and its relationship to OpenStack and similar open source projects.
 No, Eucalyptus, OpenNebula, CloudStack, <insert your cloud software du jour>, are NOT complete cloud operating systems. They are all roughly at parity with OpenStack, although certainly you could argue that one is above or below the others. Why aren't they complete? That's a whole other blog posting series, but suffice it to say: when was the last time you saw an operating system that couldn't install itself on bare metal? Or didn't provide system metrics and logging capabilities? Or was missing key components (e.g., databases)? A cloud operating system is a non-trivial task and most of these tools have simply handled the easy part of a cloud: placing a VM on a hypervisor (big whoop).
 By definition any cloud native next gen app manages itself via an automation framework. It might be a low level approach like Chef, Puppet, SaltStack; it might be a higher order abstraction like Scalr, RightScale, Dell Cloud Manager; it might even be a PaaS framework, but it’s *something*. Or it’s not a cloud-native app.
 Be sure to read the caveats on the VMware vCHS data in the actual report itself.
OpenStack is an amazing foundation for building an enterprise-grade private cloud. The great OpenStack promise is to be the cloud operating system kernel of a new generation. Unfortunately, OpenStack is not a complete cloud operating system, and while it might become one over time, it’s probably best to look at OpenStack as a kernel, not an OS. 
In order to become widely adopted by the enterprise, OpenStack must ultimately be delivered via robust, enterprise-grade products that close the gap on the key areas where OpenStack has challenges. These products are delivered today by businesses that can provide support, ease-of-installation, tools for day-to-day management, and all of the other pieces necessary for achieving acceptance. Without these vendors who have a stake in enterprise adoption, OpenStack can never be widely adopted. OpenStack isn’t MySQL. It’s the Linux kernel, and like the Linux kernel, you need a complete operating system to create success.
So what’s required? There are 6 key elements:
If your business requires an enterprise-ready OpenStack solution, read on to learn more about what a true enterprise-grade OpenStack-powered private cloud can – and should – offer. Over the next two weeks, I’m going to do a multi-part blog series entitled “6 Requirements of Enterprise-grade OpenStack” – where I will detail these six requirements.
To get started, let's look at OpenStack's place in the enterprise.
Agility is the new watchword for cloud, and DevOps is seen as the path to realizing agility. OpenStack provides the ideal platform for delivering a new developer experience inside the enterprise, just as Linux provided a new experience for web applications and Internet adoption. If OpenStack were just a “cheaper VMware,” then it would have little or no real value to the enterprise. Instead, OpenStack provides a shining example of how to build a private elastic cloud like major public clouds such as Amazon Web Services (AWS) and Google Cloud Platform (GCP). Just as Hadoop brought Google's MapReduce (plus its reference architecture) to the masses, OpenStack brings the AWS/GCP-style Infrastructure-as-a-Service (IaaS) offering to everyone. This is what makes DevOps inside the enterprise ultimately shine.
Any discussion about DevOps, like so many of the recent buzzwords, can quickly become mired in semantic arguments. However, the one truism we can all agree on is that the traditional barriers between application developers and IT infrastructure operators need to be broken.
Time after time, I hear a similar story from our customers that goes like this: “We went to the infrastructure teams with our long list of requirements for our new application. They told us it would take 18 months and $10M before it would be ready. So we went to Amazon Web Services. We didn’t get our list of infrastructure requirements and we had to change our application model, but we got to market immediately.” That’s because the inherent value of AWS has less to do with cost and more to do with the on-demand, elastic and developer-centric delivery model.
OpenStack levels the playing field inside the enterprise. Private clouds can be built on the public cloud model, enabling developers while simultaneously giving centralized IT control and governance. In essence, it’s the best of both worlds, which is the true value of OpenStack-powered private clouds.
While I think it's self-evident that agility is the guiding light behind cloud computing, it's worth a quick refresh. The need for businesses to move now has driven ridiculous growth for AWS (see growth below and notice this is a log chart):
This growth is all net new applications, or what Microsoft calls next generation applications. The vast majority of these new applications are focused on creating entirely new business value, typically around mobile, social, web applications, and big data. In fact, this category of application is growing so fast that analysts such as IDC and Gartner have started tracking it:
At its current rate of growth, next generation cloud applications will equal the size of all existing applications by 2018:
Next generation applications are the source for future competitiveness for most enterprises, which has led them to accelerate their cloud adoption process and rethink their cloud strategy.
Observing this phenomenon is what led Forrester analyst Craig Le Clair to say:
Seventy percent of the companies that were on the Fortune 1000 list a mere 10 years ago have now vanished – unable to adapt to change …
We have now entered an adapt or die moment for enterprises, and OpenStack will be key to agility adaptation and the successful support of DevOps.
Over the next few weeks leading up to the OpenStack Summit I’m going to cover all 6 Requirements of Enterprise-grade OpenStack in detail. Today I am going to handle the first two requirements: high uptime APIs and robust management of your cloud.
Continuing our discussion around delivering enterprise-grade OpenStack, let’s discuss how critical API availability and scaling out the cloud control plane are to delivering next gen applications.
A critical capability for moving to a new cloud and DevOps model is the ability of cloud native applications to route around failures in an elastic cloud. These applications know that any given server, disk drive, or network device could fail at any time. They look for these failures and handle them in real-time. That's how Amazon Web Services (AWS) and Google Cloud Platform (GCP) work and why they can run these services at a low cost structure and with greater flexibility. For an application to adapt in real-time to the normal failures of individual components, the cloud APIs must have higher than normal availability.
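The route-around pattern described above can be as simple as retrying an operation against a different instance when one fails. Here is an illustrative sketch; the endpoint names and exception type are placeholders, not any particular cloud SDK:

```python
import random

class CloudCallError(Exception):
    """Placeholder for whatever error your cloud client library raises."""

def call_with_failover(endpoints, operation, attempts=3):
    """Try the operation against randomly chosen endpoints, dropping failed ones.

    This is the essence of routing around failure: the app assumes any
    individual endpoint can die and simply moves on to another.
    """
    candidates = list(endpoints)
    last_error = None
    for _ in range(min(attempts, len(candidates))):
        endpoint = random.choice(candidates)
        try:
            return operation(endpoint)
        except CloudCallError as err:
            candidates.remove(endpoint)  # route around the failed instance
            last_error = err
    raise last_error
```

Real applications layer backoff, health checks, and replacement provisioning on top, but the core idea is the same: failure of one component is an expected, handled event.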
API uptime isn't the only measurement of success. Your cloud's control plane throughput is also critical. Think of the control plane as the command center of your cloud. It is most of the centralized intelligence and orchestration layer. Your APIs are a subset of the control plane, which for OpenStack also includes all of the core OpenStack projects, your day-to-day cloud management software (usually part of a vendor's Enterprise-grade OpenStack distribution), and all of the ancillary services required such as databases, OpenStack vendor plugins, etc. Your cloud control plane needs to scale out as your cloud grows bigger. That means that in aggregate, you have more total throughput for API operations (object push/pull, image upload/downloads, metadata updates, etc.).
This is where a proper cloud operating system comes in.
In essence, if you claim you can build a four or five 9s app (99.99-99.999%) on two and a half 9s infrastructure (99.5%), then the API that app uses to manage itself must also have four or five 9s of uptime. As most of you know, delivering five 9s of availability is a non-trivial task, as this allows only 5.26 minutes of unplanned downtime per year. Typical high availability approaches, such as active/passive or master election systems, can take several minutes to fail over, leaving your cloud API endpoints unavailable.
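The downtime arithmetic behind the "nines" figures is easy to verify with a few lines of Python:

```python
def allowed_downtime_minutes(nines):
    """Minutes of unplanned downtime per year at a given number of nines.

    Five 9s (99.999%) allow roughly 5.26 minutes per year;
    six 9s (99.9999%) allow roughly 31.5 seconds per year.
    """
    availability = 1 - 10 ** (-nines)
    minutes_per_year = 365 * 24 * 60  # 525,600
    return (1 - availability) * minutes_per_year
```

A failover event that takes "several minutes" therefore consumes an entire year's five 9s budget in one incident, which is why sub-minute or sub-second failover matters so much.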
An enterprise-grade cloud operating system can provide guarantees of sub-minute or even sub-second failover and deliver 99.999% or possibly even 99.9999% (that’s six 9s or 31.5 seconds of downtime per year) uptime. This kind of design is achievable at a relatively low price point using classic load balancing style techniques where your control plane and APIs are running active/active/active/active/… to N where N is however many you need as your cloud grows:
Which brings me to the second part of the equation: you need your control plane to grow as your cloud grows. You don’t want to re-architect your system as it grows, and you don’t want to resort to old school scale up techniques for your API endpoints. When you run active/passive or with a master election system for high availability, only one API endpoint is available at a time. That means that you are fundamentally bottlenecked by the total throughput of a single server, which is unacceptable in today’s scale-out cloud world.
Instead, use a load balancing pattern so you can run multi-master (N-way) active API endpoints, scale your control plane horizontally and simultaneously achieve a very high uptime. This is the best of all worlds, allowing your cloud native applications to route around problems in real-time.
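The N-way active pattern can be reduced to its essence in a few lines: keep a health-checked pool and spread requests across every live member, rather than designating a single master. A simplified sketch, where the health check and endpoint names are placeholders:

```python
import itertools

class ActivePool:
    """Round-robin over all healthy API endpoints: N-way active, no single master."""

    def __init__(self, endpoints, is_healthy):
        self.endpoints = list(endpoints)
        self.is_healthy = is_healthy
        self._cycle = itertools.cycle(self.endpoints)

    def next_endpoint(self):
        # Skip unhealthy members; every healthy endpoint carries traffic,
        # so aggregate throughput grows as you add endpoints.
        for _ in range(len(self.endpoints)):
            candidate = next(self._cycle)
            if self.is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy API endpoints")
```

In production this logic lives in a load balancer (hardware or software) in front of the control plane, but the property is the same: losing one member costs you 1/N of capacity, not availability.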
Now let’s talk about day-to-day management of and securing your cloud.
You probably know this already, but building a robust, manageable, and secure infrastructure in the enterprise isn’t easy. The notion that an enterprise-grade private cloud can be delivered in an afternoon and in production that evening doesn’t wash with the realities of the datacenter. Still, time is of the essence and if you want a cloud that doesn’t suck and you want it (relatively) fast, then it will help if the version of OpenStack you choose has been designed with deployment, daily management, and security in mind. Let’s take a deeper look at what that entails.
Installation is only the beginning when it comes to managing OpenStack. A true cloud operating system provides a suite of operator-centric cloud management tools designed to allow the infrastructure team to be successful at service delivery. These management tools provide:
Many attempts to solve the private cloud management challenges stop at installation. Installation is just the beginning of your journey, and how easy it is doesn’t matter if your cloud is then hard to manage on a daily basis. As we all know, running a production system is not easy. In fact, private clouds are significantly more complex than traditional infrastructure approaches in many aspects. To simplify the issue, at scale, the cloud pioneers, such as Google, Amazon, and Facebook have all adopted a pod, cluster, or block based approach to designing, deploying, and managing their cloud. Google has clusters; Facebook has triplets; but it’s all ultimately the same: a Lego brick-like repeatable approach to building your cloud and datacenter. Enterprise-grade OpenStack-powered cloud operating systems will need to provide a similar approach to cloud organization.
Once the cloud is up and running, cloud operators need a variety of tools to maintain the cloud on a regular basis, including event logs and system metrics. Sure, in an elastic cloud events that used to be critical (e.g. server or switch failure) are no longer high priority. However, your cloud can't be a black box. You need information on how it's operating on a daily basis so you can troubleshoot specific issues as required and, most importantly, keep an eye out for recurring issues using correlation tools. An individual server failure might not be a problem, but any kind of common issue that is affecting large amounts of resources needs to be sought out and quickly addressed.
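Spotting the recurring issues mentioned above is, at its simplest, a counting exercise over event logs. A toy sketch of the correlation idea; the event record format here is an assumption, not any specific monitoring product:

```python
from collections import Counter

def recurring_issues(events, threshold=3):
    """Return error signatures seen at least `threshold` times across the fleet.

    Each event is assumed to be a dict with 'component' and 'error' keys.
    A single occurrence is noise in an elastic cloud; repetition is a signal.
    """
    counts = Counter((e["component"], e["error"]) for e in events)
    return {sig: n for sig, n in counts.items() if n >= threshold}
```

Real correlation tools add time windows, topology awareness (same rack? same pod?), and alerting, but the underlying question is the same: is this failure isolated or systemic?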
What is your cloud doing? Not only do you need to know, but your other tools and groups may need to know as well. Integration with existing systems is critical to broad adoption. Any complete solution will have an API and command line interface (CLI) to allow you to integrate and automate. A CLI and API for just OpenStack administrative needs is not enough. What about your physical server plant or management of your blocks or pods? How about being able to retrieve system metrics and logging data on demand from not only OpenStack, but Linux and other non-OpenStack applications? You need a single, unified interface for cloud operations and management. Obviously, if you have this API, a GUI should also be provided for those unique cloud operator tasks that require visualizations, such as looking for patterns in system and network metrics.
Cloud turns the security model on its head. A complete discussion of this topic is far beyond the scope of this blog, but I do know one thing: enterprises want a private cloud with an understandable security model, particularly for the control plane. As I covered in the previous installment of this series, your cloud control plane’s API uptime and throughput is critical to allowing next generation applications to route around trouble. Similarly, the security of your cloud’s control plane should not be taken for granted.
It can be easy to get caught up in the move towards a decentralized model, but decentralized and scale-out are not the same thing. You can actually mix centralization and scale-out techniques and this is the default model that cloud pioneer Google uses. Keeping your cloud control plane in one place allows you to:
This last item is perhaps most important. You don’t want your OpenStack database to reside on the same storage system as your virtual machines. What if someone breaks into a VM through the hypervisor? Or, conversely, what happens if someone breaks into the control plane via an API?
Best practices in the enterprise have long included zoning (usually with VLANs) different components into different security areas with differing policies applied. Zoning slows an attacker down, giving you time to detect them and respond. Being able to take a similar approach to your private cloud security model is vital to making certain your cloud is secure.
As I said, your journey begins with the installation of the cloud. After that, you need a set of tools and a security model that allows you to confidently manage your cloud day by day. An Enterprise-grade, OpenStack-powered cloud operating system should deliver as many of these capabilities as possible.
OpenStack is a strong foundation for building a next generation private cloud designed for next generation cloud applications. Unfortunately, it isn’t a complete cloud operating system and you will need a partner to provide you with that solution. This series is covering The 6 Requirements of Enterprise-grade OpenStack and in today’s blog posting I covered the need for a high uptime, scale-out control plane and robust, secure management tools.
In the next installment I will cover building around an open architecture and reducing vendor lock-in. That will be followed by the closing posting covering the need for filling in the gaps around scalability and performance and choosing a partner who can provide global services & support.
UPDATE: Added references for the growth of net new apps.
 In fact, even the OpenStack Foundation is about to refresh its messaging to help clarify this.
 The original source for this is EMC, both at this blog posting and via a private presentation by Joe Tucci, CEO of EMC, at a Research Board event.
 Yes, centralized. Something can be centralized and still scale-out. Centralization is necessary for proper security policies and zones to be enforced.
 Google’s cluster architecture has quite a bit of detail here.
I can't believe that it's less than a month before the upcoming Juno Summit. As you start putting together your plans for the Summit, I wanted to highlight some items to look forward to from the Cloudscaling team.
#1: Four Cloudscaling sessions have been selected for the Summit.
We appreciate the explicit endorsement from the community – the topics reflect on our experience and leadership in the space – and we are very excited to share! Here is a recap of the four Cloudscaling sessions – you can simply add them to your Summit schedule by clicking on the links:
Hybrid Cloud Landmines: Architecting Apps to Avoid Problems (Randy Bias & Drew Smith): Application portability? Cross-cloud replication and bursting? These are hard issues with serious implications that we’ll examine in detail, digging into the subtleties between simply implementing a cross-cloud API and actually using it to get real work done. We’ll provide a set of recommendations on how to architect your application from the ground up to be hybrid cloud friendly.
Open Cloud System: OpenStack Architected Like AWS (Randy Bias): At Cloudscaling, we spend a great deal of effort operationalizing OpenStack for customers and ensuring it’s compatible with leading public clouds. In this session, we’ll detail the configuration and design decisions we make in real world OpenStack deployments.
OpenStack Scale-out High Availability: Scaling to 1,000+ servers without Neutron (Randy Bias, Abhishek Chandra & JC Smith): The default OpenStack networking models don’t make sense in a modern datacenter. Similar to the approach taken by HP, CERN and other deployments, we’ll detail how we replace the default networking model through standard plugins in order to deploy a fully scale-out networking model that can support 100s of racks with incredibly high performance.
OpenStack Block Storage: Performance vs. Reliability (Randy Bias & Joseph Glanville): In this presentation, we’ll talk about selecting and designing the right kind of storage for your use case. We’ll also walk you through how we built our block storage solution in Open Cloud System, its design principles, and explain why it is a better version of AWS Elastic Block Storage (EBS) in several key ways. (I regret we have to pull the plug on this presentation due to time constraints! Hopefully in fall. Best, –Randy)
#2: For the second Summit in a row, we are sharing our booth space with Quanta.
The reason is simple – a significant majority of Cloudscaling customers use Quanta servers and storage equipment for their OpenStack powered clouds (and did I mention what a great team they are to work with?). While OpenStack's role is to ultimately abstract the underlying physical infrastructure, we have found Quanta hardware to be a perfect complement to elastic OpenStack-powered clouds. The Quanta team will be bringing a few racks of their datacenter products that are most optimized for building modern OpenStack clouds. Stop by our booth!
#3:Open Cloud System (OCS) product announcement.
OK, I know it's a teaser. But it should be no surprise that Icehouse will be central to the OCS announcement, and we have a few additional items up our sleeves to share, including:
New OCS management capabilities
Additional AWS compatible features
So, it’s an understatement to say that we will be busy between now and the Summit. But any opportunity to meaningfully interact with the community, our customers and our partners is worth its weight in gold. We look forward to seeing you in Atlanta!
Today at Interop in Las Vegas, Cloudscaling introduced our new Cloud Concierge Services. The idea for Cloud Concierge came about as we watched enterprises struggle to understand where to start implementing an OpenStack cloud solution. Between public clouds, private clouds and legacy technology, there are many interrelated and complex decisions companies have to make.
Cloud Concierge aims to reduce complexities and accelerate adoption of enterprise OpenStack cloud implementations. We achieve this by:
Building all-inclusive offerings catered to fixed budgets and timeframes,
Grouping the Cloud Concierge offerings in logical phases of OpenStack adoption,
Addressing the needs of key stakeholders in the enterprise, such as the infrastructure and application teams, who are transforming IT.
More importantly, our Cloud Concierge Solution Architects will train IT professionals on new skills and guide them to insights needed to operate in a modern cloud era. In addition to getting the IT team started in the short-term, they’ll gain new learnings that will allow them to better serve their organizations for the long-term.
Adopting a cloud strategy is a must in today’s business world. An OpenStack decision will give companies greater agility, transform IT service delivery and create differentiated business value. Cloud Concierge will help you go from “Zero to Cloud” on time and on budget.
Get in touch, and let us know how we can help you get started. Learn more about our Cloud Concierge Services at www.cloudscaling.com/services/cloud-concierge-services/.
This week, Google hosted its Cloud Platform Live event. Some people were a little surprised at my enthusiastic live twitter coverage for a number of Google’s announcements. I have been waiting for them to “go big or go home” for a while now. My biggest surprise was how long it took Google to get up to speed. My fundamental belief is that they waited so long because they realized what they needed to do to succeed had little to do with initial speed, but was more about long term momentum.
Let me explain.
There are Three Ways to be #1
I wish I could claim this is my original thought, but I heard it from someone else. If you know a more canonical source, I would very much like to have it. Please send it along via twitter or comments.
The saying goes like this:
There are three ways to be #1 in a market:
Ideally, you would be two or more of these things, leading to market dominance.
If you were Google, entering this market years after AWS launched and gained the market traction it has, you might reflect on what your best strategy is. Maybe you know the above truism or maybe you just have good instincts. If you are playing to win and you can't be first, that means you need to be best or cheapest, and preferably both.
And that, in essence, is why I think Google took a while to get “in market” and really tried to nail the technology before getting very serious, as we saw this week.
Public Cloud is A Development Engine Game
I have always thought that public cloud was less about technical features (e.g. VMs on demand, object storage, etc.) and more about building a world class development engine. As I noted on previous occasions, businesses like Amazon actually increase in feature velocity as they get bigger, not decrease. This is a relatively new phenomenon seen only in web-scale businesses (and Apple).
Building velocity like this is less about technology and more about culture and organizational structure. I'm sure you have read the seminal The Mythical Man-Month, which essentially says that as a development team gets larger, your communication overhead increases to the point where you actually move slower. The answer to this is typically moving to an agile model, but that's only a partial fix. It is far more effective to make your development organization look like a set of loosely coupled independent startups with clear targets for success and clear accountability. In that way teams run their own business.
This also maps to the actual underlying technology structure of the business and is the general “a-ha” moment that large scale web businesses collectively had. Here's a 2007 ZDNet article talking about Amazon.com's (the retail site) service-oriented architecture.
Google Cloud Platform’s Announcements
Google thinks this is a game for the hearts and minds of developers of new apps
Google is doubling down on its own development engineering might first
This is reflected in the Wired article that came out yesterday entitled Google’s Bold Plan to Overthrow Amazon as King of Cloud.
The key quote is here:
“We will spend the majority of our development efforts on this New World,” wrote [Urs] Hölzle. “Every developer will want to live in this world…and it’s our job to build it.”
More succinctly what Urs is saying is that Google’s strategy is to build a developer advantage inside Google in order to enable developers outside of Google to have cloud services (platform and infrastructure) that they will love.
Hints of this can be found in the more nuanced technical announcements yesterday such as the support of integrated build/test pipelines through a combination of Google Compute Engine (GCE) + Google App Engine (GAE) using the GAE Managed VMs offering.
Being Best AND Cheapest
Google’s approach to their Cloud Platform has been the slow and steady buildup of a hard-hitting, quick-firing development engine that is capable of increasing the velocity of feature releases over time. That development engine has now reached escape velocity. There are only two other public clouds like this: Amazon Web Services and Microsoft Azure.
Google’s development engine shows they are targeting being Best AND Cheapest. There are a number of good reasons why they can accomplish those goals and perhaps I will deep dive into them in a future blog posting.
Consider this … If it’s a three way horse race in public cloud with OpenStack for the private cloud then we need to accept that we are now living in a multi-cloud world. For me, this is a sure indicator of a rapidly maturing marketplace that delivers maximum choice to developers and enterprises.
To paraphrase: Cloud just got real.
 Marc Andreessen I think, but I can’t find the blog posting where I thought he laid it out. BTW, if you haven’t read Marc’s original blog postings on entrepreneurism and startups, you are really missing out. The archive is here.
 HP might develop this capability over time, but they need to seriously commit to the path, so there are significant question marks about their possibilities for success until their business is more stable.
Last night I presented at the Chicago DevOps Meetup along with @mattokeefe and @mfdii. The presentation went very well and was warmly received. It’s a bit of a revisit of some topics I have covered before, such as Pets vs. Cattle, the new kind of elastic cloud, and the architectural patterns involved with building these new kinds of clouds. I did a major refresh though and am weaving the threads a bit differently than before.
I hope you enjoy it.