The Open Enterprise Cloud – OpenStack’s Holy Grail?

The way people think about enterprise IT is changing fast, calling into question many common assumptions about how hardware and software should be designed and deployed. The upending of these long-held tenets of enterprise IT is happening simply because of the innovation brought on by OpenStack and a handful of other successful open source projects that have gained traction in recent years.

What is still unclear is how to deliver all this innovation in a form that can be consumed by customers’ IT departments without the need to hire an army of experienced DevOps engineers, a commodity that is as notoriously hard to find as unicorns and that has a non-trivial impact on the TCO.

The complexity of an OpenStack deployment is not just perception or FUD spread by the unhappy competition. It’s a real problem that is sometimes ignored by those deeply involved in OpenStack and its core community. The industry is clearly waiting for the solution that can “package” OpenStack in a way that hides the inherent complexity of this problem domain and “just works”. They want something that provides user-friendly interfaces and management tools instead of requiring countless hours of troubleshooting.

This blog post is the result of our attempt to find and successfully productize this ‘Holy Grail’, featuring a mixture of open source projects that we actively develop and contribute to (OpenStack, Open vSwitch, Juju, MAAS, Open Compute) alongside Microsoft technologies such as Hyper-V that we integrate into OpenStack and that are widely used in the enterprise world.

We are excited to be able to demonstrate this convergence of all the above technologies at our Cloudbase Solutions booth at the Vancouver Summit, where we shall be hosting an Open Compute OCS chassis demo.

Objectives

Here are the prerequisites we identified for this product:

Full automation, from the bare metal to the applications
Open source technologies and standards
Windows/Hyper-V support, a requirement in the enterprise
Software Defined Networking (SDN) and Network Function Virtualization (NFV)
Scalable and inexpensive storage
Modern cloud optimized hardware compatible with existing infrastructures
Easy monitoring and troubleshooting tools
User friendly experience

Hardware

Let’s start from the bottom of the stack. The way in which server hardware is designed and produced has not really changed much in the last decade. But when the Open Compute Project kicked off, it introduced a set of radical innovations originating from large corporations, such as Facebook, that run massive clouds.

Private and public clouds have requirements that differ significantly from what traditional server OEMs keep on offering over and over again. In particular, cloud infrastructures don’t require many of the features that you can find on commodity servers. Cloud servers don’t need complex BMCs beyond basic power actions and diagnostics (who needs a graphical console on a server anymore?) or too many redundant components (the server blade itself is the new unit of failure) or even fancy bezels.

Microsoft’s Open CloudServer (OCS) design, contributed to the Open Compute Project, is a great example. It offers a half rack unit blade design with a separate chassis manager in a 19” chassis with redundant PSUs, perfectly compatible with any traditional server room, unlike, for example, earlier 21” Open Compute Project server designs. The total cost of ownership (TCO) for this hardware is significantly lower than that of traditional alternatives, which makes this a very big incentive even for companies less prone to changes in how they handle their IT infrastructure.

Being open source, OCS designs can be produced by anyone, but this is an effort that only the larger hardware manufacturers can effectively handle. Quanta in particular is investing actively in this space, with a product range that includes the OCS chassis on display at our Vancouver Summit booth.

Storage

“The Storage Area Network (SAN) is dead.” This is something that we keep hearing and if it’s experiencing a long twilight it’s because vendors are still enjoying the profit margins it offers. SANs used to provide specialized hardware and software that has now moved to commodity hardware and operating systems. This move offers scalable and fault tolerant options such as Ceph or the SMB3 based Windows Scale-Out File Server, both employed in our solution.

The OCS chassis offers a convenient way of hosting SAS, SATA or SSD storage in the form of “Just a Bunch of Disks” (JBOD) units that can be deployed alongside regular compute blades having the same form factor. Depending on the requirements, typically inexpensive mechanical disks can be mixed with fast SSD units.

Bare metal deployment

There are still organizations and individuals out there who consider that the only way to install an operating system is to connect a monitor, keyboard and mouse to a server, insert a DVD, configure it interactively and wait until it’s installed. In a cloud, whether private or public, there are dozens, hundreds or thousands of servers to deploy at once, so manual deployments do not work. Besides this, we need all those servers to be consistently configured, without the unavoidable human errors that manual deployments always incur at scale.

That’s where the need for automated bare metal deployment comes in.

We chose two distinct projects for bare metal: MAAS and Ironic. We use MAAS (to which we contributed Windows support and imaging tools) to bootstrap the chassis and then deploy OpenStack using Juju, including storage and KVM or Hyper-V compute nodes. The user can freely decide at any time to redistribute the nodes among the individual roles, depending on how many compute or storage resources are needed.

We recently contributed support for the OCS chassis manager in Ironic, so users also have the choice to use Ironic in standalone mode or as part of an OpenStack deployment to deploy physical nodes.

The initial fully automated chassis deployment can be performed from any laptop, server or “jump box” connected to the chassis’ network without the need to install anything. Even a USB stick with a copy of our v-magine tool is enough.

OpenStack

There are quite a few contenders in the IaaS cloud software arena, but none managed to generate as much interest as OpenStack, with almost all relevant names in the industry investing in its foundation and development.

There’s not much to say here that hasn’t been said elsewhere. OpenStack is becoming the de facto standard in private clouds, with companies like Canonical, Rackspace and HP basing their public cloud offerings on OpenStack as well.

OpenStack’s compute project, Nova, supports a wide range of hypervisors that can be employed in parallel on a single cloud deployment. Given the enterprise-oriented nature of this project, we opted for two hypervisors: KVM, which is the current standard in OpenStack, and Hyper-V, the Microsoft hypervisor (available free of charge). This is not a surprise as we have contributed and are actively developing all the relevant Windows and Hyper-V support in OpenStack in direct coordination with Microsoft Corporation.

The most common use case for this dual hypervisor deployment consists of hosting Linux instances on KVM and Windows ones on Hyper-V. KVM support for Windows is notoriously shaky, while Windows Hyper-V components are already integrated in the OS and the platform is fully supported by Microsoft, making it a perfect choice for Windows. On the Linux side, while any modern Linux works perfectly fine on Hyper-V thanks to the Linux Integration Services (LIS) included in the upstream Linux kernel, KVM is still preferred by most users.
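The placement policy described above can be illustrated with a minimal Python sketch. It is not Nova's scheduler code: the `Host` class, the `"kvm"`/`"hyperv"` labels, the blade names and the policy table are all hypothetical, chosen only to show the idea of matching a guest OS to its preferred hypervisor.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    hypervisor: str  # hypothetical labels: "kvm" or "hyperv"


# Hypothetical policy table mirroring the text: Linux on KVM, Windows on Hyper-V.
PREFERRED_HYPERVISOR = {"linux": "kvm", "windows": "hyperv"}


def candidate_hosts(guest_os: str, hosts: list[Host]) -> list[Host]:
    """Return the hosts whose hypervisor matches the guest OS preference."""
    wanted = PREFERRED_HYPERVISOR[guest_os]
    return [h for h in hosts if h.hypervisor == wanted]


# Made-up inventory of three compute blades.
hosts = [Host("blade-01", "kvm"), Host("blade-02", "hyperv"), Host("blade-03", "kvm")]

print([h.name for h in candidate_hosts("windows", hosts)])  # ['blade-02']
print([h.name for h in candidate_hosts("linux", hosts)])    # ['blade-01', 'blade-03']
```

In a real deployment this decision is made by the OpenStack scheduler rather than by user code; the sketch only makes the mapping explicit.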

Software defined networking

Networking has enjoyed a large amount of innovation in recent years, especially in the areas of configuration and multi-tenancy. Open vSwitch (OVS) is by far the leader in this domain, commonly identified as software defined networking (SDN). We recently ported OVS to Hyper-V, allowing the integration of Hyper-V in multi-hypervisor clouds and the use of VXLAN as a common overlay standard.

Neutron also includes support for Windows specific SDN for both VLAN and NVGRE overlays in the ML2 plugin, which allows seamless integration with other solutions, including OVS.

Physical switches and open networking

Modern managed network switches provide computing resources that were simply unthinkable just a few years ago and today they’re able to natively run operating systems traditionally limited to server hardware.

Cumulus Linux, a network operating system for bare metal switches developed by Cumulus Networks, is a Linux distribution with hardware acceleration of switching and routing functions. The NOS seamlessly integrates with the host-based Open vSwitch and Hyper-V networking features outlined above.

Neutron takes care of orchestrating hosts and networking switches, allowing a high degree of flexibility, security and performance which become particularly critical when the size of the deployment increases.

Deploying OpenStack with Juju

One of the reasons for OpenStack’s success lies in its flexibility: the ability to support a very large number of hypervisors, backend technologies, SDN solutions and so on. Most medium and large enterprise IT departments have already adopted some of those technologies and want OpenStack to employ them, with the result that there’s not a single “recommended” way to deploy your stack.

Automation, probably the leading glue in all modern datacenter technologies, doesn’t play that well with flexibility: the higher the flexibility, the higher the amount of automation code that needs to be written and tested, often requiring very complex deployments that soon become unfeasible for any continuous integration framework.
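To make the trade-off concrete, here is a small Python sketch that counts the deployment scenarios a test matrix would have to cover. The option lists are hypothetical and deliberately short; they are assumptions for illustration, not an inventory of what any product supports.

```python
from itertools import product
from math import prod

# Hypothetical option lists, chosen only to illustrate the growth.
options = {
    "hypervisor": ["kvm", "hyperv"],
    "storage": ["ceph", "smb3"],
    "network": ["ovs-vxlan", "vlan", "nvgre"],
    "guest_os": ["linux", "windows"],
}

# Every combination of one choice per category is a scenario to automate and test.
scenarios = list(product(*options.values()))
expected = prod(len(choices) for choices in options.values())

print(len(scenarios), expected)  # 24 24, i.e. 2 * 2 * 3 * 2
```

Adding a single new category with three choices would triple the count to 72, which is why each extra degree of flexibility multiplies the automation and testing effort rather than merely adding to it.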

Puppet, Chef, SaltStack and similar configuration management tools are very useful when it comes to automating a specific scenario, but are not particularly suitable for generic use cases, unless you add tools like RDO’s PackStack on top to orchestrate them. Finally, while command line tools are the bread and butter of every DevOps engineer, they don’t do much to bring a user-friendly experience that a more general user base can successfully employ without having to resort to cloud specialists.

When looking for a suitable deployment and configuration solution, we recognized that Juju fulfilled most of our requirements, with the exception of Windows and CentOS support, which we contributed shortly afterwards. What we liked in particular is the strict decoupling between independent configurations (called Charms), and a killer GUI that makes this brave new automation world more accessible to less experienced users.

This model has the potential for a large impact on the usage spectrum, productivity and overall TCO. Furthermore, Juju also offers a wide and fast growing catalog of applications.

Applications and orchestration

People want applications, not virtual machines or containers. IaaS is nice to have, but what you do on top of it is what matters for most users. Juju comes to the rescue in this case as well, with a rich charms catalog. Additionally, we developed Windows specific charms to support all the main Microsoft related workloads: Active Directory, IIS, VDI, Windows Server Failover Clustering, SQL Server (including AlwaysOn), Exchange and SharePoint.

Besides Juju, we support Heat (providing many Heat templates for Windows, for example) and PaaS solutions like Cloud Foundry that can be easily deployed via Juju on top of OpenStack.

Cattle and Pets

Using the famous cattle vs pets analogy (a simplistic metaphor for what belongs to a cloud and what doesn’t), OpenStack is all about cattle. At the same time, a lot of enterprise workloads are definitely pets, so how can we create a product that serves both cases?

An easy way to distinguish pets and cattle is that pets are not disposable and require fault tolerant features at the host level, while cattle instances are individually disposable. Nova, OpenStack’s compute project, does not support pets, which means that failover cluster features are not available natively.

We solved this issue by adding one extra component that integrates Nova with the Microsoft Windows Failover Clustering when using Hyper-V. Other components, including storage and networking, are already fully redundant and fault tolerant, so this additional feature allows us to provide proper transparent support for pets without changes in the way the user manages instances in OpenStack. Cattle keep grazing unaffected.
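The different treatment of pets and cattle on a host failure can be sketched in a few lines of Python. This is only an illustration of the policy: in the solution described here the failover itself is handled by Windows Failover Clustering, and the `Instance` class, the `is_pet` flag, the round-robin placement and all names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    host: str
    is_pet: bool  # hypothetical flag; the article does not say how pets are marked


def handle_host_failure(failed_host, instances, surviving_hosts):
    """Move pets off a failed host; drop cattle, which are individually disposable."""
    if not surviving_hosts:
        raise ValueError("no surviving hosts to fail over to")
    kept = []
    moved = 0
    for inst in instances:
        if inst.host != failed_host:
            kept.append(inst)  # unaffected by the failure
        elif inst.is_pet:
            # Assumed placement: simple round-robin over the surviving hosts.
            inst.host = surviving_hosts[moved % len(surviving_hosts)]
            moved += 1
            kept.append(inst)
        # Cattle on the failed host are not kept; they would be recreated elsewhere.
    return kept


instances = [
    Instance("sql-01", "blade-01", is_pet=True),
    Instance("web-01", "blade-01", is_pet=False),
    Instance("web-02", "blade-02", is_pet=False),
]
after = handle_host_failure("blade-01", instances, ["blade-02", "blade-03"])
print([(i.name, i.host) for i in after])  # [('sql-01', 'blade-02'), ('web-02', 'blade-02')]
```

The pet `sql-01` survives the loss of `blade-01` by moving to another blade, while the cattle instance `web-01` is simply dropped.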

Conclusions

Finding a reliable way to deploy OpenStack and manage it in all its inherent complexity with a straightforward and simple user experience is the ‘Holy Grail’ of today’s private cloud business. At Cloudbase Solutions, we believe we have succeeded in this ever elusive quest for simplicity by consolidating the leading open source technologies across the whole stack, from setting up the bare metal right up to the applications at the top. This includes support for enterprise Windows and open source workloads deployed with Canonical’s Juju, all working in perfect harmony.

The advantage for the user is straightforward: an easy, reliable and affordable way to deploy a private cloud, avoiding vendor lock-in, supporting Microsoft and Linux Enterprise workloads and bypassing the need for an expensive DevOps team on payroll.

Want to see a live demo? Come to our booth at the OpenStack Summit in Vancouver on May 18th-22nd!