If you have business-critical services online, high availability is a must. But what do we mean by availability? In very straightforward terms (and that is what this article will be dealing in), high availability in IT refers to a system or component that stays continuously operational over an extended period of time.
Within a hosted environment, this works by having a backup available for each piece of physical kit in the technology stack or, for virtual assets, by having software in place to provide a failover mechanism if something goes wrong.
So, if a server or switch or disk fails within your hosting infrastructure stack, a backup version is available to automatically replace it and keep online services and applications up and running without significant interruption.
Having high availability is important for a number of reasons. As a business with a critical website or application online, it reflects badly on your brand if customers can’t access your services. More importantly, if customers cannot access your services, this can affect your revenue intake as purchases, transactions and processes cannot be made through your website or application.
When services are down, the business’s ability to generate revenue is directly affected, effectively closing your business until your website or application can be brought back online. Customers are unhappy, calls into your help desk or customer services increase, and overall confidence in your brand is knocked.
Now that you have an understanding of what high availability is, you will probably agree that it is a must-have for organisations with business-critical online services.
Learn the Lingo: Our Super-quick, Essential Glossary
You don’t need to be technical to read on. However, it is worth getting a basic understanding of what people are talking about when it comes to high availability and online performance. That way, you can ensure you’ve got the solution you need for your business.
So let’s take time out to demystify some of the jargon before we go on.
> N + 1 – the “N” refers to the components needed to run a service, and the “+1” means there is at least one independent backup component. For example, one server working away in the hosting stack, with one backup unit ready to take over in the event that the first fails;
> Availability Uptime – this figure, usually expressed as a percentage like 99.9% or 99.99%, is the availability level that your provider is stating they will achieve across a given year. Your contract’s SLA (service level agreement) will define this figure and the recourse available if it is not met (usually service credits of some kind).
Wikipedia provides a good explanation of what uptime percentages mean in terms of contractually permitted downtime: https://en.wikipedia.org/wiki/High_availability
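If you would like to work the numbers out yourself, the sum is a simple one. The short Python sketch below is only an illustration: the function name is our own, and it assumes a 365-day year and that downtime is measured across the whole year (your SLA may measure it differently, for example per month).

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Return the yearly downtime, in hours, that an uptime percentage permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # Print the permitted downtime for a few common SLA levels.
    for level in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(level)
        print(f"{level}% uptime permits about {hours:.2f} hours of downtime a year")
```

On these assumptions, 99.9% works out at roughly 8.76 hours of permitted downtime a year, and 99.99% at a little under an hour.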
The idea of high availability is that it prevents any disruption to your website or applications should any unavoidable (and inevitable) technical issues occur.
There are lots of different ways cloud hosting can be set up to achieve this, and a lot depends on the requirements of the applications running on that particular infrastructure or what your provider can offer.
You could have backup equipment available at every level of the hosting stack, either already running alongside as a pair or available to quickly boot up if the primary equipment fails.
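To see why a backup at every level matters, it can help to put rough numbers on it. The Python sketch below is a simplified illustration, not a model of any particular platform: the function names and the 99% figures are made up for the example, and it assumes that components fail independently of one another, which real equipment sharing the same power or network may not.

```python
from typing import Sequence


def parallel_availability(single: float, copies: int) -> float:
    """Availability of redundant units where one working unit is enough.

    Assumes each unit fails independently of the others.
    """
    return 1 - (1 - single) ** copies


def series_availability(layers: Sequence[float]) -> float:
    """Availability of a stack in which every layer must be working."""
    result = 1.0
    for layer in layers:
        result *= layer
    return result


if __name__ == "__main__":
    # Hypothetical three-layer stack (say server, switch, disk), each 99% available.
    single_units = series_availability([0.99, 0.99, 0.99])
    paired_units = series_availability([parallel_availability(0.99, 2)] * 3)
    print(f"No backups:          {single_units:.4%}")
    print(f"A backup per layer:  {paired_units:.4%}")
```

Under the independence assumption, pairing a 99% available unit with one backup lifts that layer to 99.99%. Without backups, each extra layer in the stack drags the overall figure down.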
Some cloud infrastructure, like that offered by hyper-scale companies AWS and Azure, requires failover to be architected into the application, so that additional resources can be fired up to act as backups when they are needed.
Some setups run two entirely separate platforms in different data centre locations. In an active-passive arrangement, one acts as the ‘live’ platform and the second (with data replicating from the live platform in real time so they are always the same) acts as an instant backup if required. In an active-active arrangement, both platforms run the application and serve traffic simultaneously.
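The live-and-backup arrangement described above comes down to a simple rule: use the live platform while it is healthy, and switch to the backup when it is not. Here is a minimal Python sketch of that rule. The function, the site names and the stubbed health check are all hypothetical. A real setup would also need to handle data replication, DNS or load balancer changes, and failing back again.

```python
from typing import Callable, Optional, Sequence


def choose_platform(
    platforms: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy platform in priority order, or None if all are down."""
    for name in platforms:
        if is_healthy(name):
            return name
    return None


if __name__ == "__main__":
    # Hypothetical site names and a stubbed health check, for illustration only.
    status = {"primary-dc": False, "standby-dc": True}
    active = choose_platform(["primary-dc", "standby-dc"], lambda name: status[name])
    print(f"Routing traffic to: {active}")
```

Listing the platforms in priority order keeps the rule easy to reason about: the backup is only chosen when everything ahead of it in the list has failed its health check.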
Some cloud platforms have additional high availability at a virtual machine (or VM) level, using virtualisation software to handle failover if it detects an issue within a virtual machine cluster.
The virtual machines are like software-defined servers and, just like a physical server, if they develop a fault it could affect the applications running on them. If the virtualisation software on the cloud platform detects a potential issue, it deploys a new set of virtual machines to fail over the applications and systems and keep them running with only a minor interruption, measured in seconds.
For example, Secura’s VPC uses VMware vSphere High Availability to provide failover for all VMs on the platform. If it detects any problems, a 60-second failover kicks in to keep applications running, giving customers additional N+1 redundancy at the hypervisor level.
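How does software ‘detect a problem’ in the first place? One common approach is a heartbeat: each host regularly reports that it is alive, and a host that goes quiet for too long is treated as failed. The Python sketch below illustrates that general idea only. It is not how vSphere High Availability is implemented, and the class name, host names and 15-second timeout are all assumptions made up for the example.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Tracks the last heartbeat from each host and flags hosts that have gone quiet."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self._last_seen: Dict[str, float] = {}

    def record(self, host: str, now: Optional[float] = None) -> None:
        """Note that a heartbeat arrived from `host` (at time `now`, if given)."""
        self._last_seen[host] = time.monotonic() if now is None else now

    def failed_hosts(self, now: Optional[float] = None) -> List[str]:
        """Return hosts whose last heartbeat is older than the timeout."""
        current = time.monotonic() if now is None else now
        return sorted(
            host
            for host, seen in self._last_seen.items()
            if current - seen > self.timeout_seconds
        )


if __name__ == "__main__":
    # Hypothetical hosts with hand-picked timestamps, so the example is repeatable.
    monitor = HeartbeatMonitor(timeout_seconds=15)
    monitor.record("host-a", now=0)
    monitor.record("host-b", now=10)
    print("Hosts needing failover:", monitor.failed_hosts(now=20))
```

In this example, host-a last reported 20 seconds ago, which is past the 15-second timeout, so it is the one flagged for failover. The timeout is a trade-off: too short and a brief network blip triggers an unnecessary failover, too long and a genuine failure goes unnoticed.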
It’s worth knowing from the start that not all cloud platforms provide high availability as standard, and it’s not always clear when this is the case.
This is particularly true for the metered, self-service hyper-scaler resources. In most instances, as the customer you’ll have to pay for additional resource to provide high availability and plan for how you will initiate that failover process.
The same is true for dedicated platforms. Here, you’ll need duplicate physical hardware to avoid service disruption. This can be very costly, and you’ll have to factor in these additional hardware purchases when scoping out your platform architecture. As touched on in our previous article, “The Dedicated Dilemma”, you also have the added hassle of replacing physical units (or making provision with your provider to do so) if things go wrong, costing time and money.
Some service providers, like Secura, do offer high availability as standard. We do so because we take the availability of applications running on our cloud very seriously.
This extends from our cloud infrastructure, down through our data centres and network, which are all designed to offer high availability, with no single point of failure.
It’s well worth finding out what your provider or prospective provider offers in terms of high availability and what you need as a baseline for your application and business.
We don’t need to tell you that downtime can be extremely costly for any organisation that relies on its website or application for revenue or reputation. Poor website availability not only impacts online income but can also damage brand perception, irritating customers and putting extra strain on your team as they have to pick up complaints.
For these reasons, it’s vital to ensure you have a robust, reliable system in place for keeping your website up and running. Selecting a cloud platform that’s highly available and includes a rapid failover process can save you time and money, and minimise any potential damage and disruption to your business and brand.
Find out whether your platform is highly available, and what the processes and timelines are for dealing with a failure. If it’s not, consider the options available with your current service provider or others to ensure that high availability is built into your hosting solution. That way you’ll be able to relax and concentrate on the important parts of running your business.
We hope this brief introduction to high availability gave you a basic outline of the principles and processes involved. To take availability to the next level, you can look to introduce software-level high availability that spans multiple cloud environments and data centres. This will be the subject of a future article – watch this space!
As Secura's CTO, Dan is responsible for the team that design, build and maintain our cutting edge cloud hosting infrastructure. He is also the dishwasher police - stack it or else.
Tweet me at:
@securacloud