When the Cloud Crashes
What?! The cloud could go down?! Never, impossible!
Regardless of how big, how small, or how well-known, every cloud solution has downtime. There is a difference, however, between planned downtime and unplanned downtime (otherwise known as an outage).
Planned downtime is part of every cloud provider’s standard terms and is the time when the cloud provider performs maintenance on its cloud. Remember, the cloud is a bunch of computer hardware and software in datacenters. That hardware and software require constant care and feeding.
Much of this maintenance can be done behind the scenes without affecting the actual uptime and accessibility of the cloud. But there are times when system maintenance, upgrades, etc. will require a brief outage to complete. Providers typically announce these planned outages to all of their cloud customers a week or two beforehand, stating the period during which the cloud will be unavailable. Usually the planned downtime is scheduled for a few hours in the middle of the night. These planned outages can be inconvenient, but they are something you actually do want your cloud provider to do.
The more troublesome outage is the unplanned one that hits suddenly, in the middle of the day, with no warning. No company is immune to unplanned outages. Google, Yahoo!, AT&T, and Microsoft have all had unplanned outages and will always have them.
Despite the millions of hours and dollars spent to avoid cloud outages, the environment is just too complex to remove them completely. There are countless layers that are always changing, and some components are completely outside the provider's control.
For example, a well-known cloud provider had an outage on the East Coast because an engineer with a large telecom provider made a change to a router on the West Coast. The engineer never knew or expected his change could create a ripple effect that brought the cloud provider down.
It Might Just Be You
When a business has multiple locations using the cloud, a single office may be “down,” but the outage may not be related to the actual cloud provider. The office that is down may have lost power, its Internet service may have been disrupted, or equipment in the office, like a firewall, may have failed. Any one of these things would cause that location to be down while the cloud service is still available. In this scenario, users could be sent home to use their home Internet service to access the cloud.
When the actual cloud service is down, it can be down in a region of the country or world, or down completely. When this happens you can be sure the cloud vendor is scrambling first to get the services back online and then doing everything they can to find the root cause of the outage so they can prevent it from happening again. Sometimes, though, because of the complexity, the root cause is never found.
What to Do When Your Cloud Services Crash
If you are using cloud services and suspect you are “down,” there are a few steps you should take to verify the scope of the problem.
- Are you able to access the Internet and other websites from your location? If the answer is yes, then you know it is not just a problem with your Internet service and your cloud service may actually be down. If the answer is no, the cloud service may be running fine and it is only your location that is down.
- Determine if the cloud service is actually down or just slow. Many times a cloud service may actually be up, but running very slowly, which is important to know for the next step.
- Contact the cloud provider by phone or email (they should have a help desk process for submitting tickets and support requests) and ask if they are having any issues with their service. Give them specifics about your issue, like whether your cloud service is completely down or just slow.
- Ask the provider if they have an estimated time when the service will be back to normal. Many times they will not know the answer and calling over and over will only divert them from working the issue. You can be certain they will be doing everything possible to get things back to normal as soon as possible.
- When the service is back online, ask the provider the reason for the outage. The provider may or may not have a specific reason, but it is always good to ask.
- Check the Service Level Agreement (SLA) you signed with the cloud provider. Many SLAs guarantee a certain level of uptime, for example 99.999%. If the service is down for longer than the SLA allows, the provider may owe you a refund for the time it was down. Check the fine print.
- If outages persist over time, start looking for another provider.
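The first two checks in the list above can be sketched in a few lines of Python. This is only an illustration, not a monitoring tool: the function names, the labels it returns, and the 5-second "slow" threshold are all assumptions made for this example, and the URLs you would pass in (one for a well-known website, one for your cloud service) are up to you.

```python
import time
import urllib.request
from typing import Optional

# Assumption for this sketch: anything slower than 5 seconds counts as "slow".
SLOW_THRESHOLD_SECONDS = 5.0


def probe(url: str, timeout: float = 10.0) -> Optional[float]:
    """Return how long the URL took to respond, or None if the request failed."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            pass
    except OSError:
        # Covers connection failures, timeouts, and HTTP error responses.
        return None
    return time.monotonic() - start


def classify(reference_time: Optional[float],
             service_time: Optional[float],
             slow_threshold: float = SLOW_THRESHOLD_SECONDS) -> str:
    """Turn two probe results into one of: 'local', 'down', 'slow', 'up'."""
    if reference_time is None:
        # Other websites are unreachable too, so suspect your own location.
        return "local"
    if service_time is None:
        # The Internet works from here but the cloud service does not answer.
        return "down"
    if service_time > slow_threshold:
        # The service answers, but slowly: worth telling the help desk.
        return "slow"
    return "up"
```

Calling `classify(probe(reference_url), probe(service_url))` gives a one-word answer to pass along when you contact the provider. A "local" result means the problem is probably at your office rather than in the cloud.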
All cloud services will go down from time to time. Although outages are very annoying and have a legitimate negative impact on the business, they should be expected, but rare. No system, whether you own it yourself or it runs in the cloud, is completely immune from downtime. The key is to have a cloud provider that addresses any issue quickly, communicates often and has ready access to support. Waiting three days to get a response from a cloud provider should not be acceptable.
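To put the SLA check in numbers, an uptime percentage can be converted into a downtime allowance. The sketch below assumes a 30-day measurement period and counts all downtime equally. Real SLAs often measure differently and may exclude planned maintenance, so treat the function names and defaults here as illustrative only.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float = 30.0) -> float:
    """Minutes of downtime permitted in the period at the given uptime percentage."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - uptime_percent / 100)


def exceeds_sla(downtime_minutes: float, uptime_percent: float,
                period_days: float = 30.0) -> bool:
    """True if the observed downtime is more than the SLA allowance."""
    return downtime_minutes > allowed_downtime_minutes(uptime_percent, period_days)
```

Under these assumptions, a 99.9% commitment allows about 43 minutes of downtime in 30 days, while 99.999% allows well under one minute, so a single hour-long outage would exceed either figure.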