In just the last two weeks, three major DNS outages have hit Google, Microsoft Azure, and Fonality.
But only one of these companies made even bigger waves with the way they handled their blunder. Fonality, which sells VoIP services and business phone systems, offered a rare and transparent analysis of their outage. In a detailed statement from Chief Marketing Officer Jeff Valentine, readers were given crucial insight into how other companies can avoid making the same mistakes.
The four-hour outage crippled nearly every aspect of their online business, from their website and status site to voice services and even video chat, all thanks to a simple DNS error. After hours of investigation, the problem was traced back to a single mistake by a network engineer who had accidentally connected a test network to the company’s production network. This mistake shut down their DNS system, which in turn knocked all of their services and websites offline.
“DNS is one of those things that gets overlooked… You make your voice servers super-redundant, but you take it for granted that DNS will always work.”
The sad reality is that things often have to go horribly wrong for one company before others learn how to keep the same problems from happening to them. Fonality’s commitment to transparency will hopefully save many businesses from suffering the same fate. Here are just a few of the key lessons Fonality had to learn the hard way (but which are easy fixes if you are proactive).
Lesson #1: Back Up Your DNS
While there is no way to be 100% sure that your DNS will always be available, you can never have too many backup systems in place. Secondary DNS is easy to set up and requires little to no maintenance. Simply sign up with another DNS provider that offers Secondary DNS services, let it copy your records from your primary, and list its name servers in your domain’s delegation alongside your primary’s, so that resolvers can still get answers if your primary provider goes down. This is extremely useful for clients who host their own DNS services and are reluctant to move over to a cloud-based DNS provider. This way you can keep your primary systems where you want them and still have a backup that is able to handle your traffic load.
“DNS is too important. We cannot let it go down in the future. We already had a backup, but that didn’t help in this case. So we need a backup for the backup. We are putting in place an offsite system with DNS and name servers at a different location.” – Valentine
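As a rough illustration of the idea, the sketch below checks whether a list of name servers spans more than one provider. It is a minimal example under stated assumptions, not a complete audit tool: the function names are made up for this post, the ".example" host names are placeholders, and a "provider" is approximated by the last two labels of each name server’s host name, which will misjudge some suffixes.

```python
from collections import defaultdict


def provider_of(nameserver: str) -> str:
    """Return a rough provider key: the last two labels of the host name.

    This is a simplification; it misjudges suffixes such as "co.uk".
    """
    labels = nameserver.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def group_by_provider(nameservers: list[str]) -> dict[str, list[str]]:
    """Group name server host names by their rough provider key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for ns in nameservers:
        groups[provider_of(ns)].append(ns.rstrip(".").lower())
    return dict(groups)


def has_provider_redundancy(nameservers: list[str], minimum: int = 2) -> bool:
    """True when the delegation spans at least `minimum` distinct providers."""
    return len(group_by_provider(nameservers)) >= minimum


# Hypothetical delegations (the .example names are placeholders).
primary_only = ["ns1.primarydns.example.", "ns2.primarydns.example."]
with_secondary = primary_only + ["ns1.backupdns.example.", "ns2.backupdns.example."]

print(has_provider_redundancy(primary_only))
print(has_provider_redundancy(with_secondary))
print(sorted(group_by_provider(with_secondary)))
```

In practice you would feed this the NS records actually published for your domain. Keep in mind that two providers only help if they do not share the same network or location, which is the point Valentine makes about an offsite system.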
Lesson #2: Keep Your Status Site Separate
It’s common sense: you don’t use your corporate website to host your status page. But what most people forget is that you should also keep its DNS records with a separate provider. Fonality learned the hard way that even though they were hosting their status page on an entirely different site, they were using a subdomain, “trust.fonality.com”, whose records were dependent on their DNS system.
Even if your status site is on a separate subdomain, web host, etc., it will still be unavailable if your DNS goes down. That’s because DNS is the first point of contact between a user and your website, and if it is unavailable, so is everything else. This is a simple mistake that even larger companies still make, as seen with Fonality.
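To make the dependency concrete, here is a small sketch that flags the two ways a status page can still hang on your main DNS: its host name sits under the main domain, or it is served by the same name servers. The function names and all name server host names are assumptions made up for illustration, as is the separate status domain; only "trust.fonality.com" comes from Fonality’s account.

```python
def normalize(name: str) -> str:
    """Lower-case a DNS name and drop any trailing dot."""
    return name.rstrip(".").lower()


def is_subdomain_of(host: str, domain: str) -> bool:
    """True when `host` equals `domain` or sits anywhere beneath it."""
    host, domain = normalize(host), normalize(domain)
    return host == domain or host.endswith("." + domain)


def shared_dns_risks(status_host, main_domain, status_ns, main_ns):
    """List the ways the status host still depends on the main domain's DNS."""
    risks = []
    if is_subdomain_of(status_host, main_domain):
        # Resolvers must go through the parent zone to find a subdomain.
        risks.append("status host lives under the main domain")
    shared = sorted({normalize(n) for n in status_ns} & {normalize(n) for n in main_ns})
    if shared:
        risks.append("shared name servers: " + ", ".join(shared))
    return risks


# Hypothetical name servers for the main domain.
main_ns = ["ns1.maindns.example.", "ns2.maindns.example."]

# Status page on a subdomain that uses the same name servers.
risks = shared_dns_risks("trust.fonality.com", "fonality.com", main_ns, main_ns)

# Status page on a hypothetical separate domain with a different provider.
independent = shared_dns_risks(
    "fonality-status.example", "fonality.com", ["ns1.otherdns.example."], main_ns
)

print(risks)
print(independent)
```

An empty list here only means these two checks passed. It does not prove the status page is fully independent, since the two domains could still share a registrar account, a network, or a data center.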
Lesson #3: Connect with Clients
The first thing that usually happens when there is an outage is that your social media effectively “blows up”. It is a true test of patience to go through and talk to each client individually and address their complaints, especially when they are mad. However, Fonality did an A+ job of reaching out to clients via social media channels and addressing their concerns one by one during the days following the outage.
Outages tend to test client trust; however, this can be mitigated with transparency and continued support for affected clients. Some companies use SLAs, which credit clients for the amount of service they lost (sometimes more). Others use discounts or waive fees to regain trust. However, the best way to reestablish a healthy relationship with clients is maintaining a connection. Sometimes it’s hearing the voice of an executive reaching out and offering an apology. In Fonality’s case, their CMO’s statement was one of the best reactions we have seen to an outage, and should be a standard for all service providers.