A massive outage swept the internet this morning (June 8, 2021), taking down sites such as Hulu, CNN, Twitch, Reddit, Spotify, and Vimeo, among many others. Cloud content delivery network (CDN) provider Fastly started experiencing connectivity issues at 4:58 AM CST, which left customers of these high-traffic websites in the dark.
Reports of the outage started coming in at about the same time as Fastly’s initial status update, which told customers the company was investigating performance issues. Downdetector shows the spike in outage reports peaking at 5:33 AM CST.
Is Fastly Down?
Less than an hour after Fastly announced potential impacts to the performance of its CDN services, the company reported that it had identified the issue and applied a fix. The last two status posts on the global CDN disruption state that the issue was identified and fixed and that the recovery of services was observed:
Resolved – Fastly has observed recovery of all services and has resolved this incident. Customers could continue to experience a period of increased origin load and lower Cache Hit Ratio (CHR). -6:41 AM CST
Update – The issue has been identified and a fix has been applied. Customers may experience increased origin load as global services return. -5:57 AM CST
What Is a CDN?
A CDN is a wide network of web servers configured to deliver content to each client from the server geographically closest to that client, thereby improving transfer speed. CDNs are also optimized to favor the server with the fastest connection when automatically routing traffic to the nearest point of presence (POP). Essentially, this service distributes website content to several peering locations around the world so that customers get fast service at any time of the day or night.
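The "geographically closest server" idea can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the `Pop` record, the `nearest_pop` function, and the POP names and coordinates are hypothetical, and real CDNs typically rely on DNS and network-level routing signals rather than raw distance alone.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Pop:
    """A hypothetical point of presence with approximate coordinates."""
    name: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_pop(client_lat, client_lon, pops):
    """Return the POP with the smallest great-circle distance to the client."""
    return min(
        pops,
        key=lambda p: haversine_km(client_lat, client_lon, p.lat, p.lon),
    )


# Illustrative POP list (assumed values, not any provider's real footprint).
POPS = [
    Pop("chicago", 41.88, -87.63),
    Pop("frankfurt", 50.11, 8.68),
    Pop("tokyo", 35.68, 139.69),
]

# A client in Paris is served from the Frankfurt POP.
print(nearest_pop(48.86, 2.35, POPS).name)  # frankfurt
```

Distance is used here only as a stand-in for "fastest"; a production system would also weigh measured latency and server health.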
Multi-CDN: The Solution for CDN Outages
As witnessed today, CDN providers are not immune to outages. Domain administrators should implement DNS strategies that help avoid any type of end-user disruption. A DNS strategy built on proactive monitoring will notice a CDN issue immediately and move the affected traffic to another resource (CDN) that is functioning properly. Websites should be available to visitors free of interruptions and outages.
A Multi-CDN solution bundles two or more CDNs to optimize the speed of content delivery and to help avoid latency and outage issues. Not only can CDN service providers have outages, but their performance can also vary depending on the region in which the end user is located. Constellix’s Multi-CDN solution includes a fully automated monitoring system that displays the performance of multiple CDN providers in real time. Updated every second, the real-time analysis displays any detected anomalies so that smart DNS traffic-routing decisions can be made. If one of your CDN connections slows down or becomes unavailable, the Multi-CDN tool will dynamically reroute new queries to the healthiest and fastest CDN for the querying client. This solution can also be combined with other services, such as Traffic Steering and Failover, for a more robust DNS strategy.
The screenshot below depicts what Constellix’s control panel looked like for customers using Multi-CDN services. The active CDN providers that the user has configured to the account are displayed, along with the status of each.
As you can see, the Fastly CDN was removed instantly from this CDN configuration so that end users were automatically routed “around” the Fastly outage.
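The rerouting behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Constellix’s implementation: the `CdnStatus` record, the `pick_cdn` function, and the provider names and latency figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CdnStatus:
    """Latest monitoring result for one CDN (hypothetical structure)."""
    name: str
    healthy: bool
    latency_ms: float


def pick_cdn(statuses: List[CdnStatus]) -> Optional[str]:
    """Return the name of the fastest healthy CDN, or None if all are down."""
    candidates = [s for s in statuses if s.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.latency_ms).name


# During the outage, the failing provider is skipped even though its last
# recorded latency was the lowest.
statuses = [
    CdnStatus("fastly", healthy=False, latency_ms=20.0),
    CdnStatus("cdn-b", healthy=True, latency_ms=45.0),
    CdnStatus("cdn-c", healthy=True, latency_ms=38.0),
]
print(pick_cdn(statuses))  # cdn-c
```

Filtering on health before comparing latency is the key design choice: a provider that is down should never win on speed alone.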
Crucial Services Can Also Have Outages
This morning’s widespread outage should serve as a clear indicator that any mission-critical service can have an outage. No service is immune to misconfigurations (internal or external), distributed denial-of-service (DDoS) attacks, or vendor outages. Putting hard-earned trust into a single provider is not the best way to provide continuous, optimal service to end users.
Pinpointing and understanding where single points of failure occur makes it possible to put a proper disaster recovery plan in place before an outage happens. CTOs should require all IT leads and departments to review the full IT stack semi-annually and look for any new potential single points of failure. A critical service that was not a single point of failure a few months ago could have become one since, due to a recent “move to the cloud” or cost-saving project.
Steven Job, President of DNS Made Easy and Constellix, stated:
“Today’s internet outage shows how many large corporations still rely on single providers for many crucial services. Many IT organizations fail to understand what true redundancy is and what services they rely on. Providing intelligent traffic decisions with consistent monitoring is the only way to achieve the necessary uptime. This is what Constellix was designed for and performed excellently with today’s Fastly global outage.”
Even after Fastly services were restored, the immediate performance of its network remained degraded, which could lead to further performance issues for organizations. A properly configured Multi-CDN traffic-monitoring solution will use the real user monitoring (RUM) data collected from its users to route traffic to the best-performing CDN.
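As a rough illustration of RUM-based routing, the sketch below groups latency samples by CDN and picks the provider with the lowest median. It rests on assumptions: the `best_cdn_by_rum` function, the `min_samples` threshold, and the sample values are hypothetical, and a real system would also account for region, time window, and error rates.

```python
from collections import defaultdict
from statistics import median


def best_cdn_by_rum(samples, min_samples=3):
    """Pick the CDN with the lowest median latency.

    `samples` is an iterable of (cdn_name, latency_ms) pairs reported by
    real users. CDNs with fewer than `min_samples` measurements are ignored
    so that a single lucky request cannot win.
    """
    by_cdn = defaultdict(list)
    for cdn, latency_ms in samples:
        by_cdn[cdn].append(latency_ms)

    medians = {
        cdn: median(values)
        for cdn, values in by_cdn.items()
        if len(values) >= min_samples
    }
    if not medians:
        return None
    return min(medians, key=medians.get)


# Hypothetical measurements: cdn-a has one slow outlier, cdn-c has too few
# samples to be trusted.
rum_samples = [
    ("cdn-a", 120), ("cdn-a", 130), ("cdn-a", 900),
    ("cdn-b", 150), ("cdn-b", 160), ("cdn-b", 155),
    ("cdn-c", 40),
]
print(best_cdn_by_rum(rum_samples))  # cdn-a
```

The median is used instead of the mean so that a handful of very slow requests, such as those seen while a provider is still recovering, do not dominate the comparison.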
Implementing Multi-CDN is one proactive decision that will eliminate the need for reactive ones later.