Widespread Website Outages – Fastly CDN

Update

Following the Fastly outage, Nick Rockwell, Fastly’s Senior VP of Engineering and Infrastructure, shared an update on their blog [1] identifying the root cause as an “undiscovered software bug” that was “triggered by a valid customer configuration change”. Whilst it is reassuring to know that no nefarious actor caused the outage, incidents such as this serve as an important reminder to ensure that QA processes are robust, especially when dealing with infrastructure that has such a wide-reaching impact.

Introduction

Just before 1000hrs UTC on June 8, 2021, widespread reports began to surface of high-profile websites being unavailable, with visitors to these sites receiving ominous-looking error messages (Figure 1).

Figure 1 – Example outage message (‘gov.uk’)

Initially, many of these responses carried an HTTP ‘503’ status, advising the user that the service was unavailable, and originated from a ‘Varnish cache server’, an HTTP accelerator that was attempting to serve the intended content.

The cloud services provider Fastly is understood to use Varnish cache servers in its content delivery network (CDN), an attribution subsequently supported by the Fastly ‘unknown domain’ errors, indicating that this major outage was caused by an issue within that CDN.
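The attribution reasoning above can be sketched as a simple heuristic over a single HTTP response. This is an illustrative sketch only: the function name is hypothetical, and the header checks (`Via` mentioning Varnish, `X-Served-By` starting with `cache-`) are assumptions about how a Varnish-based CDN edge typically identifies itself, not details taken from Fastly's statements. The body hints are the two error strings described in this report.

```python
from typing import Mapping


def looks_like_varnish_cdn_503(
    status: int, headers: Mapping[str, str], body: str = ""
) -> bool:
    """Heuristic: does this response look like a 503 served by a Varnish-based CDN edge?

    Hypothetical helper; the header names checked here are assumptions.
    """
    if status != 503:
        return False

    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}

    # Assumption: a Varnish-based edge advertises itself in the Via header.
    via_varnish = "varnish" in lowered.get("via", "")
    # Assumption: the edge node is named in X-Served-By with a "cache-" prefix.
    served_by_cache = lowered.get("x-served-by", "").startswith("cache-")
    # Error page text of the kind described in this report.
    body_lower = body.lower()
    body_hint = "varnish cache server" in body_lower or "fastly error" in body_lower

    return via_varnish or served_by_cache or body_hint
```

Any one indicator is treated as sufficient here, which favours recall over precision; a stricter check could require two indicators to agree.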

Testing access from multiple geographic locations confirmed that this issue was not specific to any one region or group of cache servers, consistent with the worldwide reports and the origin of many of the affected websites.

Whilst a reason for the outage was not initially shared, prompting some concern about foul play, Fastly [2] subsequently identified and fixed a service configuration issue within an hour (Figure 2).

Figure 2 – Fastly’s initial explanation

Cyberint will continue to monitor the situation and, like many, keenly awaits the publication of a post-mortem report by Fastly.

Impact

Although many major websites are working around the issues arising from the Fastly CDN outage, some services may remain inaccessible or appear in a ‘degraded’ state.

For example, Amazon and Twitter both appeared to have image display and/or visual styles affected by the outage (Figure 3).

Figure 3 – Amazon.com homepage with broken image links and styles

Whilst not an exhaustive list, major websites suffering an outage due to this incident included:

  • Buzzfeed
  • CNN
  • NY Times
  • Etsy
  • Imgur
  • Kayak
  • Reddit
  • SF Chronicle
  • Shopify
  • Spectator UK
  • StackExchange
  • The Guardian
  • Twitch
  • UK Government
  • USA Today
  • Washington Post

As of 1057hrs UTC, Fastly report that they have identified and fixed [3] the suspected configuration issue, and advise customers that they may experience increased load as services return (Figure 4).

Figure 4 – Fastly Network Status

Recommendations

  • In the absence of any malicious intent being identified, organizations are advised to monitor the situation and await a full post-mortem report by Fastly.
  • Organizations using Fastly CDN services should liaise with their network support teams to ensure that services are adequately restored.
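As a rough aid to the second recommendation, the following sketch probes a list of an organization's own endpoints and reports those that still return no response or a 5xx status. It is a minimal sketch, not a monitoring solution: the function names are hypothetical, any URLs passed in are supplied by the reader, and it assumes that a HEAD request is an acceptable probe for the services in question.

```python
import urllib.error
import urllib.request
from typing import Dict, List, Optional


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response was received."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except OSError:
        # DNS failure, timeout, refused connection and similar (URLError is an OSError).
        return None


def still_degraded(statuses: Dict[str, Optional[int]]) -> List[str]:
    """List URLs whose last probe got no response or a 5xx status, sorted by URL."""
    return sorted(
        url for url, code in statuses.items() if code is None or code >= 500
    )
```

A typical use would be `still_degraded({url: fetch_status(url) for url in urls})`, where `urls` is the reader's own list of CDN-fronted endpoints; an empty result suggests the probed services are responding again, though it does not confirm that page assets such as images and styles are loading correctly.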

References

[1] https://www.fastly.com/blog/summary-of-june-8-outage

[2] https://twitter.com/fastly/status/1402221348659814411?s=20

[3] https://status.fastly.com/
