Microsoft 365 Cloud Service Outage Disrupts Users Worldwide
'Network Change' Tied to Service Disruption Now Rolled Back, Tech Giant Reports
Microsoft blamed an internal network configuration change for a series of outages that disrupted access globally to its Azure cloud services, including Outlook and Microsoft Teams.
"Any user serviced by the affected infrastructure may be unable to access multiple Microsoft 365 services," Microsoft warned Wednesday morning in a Microsoft 365 service degradation alert.
The outage affected a wide range of services, including SharePoint Online and OneDrive for Business. Other affected services included the Microsoft 365 admin portal and Microsoft Intune endpoint management, as well as Microsoft Defender for Cloud Apps, Identity and Endpoint. Microsoft said the problem affected not only direct access to its services but also how information was flowing between its data centers.
The Downdetector website, which crowdsources reports of service outages, said reports of disruptions began to surge around 7:00 a.m. UTC. Also affected were consumer-focused services such as Minecraft and Xbox Live.
Around 8:00 a.m. UTC, Microsoft first confirmed the outage reports and said it appeared to have identified the problem, which it described as "a wide-area networking routing change." That description suggests that the networking change triggered domain name system disruptions.
Concurrently, users across the world reported service disruption. "We are seeing dropped packets and high latency to resources in Azure at the moment," one U.K.-based systems engineer posted to Mastodon.
"We're investigating issues impacting multiple Microsoft 365 services. More info can be found in the admin center under MO502273," the Microsoft 365 Status account (@MSFT365Status) tweeted on January 25, 2023.
At about 9:00 a.m. UTC, Microsoft issued this update: "We've rolled back a network change that we believe is causing impact. We're monitoring the service as the rollback takes effect."
Following the rollback, Microsoft said more users had regained access to their Microsoft 365 services. "We're also connecting the service to additional infrastructure to expedite the recovery process," Microsoft tweeted.
Microsoft's Azure status page on Wednesday morning displayed outages across Africa, the Americas, Asia-Pacific, Europe and the Middle East. The only unaffected region was China, including Azure Government for China. Microsoft's cloud-based services are used by many of the world's largest companies, meaning the outage likely affected millions of individuals.
Many users took to Twitter to report outages, in some cases via the #Outlookdown hashtag.
"I guess Teams and Outlook going down means I can finally have a break from people contacting me," one user tweeted. "But no seriously, Teams and Outlook not working is really disrupting my day."
Microsoft last experienced a major outage in June 2022, when access to Microsoft 365 was disrupted. So too was access to its Azure cloud computing environment, which is second only to Amazon Web Services in terms of the size of its user base.
Other large providers have not been immune to major disruptions. Also in June 2022, Google reported that a serious outage, which disrupted access for users in the Middle East and increased latency for users in Europe and Asia, was traced to a physical infrastructure problem.
"The outage was triggered by two simultaneous fiber cuts within our Middle East network. This affected the end-to-end path for several submarine cables, reducing capacity for many telecom and technology companies, including Google," it said at the time.
In June 2021, Facebook reported being "unreachable" for many users for nearly three hours, saying it was "the worst outage we've had in over four years."
Facebook's engineering team said the problem that caused the service disruption involved a flaw in its automated system for verifying database integrity. Even when fixed, the flaw had created a "feedback loop" that continued to disrupt its databases. To fix the problem, Facebook had to turn its automated database integrity verification service off and back on again.