Unforeseen surge in demand leads to problems in Azure's East US region
Microsoft Azure's East US Region Continues to Experience Resource Issues
Microsoft Azure's East US region encountered a resource issue starting on July 29, 2025. A sudden spike in demand for compute resources, primarily virtual machines, exceeded available capacity and pushed hardware beyond safe operational thresholds.
The surge, particularly in demand for General Compute virtual machine instances, caused allocation failures when creating or updating VMs, surfacing errors such as ZonalAllocationFailed. Although Microsoft declared the incident resolved by August 5, users report that the problem persisted for more than a week after that claim[1].
Cause:
- A sudden surge in demand for General Compute virtual machine instances in East US.
- Insufficient capacity to absorb the demand, with resource pools constrained and hardware limits reached.
- Specific instance types based on several generations of Intel and AMD CPUs were affected, which complicated Kubernetes (AKS) cluster upgrades due to failed allocations[1].
Impact:
- Customers faced failures when creating or updating VMs, leading to operational disruptions.
- AKS Kubernetes clusters experienced upgrade failures with common resource-allocation errors.
- Microsoft recommended workarounds such as switching VM sizes or using the alternate East US 2 region[1].
- Related Microsoft services, such as Cloud PCs in East US, also experienced connectivity issues around the same timeframe (reported August 11)[3].
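As a rough sketch of the recommended workaround, the retry logic below falls back across alternate VM sizes first and then to the East US 2 region. The `provision` callable, the size list, and the `AllocationFailed` exception are hypothetical stand-ins; a real deployment would call the Azure SDK or CLI instead.

```python
# Sketch of the suggested workaround: try alternate VM sizes, then fall back
# to the East US 2 region. `provision` is a hypothetical stand-in for a real
# Azure SDK/CLI call, injected here so the logic can be exercised locally.

FALLBACK_SIZES = ["Standard_D4s_v5", "Standard_D4as_v5", "Standard_D4s_v4"]
FALLBACK_REGIONS = ["eastus", "eastus2"]


class AllocationFailed(Exception):
    """Raised when a region/size combination has no capacity (cf. ZonalAllocationFailed)."""


def create_vm_with_fallback(provision, sizes=FALLBACK_SIZES, regions=FALLBACK_REGIONS):
    """Try each (region, size) pair in order; return the first successful result."""
    last_error = None
    for region in regions:
        for size in sizes:
            try:
                return provision(region, size)
            except AllocationFailed as exc:
                last_error = exc  # capacity exhausted for this combination; keep trying
    raise last_error


# Simulated capacity for illustration: everything in eastus fails, eastus2 succeeds.
def fake_provision(region, size):
    if region == "eastus":
        raise AllocationFailed(f"ZonalAllocationFailed: {region}/{size}")
    return {"region": region, "size": size}
```

Injecting the provisioning call as a parameter keeps the fallback order testable without touching a real subscription.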
Ongoing problems after claimed resolution:
- Despite the official resolution, resource constraints reportedly continue to cause errors and failures for some users.
- The lingering impact suggests underlying capacity or hardware challenges remain unresolved or only partially mitigated[1][3].
The Azure status and updates pages show no new official updates about a continuing outage post-August 5, but user reports and administrative feedback indicate the issue's practical persistence beyond the resolution announcement[1][5].
Meanwhile, Microsoft's Outlook service suffered an outage earlier in July that affected millions of users worldwide and lasted over 11 hours. Separately, a global outage at distributor Ingram Micro lasted more than 14 hours and halted customer orders.
Available evidence suggests that capacity constraints on VM sizes based on several generations of Intel and AMD CPUs precipitated the problem. Microsoft recommended switching to alternate VM types or to the East US 2 region as workarounds.
There have been additional incidents and cancelled maintenance windows in the same region, suggesting the capacity issues may have affected Microsoft's own services as well. A separate Azure outage earlier in the year affected users in Norway, impacting businesses and government websites that deliver online services to citizens.
In the UK, a radar failure disrupted air traffic on Wednesday, adding to the list of recent technological challenges.
In summary, the Azure East US resource problem was a capacity shortage triggered by a sudden demand spike, causing VM creation failures and related service impacts. Though Microsoft declared the issue resolved, ongoing reports point to incomplete recovery and persistent difficulties for particular workloads. Switching regions or VM types remains the practical interim workaround for affected users[1].
[1] Source
[3] Source
[5] Source
- Beyond Microsoft's official resolution, persistent resource constraints in the underlying cloud hardware are still reportedly causing errors and failures, indicating that capacity or hardware challenges remain unresolved or only partially mitigated.
- AI could help forecast and manage demand for virtual machines in Microsoft Azure's datacenters, reducing the likelihood that sudden spikes exceed available capacity and cause resource issues like the one observed in East US.
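As a minimal sketch of that forecasting idea, simple exponential smoothing over recent VM-request counts can flag periods where projected demand approaches a capacity ceiling. All figures below are illustrative, not real Azure telemetry.

```python
# Minimal sketch: single exponential smoothing over hourly VM-request counts,
# flagging periods whose forecast approaches an assumed capacity ceiling.
# The demand numbers and capacity are made up for illustration.

def smooth_forecast(series, alpha=0.5):
    """Return one-step-ahead forecasts via simple exponential smoothing."""
    forecast = series[0]
    forecasts = []
    for observed in series:
        forecasts.append(forecast)
        forecast = alpha * observed + (1 - alpha) * forecast
    forecasts.append(forecast)  # forecast for the next, not-yet-seen period
    return forecasts


def capacity_alerts(series, capacity, threshold=0.9, alpha=0.5):
    """Indices of periods whose forecast exceeds threshold * capacity."""
    forecasts = smooth_forecast(series, alpha)
    return [i for i, f in enumerate(forecasts) if f > threshold * capacity]


demand = [100, 110, 120, 180, 260, 340]  # hypothetical hourly VM requests
alerts = capacity_alerts(demand, capacity=300)
```

A production system would use far richer models and signals, but even this toy version shows how a forecast crossing a utilization threshold could trigger pre-provisioning before allocations start failing.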