Increased demand leads to complications in the Azure East US area

Despite the officially declared resolution, the issue reportedly persists, according to affected administrators.


Microsoft Azure users in the East US region have been experiencing issues due to a prolonged capacity shortage that began around July 29, 2025. The shortage, triggered by a sudden surge in demand for compute resources, left insufficient capacity to allocate virtual machines (VMs), causing service management operations to fail [1][2].

The resource pool for General Compute instances became heavily constrained, affecting all associated sets in the region. The shortfall primarily hit new or restarting VMs across general-purpose and AI-accelerated SKU families, while already-running workloads were not disrupted [1][2]. The incident caused significant disruption for customers relying on autoscale patterns, machine-learning training jobs, and virtual desktop infrastructure in East US.

Microsoft recommended using alternate instance types or switching to the East US 2 region as workarounds. The public Azure status page showed no active events at the time of publication, yet many users reported lingering effects beyond August 5, 2025, with some still hitting allocation errors more than a week after the initial onset [2].
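
One way to act on that guidance is to script a fallback over alternate VM sizes and over East US 2, letting allocation failures drive the retry. The sketch below assumes the Azure CLI (`az`) is installed and logged in; the resource group, VM name, image alias, and candidate SKU list are illustrative placeholders, not values from Microsoft's advisory.

```python
"""
Minimal sketch of the suggested workaround: try alternate VM sizes first,
then fall back to East US 2. All names and SKUs below are assumptions chosen
for illustration, not values taken from the incident report.
"""
import subprocess

# Candidate (location, size) pairs, ordered by preference. Pick sizes your
# subscription is actually entitled to deploy.
CANDIDATES = [
    ("eastus",  "Standard_D4s_v5"),
    ("eastus",  "Standard_E4s_v5"),
    ("eastus2", "Standard_D4s_v5"),
]


def create_vm_with_fallback(resource_group: str, vm_name: str) -> tuple[str, str]:
    """Try each candidate until `az vm create` succeeds; return the winner."""
    for location, size in CANDIDATES:
        result = subprocess.run(
            [
                "az", "vm", "create",
                "--resource-group", resource_group,
                "--name", vm_name,
                "--image", "Ubuntu2204",
                "--size", size,
                "--location", location,
                "--generate-ssh-keys",
            ],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return location, size
        # Allocation failures surface in the CLI error text; log and try the next pair.
        print(f"{location}/{size} failed: {result.stderr.strip()[:200]}")
    raise RuntimeError("No candidate region/size combination had capacity")


if __name__ == "__main__":
    print(create_vm_with_fallback("rg-demo", "vm-fallback-demo"))
```

In practice a failed attempt can leave partial resources (NICs, disks) behind, and quotas differ per region; a production version would clean those up between retries, which the sketch skips for brevity.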

The VM resource constraints in Microsoft Azure's East US region appear to persist, despite Microsoft's claim that the issue has been resolved. One reader suggested that capacity problems spanning several generations of Intel and AMD VM families may have precipitated the shortage; another pointed to multiple incidents and cancelled maintenance windows in the same region around the same time, implying that the capacity crunch may have affected Microsoft's own services as well [3].

The capacity shortage primarily affected the ability to create, start, or resize VMs of certain sizes, disrupted autoscaling and scheduled workloads, and caused AKS cluster upgrade failures due to allocation errors. The incident exposed the limits of cloud elasticity under sharp, unexpected demand spikes, underscoring the need for regional capacity planning and for flexible strategies to migrate workloads to nearby Azure regions when constraints occur [1][2].
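
Capacity planning of that kind can start with a simple inventory of which SKUs a subscription is actually allowed to deploy in a fallback region. The sketch below shows one possible check, again assuming an authenticated Azure CLI; the `eastus2` target and the `Standard_D` size filter are illustrative assumptions.

```python
"""
Sketch of a capacity-planning check: list the SKUs this subscription can
deploy in a fallback region before moving workloads there. Assumes the Azure
CLI is installed and logged in; region and size filter are example values.
"""
import json
import subprocess


def deployable_skus(location: str, size_prefix: str) -> list[str]:
    """Return SKU names in `location` that carry no subscription restrictions."""
    raw = subprocess.run(
        [
            "az", "vm", "list-skus",
            "--location", location,
            "--size", size_prefix,
            "--resource-type", "virtualMachines",
            "--all",
            "--output", "json",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    skus = json.loads(raw)
    # A SKU with a non-empty `restrictions` list (e.g. NotAvailableForSubscription)
    # cannot be deployed by this subscription in that region.
    return [s["name"] for s in skus if not s.get("restrictions")]


if __name__ == "__main__":
    for name in deployable_skus("eastus2", "Standard_D"):
        print(name)
```

`--all` is included so restricted SKUs appear in the output and can be filtered out explicitly rather than silently omitted.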

In addition to the Azure problems, other notable incidents included a global outage at Ingram Micro that lasted more than 14 hours and halted customer orders, and a Microsoft Outlook outage that lasted more than 11 hours for millions of users worldwide [4]. Cloudflare also acknowledged that a configuration change had disrupted internet access for many users.

Microsoft has not yet responded to requests for clarification about the persisting issue in the East US region. Customers should monitor Azure Service Health for updates on resolution and rely on the provided status messaging for subscription-specific impacts [1]. For customers of related services such as Snowflake on Azure East US 2, applying subnet configuration changes promptly can help avoid unrelated service interruptions [3].
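
For teams that want Service Health visibility inside their own tooling rather than the portal, one lightweight option is to filter recent activity-log entries for the ServiceHealth category. The sketch below assumes an authenticated Azure CLI; the three-day window and the property names read from each event are assumptions to verify against your own tenant's output.

```python
"""
Small sketch for watching Azure Service Health from a script. Assumes the
Azure CLI is installed and authenticated. The 3-day window and the exact
property names printed per event are assumptions, not confirmed by the article.
"""
import json
import subprocess


def recent_service_health_events(offset: str = "3d") -> list[dict]:
    """Return recent activity-log entries whose category is ServiceHealth."""
    raw = subprocess.run(
        ["az", "monitor", "activity-log", "list",
         "--offset", offset, "--output", "json"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    entries = json.loads(raw)
    return [e for e in entries
            if (e.get("category") or {}).get("value") == "ServiceHealth"]


if __name__ == "__main__":
    for event in recent_service_health_events():
        props = event.get("properties") or {}
        print(event.get("eventTimestamp"),
              props.get("title") or event.get("operationName", {}).get("value"))
```

Run on a schedule, a check like this can flag region-level advisories (such as the East US capacity notices) before workloads start failing allocation.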

References:

[1] Microsoft Azure Blog. (2025). Azure Service Health Dashboard: Your window into the health of your Azure services. https://azure.microsoft.com/en-us/support/health/

[2] Azure Updates. (2025). Service Health Dashboard: East US. https://status.azure.com/

[3] Snowflake. (2025). Managing Azure Subnet Configurations for Snowflake. https://docs.snowflake.com/en/user-guide/admin-azure-subnet-config.html

[4] TechCrunch. (2025). Microsoft Outlook goes down for millions of users worldwide. https://techcrunch.com/2025/07/15/microsoft-outlook-goes-down-for-millions-of-users-worldwide/

  1. The capacity shortage in Microsoft Azure's East US region has proven prolonged, with users reporting allocation errors a week after the initial onset despite Microsoft's claims of resolution.
  2. The issue seems to affect the creation, starting, or resizing of certain VM sizes, disrupting autoscaling and scheduled workloads, and causing AKS cluster upgrade failures due to allocation errors, underscoring the need for regional capacity planning and flexible workload migration strategies.
  3. The persistent capacity issues in Microsoft Azure's East US datacenter contrast with the concurrent Cloudflare disruption caused by a configuration change, highlighting the complexities of managing cloud infrastructure at scale.
