Oracle offers workarounds for Windows boot failures on its cloud without addressing the root cause
Oracle Cloud Infrastructure (OCI) is grappling with a persistent issue affecting Windows compute instances: servers fail to boot after a restart and remain stuck on the loading screen. The problem, which emerged following Windows security patching, has caused significant production outages for enterprise applications running Windows on legacy OCI VMs[1].
Oracle has yet to provide a definitive fix, offering only workarounds such as performing a diagnostic reboot, rebuilding the instance, and restarting it. These workarounds have reportedly been unreliable in production environments, forcing administrators to restore instances from backups and recreate them in order to recover[1].
Potential troubleshooting steps include using the Windows Recovery Console via OCI's instance console connection. This can involve repairing boot records with commands such as bootrec /fixmbr and bootrec /fixboot. Accessing this recovery mode requires rebooting the instance with Shift held, navigating through Troubleshoot > Advanced options > Startup Settings, and then rebooting with the appropriate options selected[2].
For more isolated boot volume problems, OCI supports detaching the boot volume from the affected instance (which must be stopped) and attaching it to another instance for diagnostics and repair[4].
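The detach-and-attach sequence described above can be sketched as a short Python helper that only builds the OCI CLI invocations in order, without running them. This is a minimal sketch under stated assumptions: the function name and the placeholder OCIDs are hypothetical, the paravirtualized attachment type is an arbitrary choice, and the CLI subcommands and flags should be checked against the current OCI CLI reference before use.

```python
import shlex
from typing import List


def build_boot_volume_rescue_plan(
    instance_id: str,
    boot_volume_attachment_id: str,
    boot_volume_id: str,
    rescue_instance_id: str,
) -> List[List[str]]:
    """Return the OCI CLI invocations, in order, without executing them.

    Hypothetical helper for illustration; all IDs are supplied by the caller.
    """
    return [
        # 1. Stop the affected instance: the boot volume can only be
        #    detached while the instance is stopped.
        ["oci", "compute", "instance", "action",
         "--instance-id", instance_id, "--action", "STOP"],
        # 2. Detach the boot volume from the stopped instance.
        ["oci", "compute", "boot-volume-attachment", "detach",
         "--boot-volume-attachment-id", boot_volume_attachment_id],
        # 3. Attach the same volume to a second (rescue) instance so it
        #    can be inspected and repaired there. The attachment type is
        #    an assumption; pick whatever suits the rescue instance.
        ["oci", "compute", "volume-attachment", "attach",
         "--instance-id", rescue_instance_id,
         "--type", "paravirtualized",
         "--volume-id", boot_volume_id],
    ]


if __name__ == "__main__":
    # Placeholder OCIDs, not real resources.
    plan = build_boot_volume_rescue_plan(
        instance_id="ocid1.instance.oc1..broken",
        boot_volume_attachment_id="ocid1.bootvolumeattachment.oc1..example",
        boot_volume_id="ocid1.bootvolume.oc1..example",
        rescue_instance_id="ocid1.instance.oc1..rescue",
    )
    for argv in plan:
        print(shlex.join(argv))
```

Printing the plan instead of executing it lets an administrator review each command, and wait for each step to complete, before running it against a production instance.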
If these steps fail to resolve the boot failure, reaching out to OCI Support and Microsoft Support is recommended for further assistance[2].
It's worth noting that Oracle initially suggested the issue may have been caused by changes on the user's side, but later acknowledged the problem as one of its known issues[1].
In summary, the current status is that the Windows boot failure issue on OCI is recognized but not fully resolved, with only partial workarounds available. The best options currently are manual recovery attempts via the console, repair of boot records, rebuilding instances, or restoring from backups as needed[1][2][4]. Systems administrators are advised to exercise caution when managing Windows servers on OCI due to this ongoing issue.
[1] - https://www.theregister.com/2022/03/03/oracle_windows_boot_failures/
[2] - https://docs.oracle.com/en-us/iaas/Content/Support/Reference/faqs/Instance-Boot-Failure.htm
[4] - https://docs.oracle.com/en-us/iaas/Content/Compute/Tasks/managinginstances/detachingbootvolume.htm
- Enterprise applications that run Windows servers on legacy Oracle Cloud Infrastructure (OCI) VMs have seen servers repeatedly fail to boot after a restart, resulting in significant production outages.
- Oracle's proposed workarounds, such as performing diagnostic reboots, rebuilding instances, and restarting them, have reportedly been unreliable in production environments, necessitating backup restoration and instance recreation.
- Potential troubleshooting steps involve using the Windows Recovery Console through OCI's instance console connection and repairing boot records with commands such as bootrec /fixboot and bootrec /fixmbr.
- For more isolated boot volume problems, OCI supports detaching the boot volume from the affected (stopped) instance, attaching it to another instance for diagnostics and repair, and then reattaching it.
- If the boot failure remains unresolved, contacting OCI Support and Microsoft Support for further assistance is advised. Systems administrators should exercise caution when managing Windows servers on OCI while this issue persists.