AWS Host Level Degradation Causing Service Interruption
Resolved
Feb 23 at 08:07am CST
The instance became intermittently unresponsive: SSH connections timed out, the EC2 Serial Console failed to connect, and a stop operation was unusually slow to complete. EC2 status checks reported insufficient data for the duration of the event.
Initial investigation, once SSH access was regained, showed no memory pressure, no swap usage, and no disk errors. The CPU credit balance remained healthy at over 500 credits. However, vmstat revealed high CPU steal time, peaking above 50 percent, indicating significant hypervisor-level contention.
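For reference, the "st" column vmstat reports is derived from the steal counter in /proc/stat. A minimal sketch of that calculation, using synthetic sample lines (illustrative numbers, not captured from this event):

```python
# Hedged sketch: compute CPU steal percentage from two /proc/stat "cpu" line
# samples, the same counters vmstat's "st" column is based on. Field layout
# after the "cpu" label: user nice system idle iowait irq softirq steal
# guest guest_nice (all in jiffies).

def steal_percent(sample1: str, sample2: str) -> float:
    """Percent of CPU time stolen by the hypervisor between two samples."""
    def parse(line: str):
        fields = [int(x) for x in line.split()[1:]]
        return fields[7], sum(fields)          # steal jiffies, total jiffies
    steal1, total1 = parse(sample1)
    steal2, total2 = parse(sample2)
    return 100.0 * (steal2 - steal1) / (total2 - total1)

# Synthetic samples taken a fixed interval apart (hypothetical values):
before = "cpu 1000 0 500 8000 100 0 10 200 0 0"
after  = "cpu 1020 0 510 8060 100 0 10 300 0 0"
print(round(steal_percent(before, after)))  # → 53, i.e. over half the interval stolen
```

A sustained value anywhere near this level means the hypervisor is withholding CPU from the guest, which no in-instance tuning can fix.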
No evidence of OOM killer events, filesystem corruption, or application-level resource exhaustion was found. dmesg after reboot confirmed an unclean shutdown but no underlying disk or kernel faults.
Resolution was achieved by stopping and then starting the instance (rather than rebooting), which forces placement onto new underlying AWS hardware. After the restart, CPU steal time returned to zero and system metrics normalized.
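The stop/start remediation can be scripted. A minimal sketch using boto3 (the instance ID and region below are hypothetical; requires AWS credentials, so the call at the bottom is left commented out):

```python
def migrate_instance(instance_id: str, region: str = "us-east-1") -> None:
    """Stop, wait, then start an EC2 instance.

    A stop/start cycle (unlike a reboot) releases the instance from its
    current host, so on start it is typically placed on different hardware.
    """
    import boto3  # imported here so the sketch loads without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

# migrate_instance("i-0123456789abcdef0")  # hypothetical instance ID
```

Note that the public IP changes across a stop/start unless an Elastic IP is attached, so DNS or clients may need updating afterward.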
The root cause appears to be underlying AWS host-level contention or degradation rather than an application or configuration issue.
Affected services