Emergency Maintenance ZH1-Compact
Incident Report for amazee.io
Resolved
Maintenance was successfully completed after 2 minutes 45 seconds.
Posted Jun 30, 2017 - 08:42 UTC
Monitoring
Maintenance has started.
Posted Jun 30, 2017 - 08:30 UTC
Investigating
Our provider informed us that the hardware zh1-compact runs on has a failing memory bank that needs to be replaced.

We discussed several approaches and assessed the risk of each. One option was to wait for next Tuesday's regular maintenance window. Because that would leave the failing memory in use over a full weekend, we looked at other ways of migrating. Usually, we can live migrate without downtime. However, this server makes extensive use of in-memory caching, so the live migration has to pause the system for a short period (2-4 minutes) to transfer the memory contents to the new hardware.

As our engineers are online and someone at the data center facility can handle the replacement of the failing memory, we decided to perform the live migration during the day.

We will live migrate the system to new hardware at 10:30 CEST, supervised by our engineers. We expect a service interruption of 2-4 minutes at most.

We will update this ticket as soon as we start the migration.
Posted Jun 30, 2017 - 08:12 UTC