Generalized Instability

Incident Report for amazee.io

Postmortem

While investigating an issue with the Lagoon logging systems, we discovered two compute nodes in the cluster that were causing instability. We rebooted each of those nodes in turn, which caused a small amount of downtime while the affected pods were rescheduled to other nodes. Further investigation on our end showed that we also needed to restart the OpenShift controller, which ultimately resolved the instability.

Posted Aug 02, 2019 - 19:57 UTC

Resolved

This incident has been resolved.

Posted Aug 02, 2019 - 06:27 UTC

Identified

We have identified the two nodes that are causing these issues and are working to restore functionality in this region.

Posted Aug 01, 2019 - 23:14 UTC

Investigating

Some pods are experiencing issues. We are investigating now.

Posted Aug 01, 2019 - 22:51 UTC