ch1-lagoon - Unavailability on certain sites.
Incident Report for amazee.io
Postmortem

While adding another node to the Amazee CH1 OpenShift cluster, an outdated X.509 certificate was used by mistake (a node with the same name had been part of the cluster for a few weeks in 2018). Shortly after the node was added to the cluster, monitoring reported the wrong certificate. During the recovery effort, one other compute node was impacted and all pods scheduled on that node were forcefully terminated, leading to a brief outage of the websites running on that node. Redeploying the certificates in question on all nodes resolved the issue.

To prevent such issues from recurring, we will no longer reuse hostnames.
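
As an illustration only, a pre-flight check along the following lines could flag a leftover certificate before a node joins a cluster. This is a minimal sketch, not the procedure used on CH1: the function names and the idea of keeping a set of fingerprints from decommissioned nodes are assumptions made for the example. It uses only the Python standard library and compares SHA-256 fingerprints of PEM-encoded certificates.

```python
import hashlib
import ssl
from typing import AbstractSet


def cert_fingerprint(pem_text: str) -> str:
    """Return the SHA-256 fingerprint (hex) of a PEM-encoded certificate."""
    # Convert the PEM text to its DER bytes, then hash those bytes.
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_text)
    return hashlib.sha256(der_bytes).hexdigest()


def is_stale_certificate(
    candidate_pem: str, retired_fingerprints: AbstractSet[str]
) -> bool:
    """Report whether the candidate matches a certificate from a retired node.

    `retired_fingerprints` is a hypothetical record of fingerprints that
    belonged to nodes previously removed from the cluster.
    """
    return cert_fingerprint(candidate_pem) in retired_fingerprints
```

A provisioning script could call `is_stale_certificate` with the certificate it is about to deploy and stop if the result is `True`. Where such a record of retired fingerprints would be stored is left open here.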

Posted 21 days ago. Apr 03, 2019 - 17:09 CEST

Resolved
This incident has been resolved.
Posted 22 days ago. Apr 03, 2019 - 14:11 CEST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted 22 days ago. Apr 03, 2019 - 13:46 CEST
Identified
The issue has been identified and a fix is being implemented.
Posted 22 days ago. Apr 03, 2019 - 13:27 CEST
Investigating
We are currently investigating this issue.
Posted 22 days ago. Apr 03, 2019 - 13:22 CEST
This incident affected: Switzerland (ch1.lagoon).